Question 1. Profile HMMs for sequence families
a) Define the match (M), insert (I), and delete (D) states for the multiple sequence alignment (MSA) shown in Figure 1.
b) Derive the parameters of the profile HMM for the MSA given in Figure 1:
I. Emission counts for match states
II. Emission counts for insert states
III. Counts of transitions between states
IV. Emission probabilities for match, insert, and hidden states
Figure 1. Multiple sequence alignment of five DNA sequences
T--CT-
-AA-TA
T--CTA
TC-G-A
C-CGAC
Feel free to use Durbin's Figure 5.7c format
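
As a rough illustration of parts (a) and (b), the sketch below labels the columns of the Figure 1 alignment and tallies emission counts. It assumes the common heuristic that a column is a match column when at most half of its entries are gaps; the assignment does not prescribe this rule. The names `match_columns`, `emission_counts`, and `max_gap_fraction` are hypothetical, and Python is used only for illustration.

```python
from collections import Counter

# Alignment copied from Figure 1.
msa = ["T--CT-", "-AA-TA", "T--CTA", "TC-G-A", "C-CGAC"]

def match_columns(alignment, max_gap_fraction=0.5):
    """Return 0-based indices of columns treated as match columns.

    Assumption: a column is a match column when its gap fraction
    does not exceed max_gap_fraction.
    """
    n_seqs = len(alignment)
    cols = []
    for j in range(len(alignment[0])):
        gaps = sum(1 for seq in alignment if seq[j] == "-")
        if gaps / n_seqs <= max_gap_fraction:
            cols.append(j)
    return cols

def emission_counts(alignment, max_gap_fraction=0.5):
    """Count residues emitted by each match state and each insert state."""
    m_cols = match_columns(alignment, max_gap_fraction)
    match_counts = [Counter() for _ in m_cols]                    # M1..Mk
    insert_counts = [Counter() for _ in range(len(m_cols) + 1)]   # I0..Ik
    for seq in alignment:
        k = 0  # number of match columns already passed in this sequence
        for j, ch in enumerate(seq):
            if j in m_cols:
                if ch != "-":          # a gap in a match column is a delete state
                    match_counts[k][ch] += 1
                k += 1
            elif ch != "-":            # a residue in a non-match column is an insertion
                insert_counts[k][ch] += 1
    return m_cols, match_counts, insert_counts

m_cols, match_counts, insert_counts = emission_counts(msa)
print("match columns (0-based):", m_cols)
for i, c in enumerate(match_counts, start=1):
    print(f"M{i}", dict(c))
for i, c in enumerate(insert_counts):
    print(f"I{i}", dict(c))
```

Under this rule, alignment columns 1, 4, 5, and 6 become M1 to M4, and columns 2 and 3 feed the insert state I1 between M1 and M2. Converting counts to probabilities, with or without pseudocounts, is left as in part IV.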
2. Provide a 1-1.5 page review of the paper "Genome-wide genetic marker discovery and genotyping using next-generation sequencing", available under this week's course content.
Some guidelines:
- Underline the main points of the paper.
- Keep your work structured.
- While focusing on the big picture, keep in mind that our class is about statistical processes.
3. Use the file available in this week's course content to write and submit an R script that will:
a. Define the HMM for Q4 in Homework 3
b. Parse the Homework 3 Q4 sequence to show the sequence of hidden states using the Viterbi algorithm
Homework 3 Solution, Question 4:
a) Define a zero-order Markov model for sequence2_A2, which represents a portion of non-coding sequence of Mycobacterium tuberculosis (refer to course content)
Zero-order model for sequence2_A2 (base, count, probability):
P(A) 107 0.195255474
P(C) 156 0.284671533
P(G) 183 0.333941606
P(T) 102 0.186131387
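
The probabilities in this table are the base counts divided by the total sequence length. A minimal Python sketch of that arithmetic, using only the counts listed above (the variable names are illustrative):

```python
# Base counts for sequence2_A2, copied from the table above.
counts = {"A": 107, "C": 156, "G": 183, "T": 102}

# A zero-order Markov model assigns each base its relative frequency.
total = sum(counts.values())
probs = {base: n / total for base, n in counts.items()}

for base in "ACGT":
    print(f"P({base}) {counts[base]} {probs[base]:.9f}")
```

The counts sum to 548, so for example P(A) = 107/548.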
b) Use the zero-order Markov models defined for sequence1_A2 and sequence2_A2, and apply the Viterbi algorithm to find the most likely path for the sequence CGCGTTACTTCAATG, without taking reading frame into consideration
Assume:
Initial transition probabilities
a0c = a0n = 0.5
State transition probabilities
acc 0.55
acn 0.45
ann 0.5
anc 0.5
where aij is the transition probability from state i to state j, c = coding, n = non-coding
sequence CGCGTTACTTCAATG
path of hidden states CCCCNNCCNNCCCCC
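
The Viterbi step can be sketched as follows. This is a Python illustration in log space, not the R script requested in item 3. The initial and transition probabilities and the non-coding emissions are taken from the text above. The coding emissions come from sequence1_A2, which is not listed here, so the code uses a uniform placeholder (`coding_emit_placeholder`, a hypothetical name and hypothetical values) that must be replaced with the course values. With the placeholder, the output need not match the path shown above.

```python
import math

def viterbi(seq, states, start_p, trans_p, emit_p):
    """Return the most likely hidden-state path for seq (log-space Viterbi)."""
    # Initialisation: start probability times emission of the first symbol.
    scores = [{s: math.log(start_p[s]) + math.log(emit_p[s][seq[0]]) for s in states}]
    back = []  # back[t][s] = best predecessor of state s at position t + 1
    for ch in seq[1:]:
        prev, cur, ptr = scores[-1], {}, {}
        for s in states:
            best = max(states, key=lambda r: prev[r] + math.log(trans_p[r][s]))
            cur[s] = prev[best] + math.log(trans_p[best][s]) + math.log(emit_p[s][ch])
            ptr[s] = best
        scores.append(cur)
        back.append(ptr)
    # Traceback from the best final state.
    path = [max(states, key=lambda s: scores[-1][s])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return "".join(reversed(path))

states = ("C", "N")                      # C = coding, N = non-coding
start_p = {"C": 0.5, "N": 0.5}           # a0c = a0n = 0.5
trans_p = {"C": {"C": 0.55, "N": 0.45},  # acc, acn
           "N": {"C": 0.5, "N": 0.5}}    # anc, ann

# Non-coding emissions: zero-order model for sequence2_A2 (counts from the text).
noncoding_emit = {"A": 107 / 548, "C": 156 / 548, "G": 183 / 548, "T": 102 / 548}

# ASSUMPTION: placeholder only. Replace with the zero-order model for sequence1_A2.
coding_emit_placeholder = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}

emit_p = {"C": coding_emit_placeholder, "N": noncoding_emit}
path = viterbi("CGCGTTACTTCAATG", states, start_p, trans_p, emit_p)
print(path)
```

Working with log probabilities avoids numerical underflow on longer sequences; for a 15-base sequence the plain products would also work.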
Attachment: post.xlsx