

PART A: Term Papers # A.1 & A.2

NEEDLEMAN-WUNSCH (NW) & SMITH-WATERMAN (SW) ALGORITHMS

Each student is assigned a sequence pair (sequences 1 & 2) as listed in Table TP-NW/SW Exercises. Develop an algorithm and write code (MATLAB or C/C++) to perform the NW/SW computations indicated for each student, comparing the two given sequences. Elucidate the optimal pathway (that is, the optimal global/local alignment).

If you are not familiar with programming, you may do the calculations by hand and give your step-by-step results. Due credit will be given.

Table TP-NW/SW Exercises: sequence pairs for the NW/SW exercises (A.1 & A.2)

Sequence   NW pair                          SW pair
1          F F T E E Q S D I E D N C Q T    F R F Q N T I L D G G A E E G
2          D F T Q E T E D I E D N C Q Q    F Q F Q N T I S Y Y G G E L D
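The NW recurrence and traceback can be sketched in Python as a reference for checking hand, MATLAB or C/C++ results. The scores (match +1, mismatch -1, linear gap -1) are assumptions: the exercise does not fix a scoring scheme, and protein comparisons more commonly use a substitution matrix such as BLOSUM62. On ties the traceback below prefers a diagonal move, then up, then left, so it reports one optimal alignment among possibly several.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global (NW) alignment with a linear gap penalty.

    match/mismatch/gap are illustrative assumptions, not given by the exercise.
    Returns (aligned_a, aligned_b, score).
    """
    n, m = len(a), len(b)
    # F[i][j] = best score for aligning the prefixes a[:i] and b[:j]
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap

    def sub(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + sub(i, j),  # a[i-1] aligned to b[j-1]
                          F[i - 1][j] + gap,            # a[i-1] against a gap
                          F[i][j - 1] + gap)            # b[j-1] against a gap

    # Trace back from F[n][m]; on ties prefer diagonal, then up, then left.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), F[n][m]


if __name__ == "__main__":
    # NW pair from Table TP-NW/SW Exercises, spaces removed
    top, bottom, score = needleman_wunsch("FFTEEQSDIEDNCQT", "DFTQETEDIEDNCQQ")
    print(top)
    print(bottom)
    print("score:", score)
```

Changing the three keyword arguments is enough to test other linear-gap schemes; a substitution matrix would replace `sub()` with a table lookup.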

General Format of TPs:

Supplement your answers (term papers) with appropriate and relevant (state-of-the-art) details, plus any other particulars needed. Each term paper should include the following:

• A one-page executive summary

• An elaborate description of the assigned topic, with relevant references. You may supplement your answers and augment your concepts with appropriate cross-references as necessary. All such references should be clearly identified and listed in a standard format, such as the IEEE journal publication format. A Web page reference can be given by its title and website address. You are encouraged to append hard copies of such references to your solutions.

Term Paper # B.1: Descriptive Study Projects

EXERCISE B.1.D

Use the following link to find the sequence of yeast clone #71020.

https://genome-www4.stanford.edu/cgi-bin/SGD/getSeq?map=a3map&seq=71020&flankl=&flankr=&rev=

Use Genscan to find the ORFs in this sequence using "Vertebrate" as your organism.

How many complete genes are there?
How many of the complete genes have introns?
How many amino acids are there in ORF #3?

Copy the predicted protein sequence from ORF #3 and use that sequence to perform an appropriate search to determine the identity of the protein and the gene that encodes it.

What is the name of the gene that encodes this protein?

Based on the gene acronym (and other information that you might have already found), in what molecular process do you suppose this gene is involved?

Locate the DNA sequence of the gene. There are many ways to do this, but all of them should get you to the same answer. As a hint to be sure you're on the right track, the first few bases of the ORF are ATGGCAAAAACG.
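Once the DNA is in hand, the ORF can be checked by locating the start motif and translating in frame to the first stop codon. The Python sketch below does that with the standard genetic code. It reads the ORF as one contiguous stretch, so it does not handle introns (Genscan's exon predictions would be needed for that), and the demo input is a synthetic string, not the clone #71020 sequence.

```python
# Standard genetic code; codons ordered T, C, A, G at each of the three positions.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def orf_from_start(dna, start_motif="ATGGCAAAAACG"):
    """Translate from the first occurrence of start_motif to the first in-frame stop.

    Returns (orf_dna, protein), or None if the motif is absent.
    Introns are NOT handled: the ORF is read as contiguous DNA.
    """
    dna = "".join(dna.split()).upper()  # drop whitespace/newlines from pasted text
    start = dna.find(start_motif)
    if start < 0:
        return None
    protein = []
    for pos in range(start, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[pos:pos + 3], "X")  # X for ambiguous bases such as N
        if aa == "*":
            return dna[start:pos + 3], "".join(protein)
        protein.append(aa)
    return dna[start:], "".join(protein)  # no in-frame stop codon found


# Synthetic demo string (hypothetical, NOT the clone #71020 sequence)
print(orf_from_start("CCATGGCAAAAACGTAAGG"))
```

The length of the returned protein string gives the amino-acid count, which can be compared with the Genscan output for ORF #3.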

PAIRWISE SEQUENCE COMPARISON: Dot-Plot, Needleman-Wunsch (NW) and Smith-Waterman (SW) Algorithms

Using the Dotlet Program

The reason why we wrote dotlet is that we needed a diagonal plot tool for the December 1998 practical sessions in bioinformatics at the Institute of Biochemistry. Since we had decided to base all the practical sessions on the World-Wide Web, we needed a program that would run in a web browser. To our knowledge, there was none, so we wrote it.

Reference: T. Junier and M. Pagni, "Dotlet: diagonal plots in a Web browser," Bioinformatics (Applications Note), vol. 16, no. 2, pp. 178-179, 2000.


Problem B.1: Mutational Changes

Construct a matrix over the set {A, C, T, G} to illustrate the characteristics of transition and transversion mutations.

(Hint: You may use a score of 100% for the elements corresponding to no mutation and prorated percentages for the other elements. The ratio of spontaneous transitions to transversions is approximately 2:1; therefore, each transition should have a probability of 2/3 and each transversion 1/3.)
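Following the hint literally, the matrix can be generated rather than filled in by hand. The sketch assumes the usual classes (purines A, G; pyrimidines C, T): a change within a class is a transition, a change across classes is a transversion. The 2/3 and 1/3 weights are taken from the hint and are exposed as parameters.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_matrix(bases="ACTG", transition=2 / 3, transversion=1 / 3):
    """Percent matrix per the hint: 100 on the diagonal (no mutation), the
    transition weight for purine<->purine or pyrimidine<->pyrimidine changes,
    and the transversion weight for purine<->pyrimidine changes."""
    matrix = {}
    for x in bases:
        matrix[x] = {}
        for y in bases:
            if x == y:
                weight = 1.0
            elif {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
                weight = transition
            else:
                weight = transversion
            matrix[x][y] = round(100 * weight, 1)
    return matrix


m = substitution_matrix()
print("     " + "".join(f"{b:>7}" for b in "ACTG"))
for x in "ACTG":
    print(f"{x:>5}" + "".join(f"{m[x][y]:>7}" for y in "ACTG"))
```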

Problem C.1

For the two binary sequences X and Y given in Problem C.13, plot the Kullback-Leibler (KL) measure between the strings. Hence, confirm the locations of the most common substrings between them, as determined via the Hamming-distance (HD) measure in the previous problem.

(Hint: Again, select a window of size 4. For the subsequence in each window, calculate the KL measure, and plot window number versus KL = KL1 + KL2 for each string, where, for window w,

$$KL_1(w) = p_w(0)\,\ln\frac{p_w(0)}{q_w(1)}, \qquad KL_2(w) = q_w(1)\,\ln\frac{q_w(1)}{p_w(0)}$$

evaluated for window #1, window #2, and so on; p(0) is the probability of 0 in that window and q(1) is the probability of 1 in that window.)
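A short sketch of the per-window KL computation, using the hint's definitions KL1 = p(0) ln[p(0)/q(1)] and KL2 = q(1) ln[q(1)/p(0)]. Two choices are assumptions, not part of the exercise: windows slide by one position (`step=1`), and a small floor `eps` keeps the logarithm finite when a window is all 0s or all 1s (such windows then get a large value that depends on `eps`). The input string is hypothetical; substitute X and Y from Problem C.13.

```python
import math


def window_kl(bits, size=4, step=1, eps=1e-6):
    """Return KL = KL1 + KL2 for each window of a binary string."""
    values = []
    for start in range(0, len(bits) - size + 1, step):
        window = bits[start:start + size]
        p0 = max(window.count("0") / size, eps)  # probability of 0 in the window
        q1 = max(window.count("1") / size, eps)  # probability of 1 in the window
        kl1 = p0 * math.log(p0 / q1)
        kl2 = q1 * math.log(q1 / p0)
        values.append(kl1 + kl2)
    return values


# Hypothetical input; replace with the strings from Problem C.13.
for number, kl in enumerate(window_kl("0110100110"), start=1):
    print(number, round(kl, 4))
```

Because q(1) = 1 - p(0) within a window, the sum reduces to (p(0) - q(1)) ln[p(0)/q(1)], so balanced windows such as 0011 score zero.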

Problem D.1

Construct a dot-plot for the following pair of sequences using the matrix methods described in the example:

x:         G T G A C C G C T A A C C T C

y:         G T T G C G A C T G C G G C G T
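The basic matrix method marks a dot wherever a residue of one sequence is identical to a residue of the other. A minimal Python sketch, with x across the top and y down the side (the orientation is an arbitrary choice):

```python
def dot_matrix(x, y):
    """Identity matrix: row j, column i is True when y[j] == x[i]."""
    return [[a == b for a in x] for b in y]


def render(x, y, matrix, hit="*", miss="."):
    """Text dot-plot with sequence x across the top and sequence y down the side."""
    lines = ["  " + " ".join(x)]
    for b, row in zip(y, matrix):
        lines.append(b + " " + " ".join(hit if cell else miss for cell in row))
    return "\n".join(lines)


x = "GTGACCGCTAACCTC"   # Problem D.1, spaces removed
y = "GTTGCGACTGCGGCGT"
print(render(x, y, dot_matrix(x, y)))
```

Runs of stars along a diagonal indicate stretches of identity between the two sequences.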

Problem D.1(A):

Construct a dot-plot for the following pair of sequences using Dotlet or any other compatible open-source program:

x:         G T G A C C G C T A A C C T C A C G T T A C

y:         T T T G C G A C T G C G G C G T C C C T A A G C
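For longer sequences, plain identity plots get noisy, so tools like Dotlet filter with a sliding window. As far as we understand, Dotlet scores windows with a substitution matrix and an adjustable grey-scale cutoff; the Python sketch below swaps that for a simpler exact-match count. The `window` and `threshold` values are illustrative assumptions, and a dot is placed at the start of each qualifying diagonal window.

```python
def filtered_dot_plot(x, y, window=3, threshold=2):
    """Mark (i, j) when the diagonal windows x[i:i+window] and y[j:j+window]
    share at least `threshold` identical positions."""
    hits = set()
    for i in range(len(x) - window + 1):
        for j in range(len(y) - window + 1):
            matches = sum(x[i + k] == y[j + k] for k in range(window))
            if matches >= threshold:
                hits.add((i, j))
    return hits


def render_hits(x, y, hits):
    """Text plot: x across the top, y down the side, '*' at each hit."""
    lines = ["  " + " ".join(x)]
    for j, b in enumerate(y):
        lines.append(b + " " + " ".join("*" if (i, j) in hits else "." for i in range(len(x))))
    return "\n".join(lines)


x = "GTGACCGCTAACCTCACGTTAC"     # Problem D.1(A), spaces removed
y = "TTTGCGACTGCGGCGTCCCTAAGC"
print(render_hits(x, y, filtered_dot_plot(x, y, window=4, threshold=3)))
```

Raising `threshold` toward `window` keeps only strong diagonals; lowering it brings back the background noise of the unfiltered plot.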

-----------------------------------------------------------------------------------------------------------------------

Problem D.2

You are assigned a pair of nucleotide (RNA) sequences, S and T. Determine the best global alignment.

S: C U U A C G C A

T: A U G A G A A C U U  

Problem D.3

Given a sequence pair, X and Y, as indicated below, determine the best global alignment via traceback using the NW algorithm.

X:        G A G C A                              Y:        G A T T C A
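When traceback arrows tie, more than one alignment is optimal, and a hand solution should list them all. The Python sketch below fills the NW matrix and then follows every tied arrow. The scores (match +1, mismatch -1, gap -1) are assumptions, since the problem does not give a scheme; under a different scheme the set of optimal alignments can change. The number of paths can grow quickly for long sequences, so this is meant for small exercises like this one.

```python
def nw_all_alignments(a, b, match=1, mismatch=-1, gap=-1):
    """Return (score, alignments) listing every co-optimal global alignment."""
    n, m = len(a), len(b)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        F[i][0] = i * gap
    for j in range(m + 1):
        F[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + s, F[i - 1][j] + gap, F[i][j - 1] + gap)

    results = []

    def trace(i, j, top, bottom):
        # top/bottom are built backwards and reversed when the origin is reached
        if i == 0 and j == 0:
            results.append((top[::-1], bottom[::-1]))
            return
        if i > 0 and j > 0:
            s = match if a[i - 1] == b[j - 1] else mismatch
            if F[i][j] == F[i - 1][j - 1] + s:        # diagonal arrow
                trace(i - 1, j - 1, top + a[i - 1], bottom + b[j - 1])
        if i > 0 and F[i][j] == F[i - 1][j] + gap:     # up arrow: gap in b
            trace(i - 1, j, top + a[i - 1], bottom + "-")
        if j > 0 and F[i][j] == F[i][j - 1] + gap:     # left arrow: gap in a
            trace(i, j - 1, top + "-", bottom + b[j - 1])

    trace(n, m, "", "")
    return F[n][m], results


score, alignments = nw_all_alignments("GAGCA", "GATTCA")   # Problem D.3
print("score:", score)
for top, bottom in alignments:
    print(top)
    print(bottom)
    print()
```

The same function applies unchanged to the U/V pair of Problem D.4.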

Problem D.4

Given a sequence pair, U and V, as indicated below, determine the best global alignment via traceback using the NW algorithm.

U:        C T C G T                               V:        C T A A G T

Problem D.5

Via hand calculations, perform an NW-algorithm-based comparison between the two sequences given below and elucidate the maximum-scoring pathway:

M A V R K L S L E G

M S T A L P G L G S

Problem D.5(A):

Via hand calculations, perform an NW-algorithm-based comparison between the two given sequences and elucidate the maximum-scoring pathway:

Sequence Pair

W F G Q E T S A I S

S F T Q F S E D A I

Problem D.6

Given a sequence pair, X and Y, as shown below, determine the best local alignment via traceback using the SW algorithm.

X:        W R N D C Q E G S A          Y:         W G Q E G S I E A
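The SW variant differs from NW in three places: every cell is floored at 0, the traceback starts from the highest-scoring cell, and it stops at the first zero cell. A Python sketch, again with assumed scores (match +1, mismatch -1, linear gap -1) because the problem does not specify them:

```python
def smith_waterman(a, b, match=1, mismatch=-1, gap=-1):
    """Local (SW) alignment with a linear gap penalty.

    The scores are illustrative assumptions. Returns (local_a, local_b, score)
    for one best-scoring local alignment.
    """
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]  # first row and column stay 0
    best, bi, bj = 0, 0, 0

    def sub(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # The 0 option lets a local alignment start fresh at any cell
            H[i][j] = max(0, H[i - 1][j - 1] + sub(i, j),
                          H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, bi, bj = H[i][j], i, j

    # Trace back from the highest cell until a zero cell is reached
    out_a, out_b = [], []
    i, j = bi, bj
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), best


print(smith_waterman("WRNDCQEGSA", "WGQEGSIEA"))   # Problem D.6, spaces removed
```

The same call works for the SW pair in Table TP-NW/SW Exercises and for Problems D.6(A) and D.6(B).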

Problem D.6(A):

Given a sequence pair, U and V, as shown below, determine the best local alignment via traceback using the SW algorithm.

U:        AASTHECWCTWH              V:        AASRNPSCWTTWHT

Problem D.6(B): Via hand calculation, perform an SW-algorithm-based comparison between the two given sequences and elucidate the common regions of similarity.

Sequence Pair

W Y G Q E Q S Y I Q

W Y T Q E T S D I Q

Problem E.1

Translate the following regular expressions:

(a) [GA]-T-{C, G}(2)-X-[TGC]-G(3)-[TC]

(b) [TCG]-{A, C}(3)-P-x-[ATG]-x-[VIL]-[IVT]-x-[GS]-G-Y-S-[QL]-A

(c) [TAG]-XXAG-V-X(4)-{AEGD}-[AC]-x-V-x(4)-{ED}

(d) Write a regular expression that matches the following motif at the C terminus:

V or L, then any residue (two to four times), A, T, then any residue except D or E
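Patterns written in this PROSITE-style notation map mechanically onto ordinary regular expressions, which makes translations easy to test against real sequences. The Python sketch below handles the standard elements ([..] allowed residues, {..} forbidden residues, x or X for any residue, (n) and (n,m) repeats, < and > terminal anchors) and tolerates spaces and commas inside brackets, as written in (a) and (b). Elements must be separated by "-"; anything else raises an error rather than being guessed at.

```python
import re

ELEMENT = re.compile(r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Zx])(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regular expression."""
    pattern = re.sub(r"\s+", "", pattern).rstrip(".")
    # Tolerate commas inside [..] and {..}, e.g. "{C, G}" -> "{CG}"
    pattern = re.sub(r"[\[{][^\]}]*[\]}]", lambda mo: mo.group(0).replace(",", ""), pattern)
    regex, tail = "", ""
    if pattern.startswith("<"):          # N-terminal anchor
        regex, pattern = "^", pattern[1:]
    if pattern.endswith(">"):            # C-terminal anchor
        tail, pattern = "$", pattern[:-1]
    for element in pattern.split("-"):
        found = ELEMENT.fullmatch(element)
        if found is None:
            raise ValueError(f"unsupported element: {element!r}")
        core, low, high = found.groups()
        if core in ("x", "X"):
            part = "."
        elif core.startswith("{"):
            part = "[^" + core[1:-1] + "]"   # forbidden residues -> negated class
        else:
            part = core                      # single residue or [..] class is valid as-is
        if low:
            part += "{" + low + ("," + high if high else "") + "}"
        regex += part
    return regex + tail


rx = prosite_to_regex("[GA]-T-{C, G}(2)-X-[TGC]-G(3)-[TC]")
print(rx)
```

The resulting string can be passed to `re.search` to scan a sequence for the motif.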

Problem E.2

(a) For the following multiple sequence alignment, construct the regular expression and expand it in terms of the three-letter amino acid codes:

T E C V L A R T I
N G P V L A R T I
N G P T I T R T I
N G A V M M R T I
A E C V I C R T I
K E C V I C R T I
A E C T I C R T S
N P C V I A R T T
K E E V M M R T I

(b) For the following multiple sequence alignment, construct the regular expression and expand it in terms of the relevant nucleotide bases:

T C C T G A C A G
T G C G G A T A G
C C G T C T C T C
A G C G G A C T G
G T G T G A T G A
A C C T G A C T G
C G C T A A C T G
A G C G G A C T G
A C C G G G T T G
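Building the expression is a column-by-column scan: a fully conserved column becomes that single residue, and any other column becomes the bracketed set of residues observed there. The Python sketch assumes each alignment above is read as nine sequences of nine positions (our reading of the flattened layout) and uses a simple "observed residues only" rule; a curated motif might widen or narrow some classes.

```python
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}
NUCLEOTIDES = {"A": "Adenine", "C": "Cytosine", "G": "Guanine", "T": "Thymine"}


def msa_to_pattern(rows):
    """One element per column: the residue if the column is fully conserved,
    otherwise the sorted set of residues observed there (gaps ignored)."""
    elements = []
    for column in zip(*rows):
        residues = sorted(set(column) - {"-"})
        elements.append(residues[0] if len(residues) == 1 else "[" + "".join(residues) + "]")
    return "-".join(elements)


def expand(pattern, names):
    """Spell out each element, e.g. '[IST]' -> 'Ile/Ser/Thr'."""
    return " - ".join("/".join(names[c] for c in element.strip("[]"))
                      for element in pattern.split("-"))


protein_rows = ["TECVLARTI", "NGPVLARTI", "NGPTITRTI", "NGAVMMRTI", "AECVICRTI",
                "KECVICRTI", "AECTICRTS", "NPCVIARTT", "KEEVMMRTI"]
dna_rows = ["TCCTGACAG", "TGCGGATAG", "CCGTCTCTC", "AGCGGACTG", "GTGTGATGA",
            "ACCTGACTG", "CGCTAACTG", "AGCGGACTG", "ACCGGGTTG"]
for rows, names in ((protein_rows, THREE_LETTER), (dna_rows, NUCLEOTIDES)):
    pattern = msa_to_pattern(rows)
    print(pattern)
    print(expand(pattern, names))
```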

Problem F.1

Using the UPGMA concept, construct an evolutionary tree for the data on pairwise species differences indicated in the following table:

OTU   A    B    C    D    E
A     0    5    30   45   35
B          0    28   42   32
C               0    10   15
D                    0    20
E                         0
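UPGMA repeatedly joins the two closest clusters, places the new node at half their distance (giving an ultrametric tree), and replaces the pair by a leaf-count-weighted average distance. A Python sketch that mirrors the upper-triangle table into a full matrix and reports each join with its node height:

```python
def upgma(labels, matrix):
    """UPGMA on a full symmetric distance matrix.

    Returns a Newick string and the merges as (cluster_1, cluster_2, node_height).
    """
    clusters = [(label, 1) for label in labels]  # (Newick text, number of leaves)
    d = [list(row) for row in matrix]
    merges = []
    while len(clusters) > 1:
        n = len(clusters)
        # Closest pair of current clusters (first one wins on ties)
        i, j = min(((a, b) for a in range(n) for b in range(a + 1, n)),
                   key=lambda pair: d[pair[0]][pair[1]])
        (name_i, size_i), (name_j, size_j) = clusters[i], clusters[j]
        merges.append((name_i, name_j, d[i][j] / 2))  # node halfway: ultrametric tree
        keep = [k for k in range(n) if k not in (i, j)]
        # Distance from the new cluster = leaf-count-weighted average of the old ones
        new_row = [(d[i][k] * size_i + d[j][k] * size_j) / (size_i + size_j) for k in keep]
        d = [[d[a][b] for b in keep] + [new_row[idx]] for idx, a in enumerate(keep)]
        d.append(new_row + [0.0])
        clusters = [clusters[k] for k in keep] + [(f"({name_i},{name_j})", size_i + size_j)]
    return clusters[0][0] + ";", merges


# Problem F.1, with the upper-triangle table mirrored into a full matrix
labels = ["A", "B", "C", "D", "E"]
distances = [
    [0, 5, 30, 45, 35],
    [5, 0, 28, 42, 32],
    [30, 28, 0, 10, 15],
    [45, 42, 10, 0, 20],
    [35, 32, 15, 20, 0],
]
tree, merges = upgma(labels, distances)
print(tree)
for left, right, height in merges:
    print(f"join {left} + {right} at height {height:.3f}")
```

The same function can be applied to Problem F.2 by entering its lower-triangle table as a full matrix.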

Problem F.2

Using the UPGMA concept, construct an evolutionary tree for the data on pairwise species differences indicated in the following table:

OTU   A    B    C
B     4
C     4    2
D     8    8    6

Problem F.3

OTU   H    C    G    O    A
H     0    95   110  185  205
C          0    118  195  220
G               0    190  215
O                    0    215
A                         0

Using the data above, construct an unrooted tree and determine the branch lengths from the common ancestral node.
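If the distances are assumed to be additive, one standard starting point is the three-point formula: for three taxa i, j and k meeting at a single internal node, the branch from that node to i has the length given below. The worked numbers use H, C and G from the table; this is a sketch of the first step only, and the remaining taxa (O and A) would still need to be attached, for example by neighbor joining.

```latex
b_i = \frac{d(i,j) + d(i,k) - d(j,k)}{2}

b_H = \frac{95 + 110 - 118}{2} = 43.5, \qquad
b_C = \frac{95 + 118 - 110}{2} = 51.5, \qquad
b_G = \frac{110 + 118 - 95}{2} = 66.5
```

As a check, b_H + b_C = 95, b_H + b_G = 110 and b_C + b_G = 118, which reproduce the table entries.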

Problem F.4

Neighbor-Joining Method

Given the following evolutionary distances, create the distance matrix for the resulting taxa.

OTU   A    B    C    D    E
B     5
C     10   20
D     15   25   35
E     45   55   60   65
F     70   75   80   85   90

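Hint:

Calculate the new distance matrix M for each pair of nodes:

$$M(i,j) = d(i,j) - \frac{r(i) + r(j)}{N - 2}$$

where d(i,j) is the distance between taxa i and j, $r(i) = \sum_{k} d(i,k)$ is the net divergence of taxon i, and N is the number of taxa.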
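One neighbor-joining step can be sketched in Python with M(i,j) = d(i,j) - [r(i) + r(j)]/(N - 2), where r(i) is the sum of taxon i's distances to all others. The pair with the most negative M is joined first. The branch-length formula s_i = d(i,j)/2 + [r(i) - r(j)]/[2(N - 2)] is the standard NJ expression; it is not part of the hint and is included only to show the next step.

```python
from itertools import combinations


def nj_step(labels, d):
    """One neighbor-joining step.

    d maps a pair of labels (in either order) to their distance. Returns the net
    divergences r, the matrix M, the pair to join, and its two branch lengths.
    """
    def dist(a, b):
        return d[(a, b)] if (a, b) in d else d[(b, a)]

    n = len(labels)
    r = {i: sum(dist(i, k) for k in labels if k != i) for i in labels}
    M = {(i, j): dist(i, j) - (r[i] + r[j]) / (n - 2) for i, j in combinations(labels, 2)}
    i, j = min(M, key=M.get)  # most negative M(i, j) is joined first
    # Branch lengths from the new internal node to i and to j
    s_i = dist(i, j) / 2 + (r[i] - r[j]) / (2 * (n - 2))
    s_j = dist(i, j) - s_i
    return r, M, (i, j), (s_i, s_j)


# Problem F.4 distances (lower triangle of the table)
distances = {
    ("B", "A"): 5,
    ("C", "A"): 10, ("C", "B"): 20,
    ("D", "A"): 15, ("D", "B"): 25, ("D", "C"): 35,
    ("E", "A"): 45, ("E", "B"): 55, ("E", "C"): 60, ("E", "D"): 65,
    ("F", "A"): 70, ("F", "B"): 75, ("F", "C"): 80, ("F", "D"): 85, ("F", "E"): 90,
}
r, M, pair, lengths = nj_step(list("ABCDEF"), distances)
for (a, b), value in M.items():
    print(f"M({a},{b}) = {value}")
print("join", pair, "branch lengths", lengths)
```

Repeating the step on the reduced matrix (with the joined pair replaced by their new node) builds the full tree.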

Problem F.5

Trace the path of each sequence through the profile HMM for the given MSA:

A B - C D E

A B G C D E

A B - C - E

Hint:

HMMs and their variants have been used in gene prediction, pairwise and multiple sequence alignment, base-calling, modeling DNA sequencing errors, protein secondary structure prediction, ncRNA identification, RNA structural alignment, acceleration of RNA folding and alignment, fast noncoding RNA annotation, and many others.

Simulates a multiple sequence alignment of specified length. Deals with base-substitution only, not indels.
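A profile HMM first decides which alignment columns are match states; each sequence then follows match (M), insert (I) or delete (D) states column by column. The Python sketch uses a common heuristic as an assumption: a column becomes a match state when fewer than half of its entries are gaps. Under that rule, the third column (one residue, two gaps) is an insert column, while the fifth column (one gap) is a match column in which the third sequence passes through a delete state.

```python
def profile_hmm_paths(msa, max_gap_fraction=0.5):
    """State path of each aligned sequence through a profile HMM.

    Columns with a gap fraction below max_gap_fraction become match states
    M1, M2, ...; the rest are insert columns. In a match column a residue uses
    M_k and a gap uses D_k; a residue in an insert column uses I_k, where k is
    the last match state passed.
    """
    nseq = len(msa)
    is_match = [sum(row[c] == "-" for row in msa) / nseq < max_gap_fraction
                for c in range(len(msa[0]))]
    paths = []
    for row in msa:
        states, k = ["Begin"], 0
        for c, symbol in enumerate(row):
            if is_match[c]:
                k += 1
                states.append(f"M{k}" if symbol != "-" else f"D{k}")
            elif symbol != "-":
                states.append(f"I{k}")
        paths.append(states + ["End"])
    return paths


msa = ["AB-CDE", "ABGCDE", "AB-C-E"]   # Problem F.5, spaces removed
for row, path in zip(msa, profile_hmm_paths(msa)):
    print(row, " -> ", " ".join(path))
```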
