Assignment: Introduction to Computer Science for Life Scientists
Exercise 1 (Dynamic Programming)
Searching for the longest common subsequence (LCS) of biosequences is one of the most important tasks in Bioinformatics. A subsequence of S is obtained by deleting zero or more characters from S: its characters appear in the same relative order as in S, but not necessarily contiguously. For example, given the sequence S = "AGGTACCCGATC", the strings "AGT", "TAGAT", "GGG", ... are all subsequences of S. In Bioinformatics, the length of the longest common subsequence of two biosequences is a good measure of their similarity.
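To make the definition concrete, here is a minimal Python sketch of a subsequence test. The helper name `is_subsequence` is hypothetical and not part of the exercise. It scans the sequence once from left to right, so each character of the candidate must be found somewhere after the previous match.

```python
def is_subsequence(candidate: str, sequence: str) -> bool:
    """Return True if candidate can be obtained by deleting characters from sequence."""
    remaining = iter(sequence)
    # `ch in remaining` advances the iterator past the first match of ch,
    # so later characters of candidate are only searched for further to the right.
    return all(ch in remaining for ch in candidate)


S = "AGGTACCCGATC"
for candidate in ["AGT", "TAGAT", "GGG", "CGG"]:
    print(candidate, is_subsequence(candidate, S))
```

The first three examples from the text are accepted. "CGG" is rejected because the only G that follows a C in S is the G at position 9, and no G comes after it.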
a) Write a recursive function to find the LCS of two input strings of sizes n and m. (3P)
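The sketch below shows one possible recursive solution. It assumes that returning a single LCS is enough, with ties broken arbitrarily, and `lcs_recursive` is a hypothetical name. The function compares the last characters of the two current prefixes. If they match, that character extends an LCS of the two shorter prefixes. If they do not match, it tries dropping the last character from each string in turn and keeps the longer result.

```python
def lcs_recursive(s1: str, s2: str) -> str:
    """Return one longest common subsequence of s1 and s2 by plain recursion."""

    def solve(i: int, j: int) -> str:
        # LCS of the prefixes s1[:i] and s2[:j]
        if i == 0 or j == 0:
            return ""
        if s1[i - 1] == s2[j - 1]:
            # Matching last characters always extend an LCS of the shorter prefixes.
            return solve(i - 1, j - 1) + s1[i - 1]
        # Otherwise drop the last character of one string and keep the longer result.
        without_s1 = solve(i - 1, j)
        without_s2 = solve(i, j - 1)
        return without_s1 if len(without_s1) >= len(without_s2) else without_s2

    return solve(len(s1), len(s2))


print(lcs_recursive("AGGTAB", "GXTXAYB"))  # GTAB
```

This version has no memoization, so it is only practical for short inputs.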
b) For two sequences of size n, what is the running time complexity of your function in part a) if all possible subsequences are computed for both sequences? Briefly explain your answer. (2P)
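The brute-force variant described in b) can be sketched as follows. The function names are hypothetical. Each subsequence corresponds to one subset of index positions, so a string of length n has 2^n index subsets. Enumerating these subsets for both inputs and intersecting the results already takes exponential time, before any comparison is made.

```python
from itertools import combinations


def all_subsequences(s: str) -> set[str]:
    """Collect every subsequence of s (including the empty string)."""
    # Each subsequence is one subset of index positions; there are 2**len(s) subsets,
    # so this comprehension alone runs exponentially many times.
    return {
        "".join(s[idx] for idx in positions)
        for r in range(len(s) + 1)
        for positions in combinations(range(len(s)), r)
    }


def lcs_bruteforce(s1: str, s2: str) -> str:
    """Return one LCS by intersecting the full subsequence sets of s1 and s2."""
    common = all_subsequences(s1) & all_subsequences(s2)
    return max(common, key=len)  # "" is always common, so the set is never empty


print(len(all_subsequences("ABCD")))  # 16: four distinct letters give 2**4 distinct subsequences
print(lcs_bruteforce("AGGTAB", "GXTXAYB"))  # GTAB
```

Joining each subset into a string also costs up to n steps, so the enumeration for one string is closer to O(n * 2^n).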
c) This time, use dynamic programming to find the LCS of two input strings of sizes n and m. (5P)
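One standard way to set up the dynamic program is the common textbook formulation below, which is not necessarily the intended one. Let c[i, j] be the LCS length of the prefixes S1[1..i] and S2[1..j]. The table is filled row by row, and c[n, m] holds the LCS length. One LCS can then be recovered by walking back from c[n, m].

```latex
c[i, j] =
\begin{cases}
0 & \text{if } i = 0 \text{ or } j = 0,\\
c[i-1, j-1] + 1 & \text{if } i, j > 0 \text{ and } S_1[i] = S_2[j],\\
\max\bigl(c[i-1, j],\ c[i, j-1]\bigr) & \text{if } i, j > 0 \text{ and } S_1[i] \neq S_2[j].
\end{cases}
```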
d) Write a Python function that uses your idea from part c) to find the LCS of two input strings S1 and S2. Test your function with the following sequences: (3P)
S1 = "AGGTACCCGATC", S2 = "GTAAGAGTACCGATTGATC"
S1 = "AGATTCCCCACCTTAGA", S2 = "AGGAATCCCCAACCTGA"
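Here is a minimal bottom-up sketch of such a function. It assumes the usual table c[i][j] of prefix LCS lengths, and `lcs_dp` is a hypothetical name. After the table is filled, a traceback from the bottom-right corner recovers one LCS. When several LCSs exist, the tie-breaking rule in the traceback decides which one is returned.

```python
def lcs_dp(s1: str, s2: str) -> str:
    """Return one longest common subsequence of s1 and s2 using a DP table."""
    n, m = len(s1), len(s2)
    # table[i][j] = length of an LCS of the prefixes s1[:i] and s2[:j]
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if s1[i - 1] == s2[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])

    # Walk back from table[n][m], collecting matched characters in reverse order.
    chars = []
    i, j = n, m
    while i > 0 and j > 0:
        if s1[i - 1] == s2[j - 1]:
            chars.append(s1[i - 1])
            i -= 1
            j -= 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(chars))


for a, b in [("AGGTACCCGATC", "GTAAGAGTACCGATTGATC"),
             ("AGATTCCCCACCTTAGA", "AGGAATCCCCAACCTGA")]:
    result = lcs_dp(a, b)
    print(len(result), result)
```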
e) Briefly explain the running time complexity of the LCS function in part d). (2P)
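For a table-based solution like the one asked for in parts c) and d), the cost can be written out explicitly. Each of the (n+1)(m+1) cells is filled in constant time, and a traceback moves at most n + m steps. The table also needs O(nm) memory.

```latex
T(n, m) = \underbrace{(n+1)(m+1) \cdot O(1)}_{\text{filling the table}}
        + \underbrace{O(n + m)}_{\text{traceback}}
        = O(nm)
```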
Exercise 2 (Sorting)
a) Suggest an algorithm with O(n) average running time to find the i-th smallest value in a set S of size n. For example, for S = {5, 3, 1, 7, 2, 3, 1} the third smallest element is 2. You are not allowed to use counting sort or radix sort. Hint: think about how the idea of quicksort could be used for searching. (5P)
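The following is a sketch of quickselect, Hoare's selection algorithm, which applies the quicksort idea to searching. It assumes that i is 1-based and that duplicates count separately, as in the example, and `quickselect` is a hypothetical name. Unlike quicksort, it continues only into the one part that can contain the answer. With a random pivot that part shrinks by a constant factor on average, so the work forms a geometric series summing to O(n). The worst case is still O(n^2).

```python
import random
from typing import Sequence


def quickselect(values: Sequence[int], i: int) -> int:
    """Return the i-th smallest element of values (1-based, duplicates counted)."""
    if not 1 <= i <= len(values):
        raise ValueError("i must satisfy 1 <= i <= len(values)")
    items = list(values)
    while True:
        pivot = random.choice(items)  # a random pivot gives O(n) expected time
        smaller = [x for x in items if x < pivot]
        equal = [x for x in items if x == pivot]
        larger = [x for x in items if x > pivot]
        if i <= len(smaller):
            items = smaller  # the answer lies among the smaller values
        elif i <= len(smaller) + len(equal):
            return pivot  # the answer equals the pivot
        else:
            i -= len(smaller) + len(equal)  # skip everything not larger than the pivot
            items = larger


print(quickselect([5, 3, 1, 7, 2, 3, 1], 3))  # 2
```

This version copies elements into new lists for clarity. An in-place partition, as in quicksort, would avoid the extra memory.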
b) Suggest an algorithm with O(nk) running time to sort a list of n strings that may have different lengths, where k is the length of the longest string. For example, ["b", "cda", "db", "da", "ab", "a"] must be sorted as ["a", "ab", "b", "cda", "da", "db"]. (5P)
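One possible approach, offered as an assumption and not the only valid answer, is least-significant-digit radix sort. Position p of each string is treated as a digit, and a missing character (when the string is shorter than p + 1) is treated as a symbol smaller than every letter. Each of the k passes is a stable bucket distribution that costs O(n + σ) for an alphabet of σ symbols. With σ treated as a constant, the total is O(nk). `radix_sort_strings` is a hypothetical name.

```python
def radix_sort_strings(words: list[str]) -> list[str]:
    """Sort strings lexicographically with LSD radix sort (shorter prefix first)."""
    if not words:
        return []
    k = max(len(w) for w in words)
    # Rank each character once; rank 0 is reserved for "no character at this position".
    alphabet = sorted({ch for w in words for ch in w})
    rank = {ch: r + 1 for r, ch in enumerate(alphabet)}
    result = list(words)
    # Process positions from the last one (k - 1) down to the first one (0).
    for pos in range(k - 1, -1, -1):
        buckets = [[] for _ in range(len(alphabet) + 1)]
        for w in result:
            key = rank[w[pos]] if pos < len(w) else 0
            buckets[key].append(w)  # appending in order keeps each pass stable
        result = [w for bucket in buckets for w in bucket]
    return result


print(radix_sort_strings(["b", "cda", "db", "da", "ab", "a"]))
# ['a', 'ab', 'b', 'cda', 'da', 'db']
```

Stability matters here. After the pass on position p, strings that agree at position p keep the order established by the later positions, which yields lexicographic order once position 0 is processed.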