Problem
Maximum parsimony is not consistent on trees with certain characteristics. The following simple tree for four species (A, B, C, and D) can be used as an example to illustrate the problem. We will assume that this tree represents the true relationship between the four species. The branch lengths represent the probability that the characters at the two ends of the branch are different. We will assume for simplicity that there are only two possible characters, 0 and 1.
I. What are the possible unrooted trees that maximum parsimony could return? Sketch them below and name them T1, T2 and T3.
II. Label each tree with the character pattern that supports that particular tree. Recall that the character pattern xxyy corresponds to species A and B having one character, while C and D have another, that is, either A=0, B=0, C=1, D=1 or A=1, B=1, C=0, D=0.
III. Next, write out the probability of observing each of the three possible informative character patterns for the provided tree in terms of p and q.
IV. Under the assumption that p and q are both very small, we can approximate these three probabilities as follows: p, 2pq and p^2 + q^2. Write out which of these corresponds to each of your three character patterns, and explain your reasoning.
V. We have assumed that p and q are both very small. Now, in addition, assume that p is extremely small compared to q^2, that is, q^2 >> p. Looking at the three expressions given in part IV, explain which of the three probabilities is largest and thus which of the three character patterns is most likely to occur.
VI. Explain in words how this shows that, at least under certain circumstances, maximum parsimony is not consistent, and explain what those circumstances are. (In your answer, be sure to explain the biological significance of the condition that q^2 >> p.)
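
To check answers to parts III through V numerically, the following Python sketch may help. It computes the exact probability of each informative pattern by summing over the states of the two internal nodes. Since the tree figure is not reproduced in this text, the branch layout used here is an assumption: A is grouped with B and C with D, the branches leading to A and C have change probability q, and the branches leading to B and D, as well as the internal branch, have change probability p. The function name pattern_probabilities and the helper edge are hypothetical names chosen for illustration. If your tree differs, adjust the branch assignments accordingly.

```python
from itertools import product


def edge(a, b, change):
    """Probability of seeing states a and b at the two ends of a branch
    whose probability of a difference is `change`."""
    return change if a != b else 1.0 - change


def pattern_probabilities(p, q):
    """Exact probabilities of the three informative patterns.

    ASSUMED layout (the figure is not in the text): internal node u joins
    A and B, internal node v joins C and D. Branches to A and C have change
    probability q; branches to B, D and the internal branch u-v have p.
    Characters are binary, and the state at u is 0 or 1 with probability 1/2.
    """
    probs = {"xxyy": 0.0, "xyxy": 0.0, "xyyx": 0.0}
    for a, b, c, d in product((0, 1), repeat=4):
        # Sum over the unobserved states of the two internal nodes.
        total = 0.0
        for u, v in product((0, 1), repeat=2):
            total += (0.5 * edge(u, v, p)
                      * edge(u, a, q) * edge(u, b, p)
                      * edge(v, c, q) * edge(v, d, p))
        if a == b and c == d and a != c:
            probs["xxyy"] += total   # A,B share one state; C,D the other
        elif a == c and b == d and a != b:
            probs["xyxy"] += total   # A,C share one state; B,D the other
        elif a == d and b == c and a != b:
            probs["xyyx"] += total   # A,D share one state; B,C the other
    return probs


if __name__ == "__main__":
    # Illustrative values only: p and q small, with q^2 much larger than p.
    for name, value in pattern_probabilities(p=1e-4, q=0.1).items():
        print(name, value)
```

Comparing the printed values for different choices of p and q (for example, p equal to q, and then q^2 much larger than p) shows which pattern dominates in each regime, which is the comparison that parts V and VI ask you to explain.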