In this exercise we consider a dataset of amino acid pairs (x, y). We think of the data as outcomes of a random variable (X, Y), where X and Y represent the amino acid at a given position in two evolutionarily related proteins (the same protein from two different species, say). Such a dataset may be obtained from a (multiple) alignment of (fractions of) proteins. The only mutational event considered is the substitution of one amino acid for another. In this exercise you may treat the different positions as independent, but X and Y at the same position are dependent.
- Load the aa dataset into R with data(aa) and use table to cross-tabulate the data according to the values of x and y. Compute the matrix of relative frequencies for the occurrence of all pairs (x, y) for (x, y) ∈ E_0 × E_0, where E_0 denotes the amino acid alphabet.
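
The cross-tabulation and normalization steps can be sketched as follows. Since the package providing aa is not named here, the sketch uses a simulated stand-in, aa_toy, which is purely hypothetical; it assumes the real dataset exposes the two sequences as components x and y, as the wording of the exercise suggests. With the real data, aa would take the place of aa_toy after calling data(aa).

```r
# The 20 standard amino acids in one-letter code (the alphabet E_0).
amino_acids <- c("A", "R", "N", "D", "C", "Q", "E", "G", "H", "I",
                 "L", "K", "M", "F", "P", "S", "T", "W", "Y", "V")

# Hypothetical stand-in for the aa dataset: simulated pairs in which
# y copies x with probability 0.7, so that x and y are dependent.
set.seed(1)
n <- 1000
x <- sample(amino_acids, n, replace = TRUE)
keep <- runif(n) < 0.7
y <- ifelse(keep, x, sample(amino_acids, n, replace = TRUE))

# Fixing the factor levels guarantees a full 20 x 20 table,
# even if some pairs never occur in the data.
aa_toy <- data.frame(
  x = factor(x, levels = amino_acids),
  y = factor(y, levels = amino_acids)
)

# Cross-tabulate the pairs: counts[i, j] is the number of (x, y) pairs
# with x equal to amino acid i and y equal to amino acid j.
counts <- table(aa_toy$x, aa_toy$y, dnn = c("x", "y"))

# Relative frequencies: divide by the total number of pairs.
# prop.table(counts) gives the same result.
rel_freq <- counts / sum(counts)
```

Setting the factor levels explicitly is a design choice rather than a requirement: without it, table only produces rows and columns for the letters that actually appear, and the resulting matrix may not be indexed by the full alphabet.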