Protein BLAST and Motifs - Homework
Purpose: Become familiar with Protein BLAST queries of the NCBI non-redundant database, using the human acyl-CoA binding protein (ACBP) as the query. To examine the range of BLAST hits, we will restrict the output to C. elegans sequences. We will examine the BLAST alignments for several of the top hits and relate them to the amino acid conservation evident in the multiple alignment of the human, bovine, duck and yeast ACBP family members.
Examine this alignment and identify recognizable conserved motifs, which you will use as the basis for deciding whether the C. elegans sequences are homologous to the ACBP protein family. First, evaluate the C. elegans hits by eye, looking for conservation of your chosen motifs, and give your assessment and your reasons. Then take each C. elegans sequence you believe is homologous to the ACBP family and search it against the Conserved Domain database. You can do this either as a protein BLAST search against the non-redundant database, since your query is automatically searched against the Conserved Domain database as well, or by going to the Conserved Domain database and searching directly from the links at that site. You will use the Conserved Domain results in two ways. First, re-evaluate your assessment of the homology of the C. elegans sequences to the ACBP family. Second, determine whether these C. elegans sequences contain any other motif recognized by the Conserved Domain search. You will then use iterative PSI-BLAST (Position-Specific Iterated BLAST), which computes an ACBP-specific scoring matrix from the aligned ACBP sequences, in an attempt to identify additional C. elegans sequences that are distantly related to the ACBP family. Using all of this information, you will determine whether C. elegans has an ortholog of human ACBP, stating your criteria. In addition, you will determine how many of the C. elegans sequences contain a recognizable ACBP domain, and from this propose a hypothesis for the evolution of the C. elegans ACBP family. Finally, you will use PHI-BLAST (Pattern Hit Initiated BLAST), which combines a pattern search (using a pattern you determine from the alignment linked in question 7) with BLAST, to search the non-redundant protein database again (limited to C. elegans sequences) and evaluate how PHI-BLAST can be used to detect distant similarities.
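The idea behind the ACBP-specific scoring matrix can be illustrated with a heavily simplified sketch. This is not NCBI's PSI-BLAST algorithm, which also weights sequences and derives pseudocounts from a substitution matrix; it assumes a flat pseudocount and a uniform background frequency, and the toy alignment is invented. Each alignment column gets a log-odds score per residue, so residues that are conserved in a column score positively there.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(aligned, pseudocount=1.0, background=None):
    """Return one dict of log2-odds scores per alignment column.

    aligned: equal-length strings; gaps ('-') and other non-residue
    symbols are skipped when counting.
    background: residue frequencies; uniform (1/20 each) if omitted.
    """
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must all have the same length")
    if background is None:
        background = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}
    pssm = []
    for col in range(length):
        # Count residues in this column, then add a flat pseudocount to each.
        counts = Counter(seq[col] for seq in aligned if seq[col] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        pssm.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background[aa])
            for aa in AMINO_ACIDS
        })
    return pssm


def score_segment(pssm, segment):
    """Score an ungapped segment that is exactly as long as the PSSM."""
    if len(segment) != len(pssm):
        raise ValueError("segment length must equal the number of PSSM columns")
    return sum(column[aa] for column, aa in zip(pssm, segment.upper()))


# Hypothetical toy alignment, invented for illustration only.
toy_alignment = ["YKQA", "YKQA", "YRQA", "FKQA"]
pssm = build_pssm(toy_alignment)
print(round(score_segment(pssm, "YKQA"), 2), round(score_segment(pssm, "GGGG"), 2))
```

A segment resembling the aligned family scores higher than an unrelated one; PSI-BLAST uses a matrix of this kind in place of a general substitution matrix in its next iteration.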
The first extra credit question tests your ability to compare BLAST results for ACBP family members in humans and Drosophila, to determine the evolutionary history of ACBP in these organisms and whether it differs from that of the C. elegans ACBP family. The second extra credit question introduces phylogenetic analysis.
1. Retrieve the human acyl-CoA binding protein sequence (87 amino acids) from the class sequence link or this link and run a standard Protein BLAST search against the NCBI non-redundant protein database. IMPORTANT: limit the search to Caenorhabditis elegans sequences only. (In the Organism box, start typing "caen" and then choose Caenorhabditis elegans from the list; your search should return only a limited number of C. elegans hits.) Prepare a table giving the accession number, protein length (number of amino acids) and E-value for each of the top 10 BLAST hits. How many ACBP family members are detected? What were your criteria (E-value? Sequence title?)?
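If you export the hits as tab-separated text, ranking them can be scripted. A minimal sketch, assuming the standard 12-column BLAST tabular layout (subject ID in column 2, E-value in column 11); the accessions and numbers below are invented. Because this layout does not include the subject's length, record protein lengths from the hit descriptions separately. The E-value cutoff is a parameter you choose and justify, not a fixed rule.

```python
# Hypothetical hit table: invented IDs and values, in the 12-column BLAST
# tabular order (query, subject, %id, length, mismatches, gap opens,
# q.start, q.end, s.start, s.end, evalue, bit score).
ROWS = [
    ("query", "HYPO_A", "50.0", "80", "40", "0", "1", "80", "5", "84", "1e-20", "80.0"),
    ("query", "HYPO_B", "30.0", "70", "49", "2", "10", "79", "100", "169", "3e-05", "40.0"),
    ("query", "HYPO_A", "35.0", "20", "13", "0", "60", "79", "200", "219", "0.5", "20.0"),
    ("query", "HYPO_C", "25.0", "40", "30", "1", "5", "44", "10", "49", "2.1", "18.0"),
]
SAMPLE_HIT_TABLE = "\n".join("\t".join(row) for row in ROWS)


def top_hits(tabular_text, n=10):
    """Return up to n (subject_id, best_evalue) pairs, lowest E-value first."""
    best = {}
    for line in tabular_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.split("\t")
        subject, evalue = fields[1], float(fields[10])
        # One subject can contribute several HSPs; keep its lowest E-value.
        if subject not in best or evalue < best[subject]:
            best[subject] = evalue
    return sorted(best.items(), key=lambda item: item[1])[:n]


def count_significant(hits, cutoff):
    """Number of hits at or below an E-value cutoff you choose and justify."""
    return sum(1 for _, evalue in hits if evalue <= cutoff)


hits = top_hits(SAMPLE_HIT_TABLE)
print(hits, count_significant(hits, 1e-3))
```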
2. Examine the alignments of the top 10 C. elegans hits from your search above. Using your knowledge of motif conservation in the ACBP gene family (see the ACBP conservation alignment), determine whether any of the alignments reveal homology to the conserved ACBP motifs. (Two motifs, YxxYKQA and KWxAW, where x is any amino acid, are provided in the ACBP alignment.) In your table from question 1, add a column for each motif and indicate whether that motif is found in each of these 10 C. elegans sequences (Y/N/?). How many ACBP-related sequences do you think are in C. elegans?
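The Y/N/? motif columns can be cross-checked mechanically. A minimal sketch: slide each motif along the sequence, treat x as any residue, and report the fewest mismatches. Calling one mismatch "?" is an assumption standing in for your by-eye judgment (for example, a conservative substitution); the toy sequence is invented.

```python
def fewest_mismatches(sequence, motif):
    """Fewest mismatches between motif and any same-length window of sequence.

    Positions written as 'x' in the motif match any residue. Returns None
    if the sequence is shorter than the motif.
    """
    seq = sequence.upper()
    k = len(motif)
    best = None
    for start in range(len(seq) - k + 1):
        window = seq[start:start + k]
        mismatches = sum(1 for m, s in zip(motif, window) if m != "x" and m != s)
        if best is None or mismatches < best:
            best = mismatches
    return best


def motif_call(sequence, motif, max_uncertain=1):
    """'Y' for an exact match, '?' for up to max_uncertain mismatches, else 'N'."""
    n = fewest_mismatches(sequence, motif)
    if n == 0:
        return "Y"
    if n is not None and n <= max_uncertain:
        return "?"
    return "N"


MOTIFS = ("YxxYKQA", "KWxAW")
# Hypothetical toy sequence, invented for illustration.
toy = "MAYSHYRQAGGKWDAWL"
print({motif: motif_call(toy, motif) for motif in MOTIFS})
```

A mismatch count cannot tell a conservative substitution (K to R) from a drastic one, so treat every "?" as a prompt to look at the alignment yourself.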
3. Take each of the C. elegans sequences from question 2 that you determined to be related to ACBP (those labeled Yes in your table) and determine whether the Conserved Domain search detects an ACBP domain in it (use each sequence as the query in a BLAST search against the Conserved Domain database). Add a column to your table (BLAST-CD) and indicate whether BLAST detects a conserved ACBP domain (Y/N). How does this compare with your analysis of the sequence alignments? Do any of these proteins have more than one recognizable functional domain? (The BLAST search against the CD database will show you this.) Add a column to your table (Other domains) and give the domain name(s). Repeat this analysis for the remaining top-10 BLAST hits that you determined in questions 1 and 2 not to be related to ACBP, and record the domains BLAST detects for these sequences in your table (ACBP? Other?). How many ACBP-related sequences do you now think are in C. elegans?
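Once the table has motif, BLAST-CD and Other-domains columns, comparing the by-eye calls with the CD calls is a simple filter. A minimal sketch, assuming a hypothetical rule that a row is ACBP-related by eye if either motif is found; the accessions and domain names are invented placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class HitRow:
    accession: str
    yxxykqa: str   # by-eye motif call: "Y", "N" or "?"
    kwxaw: str     # by-eye motif call: "Y", "N" or "?"
    blast_cd: str  # ACBP domain reported by the CD search: "Y" or "N"
    other_domains: list = field(default_factory=list)

    def by_eye(self):
        """Hypothetical rule: 'Y' if either motif is found, '?' if either is unsure."""
        calls = (self.yxxykqa, self.kwxaw)
        if "Y" in calls:
            return "Y"
        return "?" if "?" in calls else "N"


def disagreements(rows):
    """Accessions where the by-eye call and the BLAST-CD call differ."""
    return [row.accession for row in rows if row.by_eye() != row.blast_cd]


def multi_domain(rows):
    """Accessions with an ACBP domain plus at least one other domain."""
    return [row.accession for row in rows if row.blast_cd == "Y" and row.other_domains]


# Hypothetical rows: invented accessions and domain names.
rows = [
    HitRow("HYPO_A", "Y", "Y", "Y"),
    HitRow("HYPO_B", "?", "N", "Y", ["DOMAIN_X"]),
    HitRow("HYPO_C", "N", "N", "N", ["DOMAIN_Y"]),
]
print(disagreements(rows), multi_domain(rows))
```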
4. Use the human ACBP protein sequence to run a PSI-BLAST search of the non-redundant protein database, limited to C. elegans. PSI-BLAST selects all protein matches with E-values below a chosen inclusion threshold and uses them to build a new ACBP-specific scoring matrix. Use the default settings and run a second iteration of the PSI-BLAST search. Add a new column to your table (PSIE) and record the new E-values for your top 10 C. elegans hits. Is there a new C. elegans sequence with a significant E-value that was not readily detected in the original protein BLAST search? From this analysis, how many C. elegans sequences do you think are related to human ACBP?
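Spotting sequences that become significant only in the second iteration amounts to a set difference. A minimal sketch: the 0.005 default is an assumption (use the inclusion threshold shown on your PSI-BLAST form), "significant" is taken to mean passing that same threshold, and the E-values are invented.

```python
def included(hits, threshold):
    """IDs whose E-value is at or below the inclusion threshold."""
    return {acc for acc, evalue in hits.items() if evalue <= threshold}


def newly_significant(iteration1, iteration2, threshold=0.005):
    """IDs that pass the threshold in iteration 2 but not in iteration 1.

    A sequence missing from iteration 1 counts as not significant there.
    """
    return sorted(included(iteration2, threshold) - included(iteration1, threshold))


# Hypothetical E-values, invented for illustration.
iteration1 = {"HYPO_A": 1e-20, "HYPO_B": 0.5}
iteration2 = {"HYPO_A": 1e-40, "HYPO_B": 1e-4, "HYPO_C": 0.01}
print(newly_significant(iteration1, iteration2))
```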
5. Does C. elegans have an ortholog of ACBP? State your reasons, your criteria, and the ID (accession number) of your candidate. An E-value alone is not sufficient.
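One widely used operational criterion for orthology is the reciprocal best hit: the candidate is the human protein's best C. elegans hit, and a reverse search of human sequences with the candidate returns the human protein as its best hit. It is shown here only as an example criterion, to be combined with the motif and domain evidence from questions 2 to 4. A minimal sketch; the IDs and E-values are invented.

```python
def best_hit(evalues):
    """Subject with the lowest E-value (the first one wins a tie); None if empty."""
    return min(evalues, key=evalues.get) if evalues else None


def is_reciprocal_best_hit(forward, reverse, human_id, worm_id):
    """True if worm_id is human_id's best C. elegans hit and vice versa.

    forward: {C. elegans ID: E-value} from searching with the human protein.
    reverse: {human ID: E-value} from searching with the C. elegans candidate.
    """
    return best_hit(forward) == worm_id and best_hit(reverse) == human_id


# Hypothetical searches, invented for illustration.
forward = {"WORM_1": 1e-20, "WORM_2": 1e-5}
reverse = {"HUMAN_ACBP": 1e-22, "HUMAN_OTHER": 1e-10}
print(is_reciprocal_best_hit(forward, reverse, "HUMAN_ACBP", "WORM_1"))
```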
6. Based only on the data from your analysis, briefly explain the evolutionary history of the ACBP homologs in C. elegans, discussing gene duplication, conservation of structure and function, and structural similarities between the homologs.
7. Determine an amino acid pattern that is conserved in the N-terminal domain of the ACBP orthologs in the linked alignment and is also found in your top choice for the ACBP ortholog in C. elegans. What is your pattern? Run a PHI-BLAST search using human ACBP as the query, and enter your pattern. Limit the search to C. elegans sequences. How many of the C. elegans ACBP-like sequences contain your pattern? Would you consider this pattern specific to ACBP orthologs? Then run a PHI-BLAST search with this pattern using the C. elegans ACBP "ortholog" as the query, limiting your search to human sequences. How many "different" human sequences (i.e., encoded by different genes or chromosomal loci) match this pattern? Discuss, especially in relation to orthology between the human and C. elegans sequences.
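PHI-BLAST takes its pattern in PROSITE syntax: elements separated by hyphens, x for any residue, [..] for allowed residues, {..} for disallowed residues, (n) or (n,m) for repeats, and < or > to anchor the N- or C-terminus. Before running PHI-BLAST, you may want to check which sequences literally contain a candidate pattern. A minimal sketch, assuming the sequences are available as plain strings; it handles only this common subset of the syntax and does not replace PHI-BLAST, which also requires similarity to the query around the pattern match. The example pattern is simply YxxYKQA rewritten, not an answer to this question, and the toy sequences are invented.

```python
import re

ELEMENT = re.compile(r"(x|\[[A-Z]+\]|\{[A-Z]+\}|[A-Z])(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern):
    """Convert a PROSITE-style pattern (common subset) to a compiled regex."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):
            prefix, element = "^", element[1:]  # anchored at the N-terminus
        if element.endswith(">"):
            suffix, element = "$", element[:-1]  # anchored at the C-terminus
        match = ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, low, high = match.groups()
        if core == "x":
            regex = "."
        elif core.startswith("{"):
            regex = "[^" + core[1:-1] + "]"  # {P} means "anything but P"
        else:
            regex = core  # a single residue or an [ABC] class is already valid
        if low is not None:
            regex += "{" + low + ("," + high if high else "") + "}"
        parts.append(prefix + regex + suffix)
    return re.compile("".join(parts))


def sequences_with_pattern(sequences, pattern):
    """Names of the sequences ({name: sequence}) that contain the pattern."""
    regex = prosite_to_regex(pattern)
    return sorted(name for name, seq in sequences.items() if regex.search(seq.upper()))


# Hypothetical toy sequences, invented for illustration.
toy = {"a": "MAYSHYKQAG", "b": "MAYSHYRQAG"}
print(sequences_with_pattern(toy, "Y-x(2)-Y-K-Q-A"))
```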
8. Extra credit: Can you determine a pattern that is found in more than 4 of the C. elegans ACBP-like sequences? What is it? Run a PHI-BLAST search with the pattern and one of these sequences as the query, limiting your search to C. elegans. In how many C. elegans ACBP-like sequences is it found? Run the same search but limit it to human sequences. Do any human ACBP-like sequences contain this pattern? Provide one example.
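Candidate patterns shared by more than 4 sequences can be found by brute force. A minimal sketch, assuming a simple approach that is not part of the assignment: enumerate every length-k window, plus variants with one interior position replaced by x, and keep those present in at least 5 sequences. The toy sequences are invented; with real ACBP-like sequences, short patterns may also match many unrelated proteins, which is what the PHI-BLAST searches test.

```python
from collections import defaultdict


def shared_patterns(sequences, k=5, min_seqs=5):
    """Length-k patterns (at most one interior 'x') present in >= min_seqs sequences.

    sequences: {name: protein sequence}. Returns {pattern: sorted names}.
    """
    found = defaultdict(set)
    for name, seq in sequences.items():
        seq = seq.upper()
        for start in range(len(seq) - k + 1):
            window = seq[start:start + k]
            found[window].add(name)  # exact k-mer
            for i in range(1, k - 1):  # wildcard only at interior positions
                found[window[:i] + "x" + window[i + 1:]].add(name)
    return {p: sorted(names) for p, names in found.items() if len(names) >= min_seqs}


def to_prosite(pattern):
    """Write a pattern such as 'KWxAW' in PROSITE syntax ('K-W-x-A-W')."""
    return "-".join(pattern)


# Hypothetical toy sequences, invented for illustration.
toy = {
    "s1": "AAKWDAWAA",
    "s2": "CCKWEAWCC",
    "s3": "KWDAWG",
    "s4": "GGKWNAW",
    "s5": "TKWDAWT",
}
for pattern, names in shared_patterns(toy).items():
    print(to_prosite(pattern), names)
```

The PROSITE form printed for each pattern can be pasted into the PHI-BLAST pattern field.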