The document “Sequence” contains a sequence labeled “Mystery_sequence_1”. Identify this sequence using BLAST.
1. How long was your query? How many sequences did BLAST search in the database?
2. What is the accession number of your best hit?
3. From what organism did this DNA sequence come?
5. Does this sequence contain introns? Explain your answer.
5. Briefly describe how you found the answers to questions 1-3.
6. What is the name of the researcher who submitted this sequence to GenBank? In what year was this sequence first submitted to GenBank?
7. Find one journal article directly associated with this sequence and list the name of the journal, the year of publication, and the lead author’s name.
Now take only the first two lines of the same mystery sequence and use them as a BLAST query.
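If you prefer to trim the query programmatically rather than by copy and paste, the following is a minimal Python sketch. It assumes the sequence is stored in FASTA format (a header line starting with ">" followed by sequence lines). The function name first_sequence_lines and the placeholder record are illustrative only and are not taken from the "Sequence" document.

```python
def first_sequence_lines(fasta_text, n=2):
    """Return the first n sequence lines of a FASTA record, joined together.

    Header lines (starting with '>') and blank lines are skipped.
    """
    sequence_lines = []
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line or line.startswith(">"):
            continue  # ignore headers and empty lines
        sequence_lines.append(line)
    return "".join(sequence_lines[:n])


# Placeholder record for illustration; substitute the real mystery sequence.
example = ">Example_placeholder\nACGTACGTAA\nGGCCTTGGCC\nTTTTAAAACC\n"
short_query = first_sequence_lines(example, n=2)
print(short_query)
```

The resulting string can be pasted into the BLAST query box in place of the full-length sequence.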
8. What is the accession number of the best hit? Is the identity (not percent identity) of the best hit different from when you used the complete nucleotide sequence?
9. What is the E-value of your best hit? How does it compare to the E-value you received from blasting the full-length sequence?
10. Based on the two BLAST searches you’ve just completed, what is the relationship between query length and your confidence in the outcome of the search? Explain your answer.
The document “Sequence” contains a sequence labeled “Mystery_sequence_2”. Identify this sequence using BLAST.
11. What is the accession number of your best hit? How long is your best hit?
12. Specifically, from what organism and genome did this sequence come? From where (geographically) did the organism come?
13. List the names (not the sequences) of all of the protein-coding genes present in the published sequence you identified in question 11. You can use the alphanumerical abbreviations (example: rpb2) instead of writing out the full names if you prefer.
14. Describe how you found the answer to question 13.
15. One of the protein-coding regions you listed in question 13 encodes a protein product 459 amino acids in length. What is the complete name of this 459-amino-acid protein?
16. Briefly describe how you found the answer to question 15.
Create a text file you can use for multiple alignment using
a) the 459 amino acid protein you identified in question 15,
b) the sequence of the homologous protein in the most closely related living species to the organism in question 12,
c) the sequence of the homologous protein in chimpanzee, dog, cow, rat, zebrafish, African clawed frog and fruit fly. You should have 9 sequences total.
17. What are the scientific names of the species you found for your alignment above? Describe how you found them.
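The text file for the multiple alignment can be assembled by hand, but a short script keeps the formatting consistent. The sketch below assumes the alignment program accepts multi-FASTA input, which is common but should be checked for the tool you use. The function name format_fasta, the output file name alignment_input.fasta, and the two placeholder records are hypothetical; replace them with the nine sequences you actually retrieved.

```python
def format_fasta(records, width=60):
    """Format (header, sequence) pairs as multi-FASTA text.

    Whitespace inside each sequence is removed and the sequence is
    wrapped to the given line width.
    """
    blocks = []
    for header, sequence in records:
        cleaned = "".join(sequence.split()).upper()
        wrapped = [cleaned[i:i + width] for i in range(0, len(cleaned), width)]
        blocks.append(">" + header + "\n" + "\n".join(wrapped))
    return "\n".join(blocks) + "\n"


# Placeholder records for illustration only; these are not real proteins.
records = [
    ("placeholder_species_A", "MKTAY IAKQR"),
    ("placeholder_species_B", "mktaylakqr"),
]
fasta_text = format_fasta(records, width=5)

if __name__ == "__main__":
    # Write the combined records to one plain-text file for the aligner.
    with open("alignment_input.fasta", "w") as handle:
        handle.write(fasta_text)
```

Using the species name as each header makes the finished alignment easier to read.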