All of these studies rely on comparisons of nucleotide or amino acid sequences. In this tutorial, you will be introduced to some of the fundamental principles of molecular evolution and the types of bioinformatics tools that are used in evolutionary studies. We will begin by carrying out a manual sequence comparison to introduce the basic concepts. The remainder of the project will be carried out at The Biology Workbench, a set of bioinformatics analysis programs managed by the San Diego Supercomputer Center at the University of California, San Diego.

Objectives

• To introduce the principles of molecular evolution
• To acquaint you with the tools that are available to compare nucleotide and amino acid sequences
• To learn about the use of protein sequences in reconstructions of evolutionary history

Project

Branching evolution occurs when one ancestral species gives rise to two or more progeny species. However, speciation events don’t involve the vast majority of the genes in a genome. That is, for most genes, both of the progeny species inherit identical genes from the ancestor. Following speciation, these genes evolve independently in the separate lineages. Studies of molecular evolution therefore rely heavily on comparisons of related sequences from different organisms.

Shown below is an alignment of two homologous sequences that we will use as a starting point. Homologous sequences are sequences that have descended from a common ancestral sequence. You can’t meaningfully compare sequences unless they are homologous. This alignment uses the single letter amino acid code, in which G represents glycine, Q represents glutamine, etc. The aligned proteins have been shown to be involved in the metabolism of similar but distinct toxic compounds. As you can see, these amino acid sequences are very similar, and it is easy to recognize that they are related by common descent.
    dntAc: KMGVDDEVIVSRQNDGSVR
    nahAc: KMGIDDEVIVSRQSDGSIR

An expanded version of this alignment is shown below. In this expanded alignment, both the amino acids and the corresponding DNA nucleotides are shown. For ease of analysis, the codons have been broken into separate entries in a table.

Alignment of nahAc and dntAc sequences.

            K   M   G   V   D   E   V   I   V
    dntAc  AAA ATG GGC GTC GAT GAA GTC ATC GTC
    nahAc  AAA ATG GGT ATT GAC GAG GTC ATC GTC
            K   M   G   I   D   E   V   I   V

            S   R   Q   N   D   G   S   V   R
    dntAc  TCC CGC CAG AAC GAT GGC TCG GTG CGA
    nahAc  TCT CGG CAG AGC GAC GGT TCG ATT CGT
            S   R   Q   S   D   G   S   I   R

This region was chosen at random to represent the changes that take place in nucleotide sequences over time. Answer the questions below by manually comparing these sequences. (This section is for your own understanding; you do not need to turn it in.)

1. Assuming that the dntAc sequence represents the ancestral sequence, how many nucleotide changes (mutations) have occurred in this region to create the nahAc nucleotide sequence? Remember that in actuality neither sequence represents the ancestral sequence.
2. Of these nucleotide changes, how many changed the amino acid encoded by that codon (i.e., how many were nonsynonymous changes)?
3. How many nucleotide changes were in the first codon position? How many of these altered the encoded amino acid?
4. How many nucleotide changes were in the second codon position? How many of these altered the encoded amino acid?
5. How many nucleotide changes were in the third codon position? How many of these altered the encoded amino acid?
6. Compare the % identity of these two sequences at the nucleotide vs. protein level. Percent identity is equal to (# of positions in common / total # positions) * 100.
   Nucleotide % identity:
   Amino acid % identity:
7. Why is there a difference between amino acid % identity and nucleotide % identity?
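Once you have worked through the questions by hand, the same bookkeeping can be sketched in a few lines of Python. This is only an illustrative sketch, not part of The Biology Workbench: the function names `percent_identity` and `differences_by_codon_position` are hypothetical, and the two toy sequences are invented for the example rather than taken from the dntAc/nahAc alignment. It assumes the two sequences are already aligned, have equal length, and contain no gaps.

```python
def percent_identity(seq_a, seq_b):
    """Percent identity = (# of positions in common / total # positions) * 100."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


def differences_by_codon_position(dna_a, dna_b):
    """Count nucleotide differences at codon positions 1, 2 and 3.

    Assumes both sequences start at the first base of a codon.
    """
    if len(dna_a) != len(dna_b) or len(dna_a) % 3 != 0:
        raise ValueError("expected equal-length sequences made of whole codons")
    counts = {1: 0, 2: 0, 3: 0}
    for index, (a, b) in enumerate(zip(dna_a, dna_b)):
        if a != b:
            counts[index % 3 + 1] += 1  # index 0 -> position 1, and so on
    return counts


# Invented toy sequences (three codons each), not the tutorial data.
toy_a = "ATGGCCAAA"
toy_b = "ATGGCTAGA"

print(percent_identity(toy_a, toy_b))
print(differences_by_codon_position(toy_a, toy_b))  # {1: 0, 2: 1, 3: 1}
```

The same `percent_identity` function works on amino acid strings, so you could pass it the single-letter protein sequences to compare the two levels of identity asked about in question 6.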
If needed, a table of the single letter amino acid code can be found at: http://umber.sbs.man.ac.uk/dbbrowser/bioactivity/aacodefrm.html

If needed, a codon table can be found at: http://www.pangloss.com/seidel/Protocols/codon.html

The manual analysis that you just carried out introduced you to some of the ways that molecules evolve. The purpose of that manual analysis was to get you thinking about the mechanisms by which genes and proteins change over time, and the types of forces that control those changes. For example, when we do analyses of this type we almost always see many more changes in the 3rd position of codons than in the first position. Why is this? Do you think that these nucleotides mutate at a higher rate than nucleotides in the first position? What else might be responsible for this phenomenon?
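One way to explore these questions is to ask how the genetic code itself treats a change at each codon position. The sketch below is an illustration only, under stated assumptions: it uses the standard genetic code (NCBI translation table 1), considers every possible single-nucleotide change to each of the 61 amino-acid-encoding codons, and counts how many leave the encoded amino acid unchanged. The names `CODON_TABLE` and `synonymous_changes_by_position` are hypothetical. It says nothing about mutation rates or selection, only about the structure of the code.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, listed with the first base varying slowest
# and the third base varying fastest, each in T, C, A, G order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def synonymous_changes_by_position():
    """Count synonymous single-base changes at codon positions 1, 2 and 3.

    Returns (synonymous, total), two dicts keyed by codon position.
    Stop codons are skipped as starting points; a change that creates
    a stop codon counts as a change in the encoded product.
    """
    synonymous = {1: 0, 2: 0, 3: 0}
    total = {1: 0, 2: 0, 3: 0}
    for codon, amino_acid in CODON_TABLE.items():
        if amino_acid == "*":
            continue
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                mutant = codon[:pos] + base + codon[pos + 1:]
                total[pos + 1] += 1
                if CODON_TABLE[mutant] == amino_acid:
                    synonymous[pos + 1] += 1
    return synonymous, total


synonymous, total = synonymous_changes_by_position()
for position in (1, 2, 3):
    print(f"position {position}: {synonymous[position]} of "
          f"{total[position]} possible changes are synonymous")
```

Comparing the three counts with your answers to the manual exercise may suggest why changes at some codon positions are observed more often than at others.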