Homework Assignment #2
Due Tuesday, 10/19, 11:59pm
Substitution matrix
BLOSUM-62 substitution matrix file. You can assume that any substitution matrix given to your program will be in the same format as this file.
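As a rough illustration of reading such a file, the sketch below assumes the common whitespace-delimited layout: optional comment lines starting with `#`, one header row of column symbols, then one row per symbol with integer scores. That layout is an assumption, not something stated here, so check it against the actual file. The function name `parse_substitution_matrix` is hypothetical.

```python
def parse_substitution_matrix(text):
    """Parse a whitespace-delimited substitution matrix.

    Assumed layout (verify against the real file): '#' comment lines,
    a header row of column symbols, then rows of the form
    '<symbol> <score> <score> ...'.
    Returns a dict mapping (row_symbol, col_symbol) -> int score.
    """
    columns = None
    scores = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if columns is None:
            columns = fields  # first non-comment line is the header row
            continue
        row_symbol, values = fields[0], fields[1:]
        if len(values) != len(columns):
            raise ValueError(f"row {row_symbol!r} has {len(values)} scores, "
                             f"expected {len(columns)}")
        for col_symbol, value in zip(columns, values):
            scores[(row_symbol, col_symbol)] = int(value)
    return scores


if __name__ == "__main__":
    toy = "# toy matrix\n   A  R\nA  4 -1\nR -1  5\n"
    print(parse_substitution_matrix(toy)[("A", "R")])
```

Keying the dictionary by symbol pairs keeps lookups independent of the row and column order used in the file.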
Sample input/output
You may wish to test your program with the following inputs/outputs. The output is the correct alignment given the BLOSUM-62 substitution matrix, g = -10 and s = -1.
- simple example: input, output
- calcium ion proteins: input, output
- yeast kinase proteins: input, output
Additionally, we will test your programs on several held-aside sequence pairs.
Sequences for problem #4
In HW #1, you assembled the genomic region of Enterobacter cloacae subsp. cloacae ATCC 13047
that contained the gene lacI. A file containing the protein sequence encoded by this gene
is below:
Your task is to locate this gene in fragments of three other genomes.
You should do this by using your AlignLocal program to locally
align the E. cloacae protein sequence against all six possible
translations of the genomic fragments given below. Use the BLOSUM-62
substitution matrix given above, with g = -10 and s = -1.
For each genome, identify which translation contains the orthologous
protein sequence. In addition, use your alignments to make a guess as
to which species is most closely related to E. cloacae.
- Escherichia coli K-12 MG1655: genomic fragment, translation 1, translation 2, translation 3, translation 4, translation 5, translation 6
- Dickeya dadantii 3937: genomic fragment, translation 1, translation 2, translation 3, translation 4, translation 5, translation 6
- Citrobacter koseri ATCC BAA-895: genomic fragment, translation 1, translation 2, translation 3, translation 4, translation 5, translation 6
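One way to compare the six translations of a genome is to rank them by local alignment score against the query protein. The sketch below is not AlignLocal itself: it reports only the best score, not the alignment, and it assumes that a gap of k spaces scores g + s * k. If g and s are defined differently in this course, adjust the two gap terms. The names `local_affine_score` and `best_translation` and the `sub` callable are hypothetical.

```python
NEG_INF = float("-inf")


def local_affine_score(x, y, sub, g=-10, s=-1):
    """Best local alignment score of x against y with affine gaps.

    Assumption: a gap of k spaces scores g + s * k.
    sub(a, b) returns the substitution score for residues a and b.
    """
    n = len(y)
    h_prev = [0] * (n + 1)        # best score ending at (i-1, j)
    f_prev = [NEG_INF] * (n + 1)  # same, but ending with a gap in y
    best = 0
    for i in range(1, len(x) + 1):
        h_cur = [0] * (n + 1)
        f_cur = [NEG_INF] * (n + 1)
        e = NEG_INF               # best score ending with a gap in x
        for j in range(1, n + 1):
            e = max(h_cur[j - 1] + g + s, e + s)              # open or extend
            f_cur[j] = max(h_prev[j] + g + s, f_prev[j] + s)  # open or extend
            diag = h_prev[j - 1] + sub(x[i - 1], y[j - 1])
            h_cur[j] = max(0, diag, e, f_cur[j])              # 0 restarts locally
            best = max(best, h_cur[j])
        h_prev, f_prev = h_cur, f_cur
    return best


def best_translation(query, translations, sub, g=-10, s=-1):
    """Return (name, scores) where name is the highest-scoring translation."""
    scores = {name: local_affine_score(query, seq, sub, g, s)
              for name, seq in translations.items()}
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    def toy_sub(a, b):
        return 5 if a == b else -4  # toy scores, not BLOSUM-62

    frames = {"translation 1": "GGGGGGG", "translation 2": "AAAAAGGGCCCCC"}
    print(best_translation("AAAAACCCCC", frames, toy_sub))
```

The same scores, compared across the three genomes, give a crude signal of relatedness, since a higher score for the best translation suggests a more similar ortholog.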
Note: So that you can use the BLOSUM-62 matrix given above,
these translations were modified by replacing all stop codons with the
amino acid alanine (A).
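For reference, translations of this kind can be reproduced roughly as follows. This is a sketch under assumptions, not the script used to generate the files: it assumes the standard genetic code, that frames 1 to 3 read the given strand and frames 4 to 6 read its reverse complement, and that codons containing non-ACGT characters become X. All function names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G
# (third base varying fastest); '*' marks stop codons.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an uppercase DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna, stop_as="A"):
    """Translate codon by codon, writing stop codons as `stop_as`.

    Trailing bases that do not fill a codon are ignored; codons with
    unexpected characters become 'X' (an assumption).
    """
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[i:i + 3], "X")
        protein.append(stop_as if aa == "*" else aa)
    return "".join(protein)


def six_frame_translations(dna, stop_as="A"):
    """Return six translations: three forward offsets, then three reverse."""
    dna = dna.upper()
    strands = (dna, reverse_complement(dna))
    return [translate(strand[offset:], stop_as)
            for strand in strands for offset in range(3)]


if __name__ == "__main__":
    for number, protein in enumerate(six_frame_translations("ATGTAAGCT"), 1):
        print(f"translation {number}: {protein}")
```

Because stop codons are written as A, an alignment can run through what is really the end of a reading frame, so alanines in the aligned region of a translation are not all genuine residues.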