Homework Assignment #1
Due in-class on Tuesday, 10/11
Part 1
The goals of the first part of this assignment are to (i) ensure that
you understand the key concepts of DNA, RNA and proteins discussed in
the first lectures, and (ii) gain an appreciation of the wealth of
molecular biology data available for computational analyses. In this
part of the assignment, you'll investigate a human gene called
phenylalanine hydroxylase that is also referred to by the
symbol PAH.
Start by going to the GenBank
entry for this gene. The genomic subsequence that includes the gene is
at the bottom of this page. The features table indicates the
coordinates (relative to the sequence listed) of various features,
such as exons. The CDS entry lists the coordinates of the
protein-coding part of the gene.
- Which chromosome is PAH on?
- What disorder is the gene involved in?
- How many exons does PAH have?
- How many bases are in the untranslated part of the first exon?
- What are the first three codons and the first three amino acids encoded in the first exon?
- Which stop codon is used?
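To make the coordinate arithmetic behind the last three questions concrete, here is a minimal Python sketch. It uses a made-up toy sequence (not the PAH record), and the names `split_codons`, `translate`, and `toy_sequence` are hypothetical. It assumes 1-based, inclusive coordinates as in a GenBank features table, an exon that begins at position 1 of the listed sequence, and a coding region that lies within a single exon; a real CDS entry usually joins several exons.

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G base order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def split_codons(sequence, cds_start, cds_end):
    """Cut the coding region (1-based, inclusive coordinates) into codons."""
    cds = sequence[cds_start - 1:cds_end].upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    return [cds[i:i + 3] for i in range(0, usable, 3)]


def translate(codons):
    """Map each codon to its one-letter amino acid ('*' marks a stop)."""
    return "".join(CODON_TABLE[codon] for codon in codons)


# Toy data (an assumption for illustration): 6 untranslated bases, then a CDS.
toy_sequence = "GGCACCATGGCTTGGTAA"
cds_start, cds_end = 7, 18

# Assumes the exon starts at position 1 of the listed sequence.
utr_length = cds_start - 1
codons = split_codons(toy_sequence, cds_start, cds_end)
protein = translate(codons)
print(utr_length, codons, protein)
```

For the actual assignment, read the exon and CDS coordinates from the features table and apply the same reasoning by hand.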
Now go to the UniProt entry for the phenylalanine hydroxylase protein.
- How many amino acids are in the protein?
- What are the first three amino acids listed? (Do they match your answer above?)
- What are the coordinates of the first two helices in the protein's secondary structure?
Now go to the Protein Data
Bank "Molecule of the Month" description of
phenylalanine hydroxylase.
- How many molecules of phenylalanine hydroxylase interact to form a complex?
- Briefly describe what the function of the protein is, and how a malfunction in this protein leads to a disorder.
Now go to the Protein
Data Bank entry for the tetrameric protein structure. Try using
Jmol or one of the other viewers to visualize the protein structure.
- What does tetrameric mean?
The next stop is the HomoloGene entry
listing a set of genes believed to be homologous to PAH. Follow the
link to the pairwise alignment scores.
- What is the percentage of protein sequence identity for the mouse (M. musculus) homolog?
- What is the percentage of protein sequence identity for the fruit fly (D. melanogaster) homolog?
Select one other database (aside from those you've already consulted in this assignment) listed in the Nucleic Acids Research Database List that contains information about the PAH gene or protein.
- Briefly describe one other fact about PAH, and state which database you obtained this information from.
Finally, if you want to know a bit more about the disorder caused by
mutations in PAH, and the role of former UW professor Harry Waisman in
developing screening for the condition, you should read this article entitled Eating Cheese without Fear.
Part 2
The goal of the second part of this assignment is to review some of
the concepts we discussed in the lectures about sequence assembly.
- Given the following 3-mers, construct an overlap graph and show a
Hamiltonian path (of 7 edges) that visits each vertex exactly once.
Write the superstring corresponding to this Hamiltonian path.
{AGT, AAA, ACT, AAC, CTT, GTA, TTT, TAA}
- Does this path visit every edge of the graph?
- Given the following spectrum, show how to use the Eulerian path approach to find all possible sequences s such that Spectrum(s,3) = S.
S = {ATG, GGG, GGT, GTA, GTG, TAT, TGG}
- For the graph below, show how the Eulerian cycle algorithm we
discussed would find a cycle. Be sure to show the subcycles found and
how they are joined together at each step.
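As a reminder of how the Eulerian approach works, here is a minimal Python sketch on a toy spectrum (deliberately not the one in the question above). Each 3-mer becomes an edge from its 2-base prefix to its 2-base suffix, and a stack-based Hierholzer-style traversal finds an Eulerian path. The stack version joins sub-cycles implicitly as it backtracks, so it may not match step by step the version presented in lecture. The function names are hypothetical, and the sketch returns only one path rather than enumerating all of them.

```python
from collections import defaultdict


def de_bruijn_edges(spectrum):
    """Build a graph with one edge (prefix -> suffix) per k-mer."""
    graph = defaultdict(list)
    for kmer in sorted(spectrum):
        graph[kmer[:-1]].append(kmer[1:])
    return graph


def eulerian_path(graph):
    """Return one Eulerian path as a list of nodes (Hierholzer-style)."""
    in_deg = defaultdict(int)
    for successors in graph.values():
        for v in successors:
            in_deg[v] += 1
    nodes = set(graph) | set(in_deg)
    # Start at the node with one more outgoing than incoming edge, if any;
    # otherwise the graph is balanced and any node can start a cycle.
    start = next(
        (n for n in sorted(nodes) if len(graph.get(n, [])) - in_deg[n] == 1),
        min(nodes),
    )
    remaining = {u: list(vs) for u, vs in graph.items()}
    edge_count = sum(len(vs) for vs in remaining.values())
    stack, path = [start], []
    while stack:
        u = stack[-1]
        if remaining.get(u):
            stack.append(remaining[u].pop())  # follow an unused edge
        else:
            path.append(stack.pop())  # dead end: node is finalized
    path.reverse()
    if len(path) != edge_count + 1:
        raise ValueError("graph has no Eulerian path")
    return path


def spell(path):
    """Reconstruct the sequence spelled by a path of overlapping nodes."""
    return path[0] + "".join(node[-1] for node in path[1:])


# Toy spectrum (an assumption for illustration only).
toy_spectrum = {"ATG", "TGC", "GCG", "CGT", "GTG"}
toy_path = eulerian_path(de_bruijn_edges(toy_spectrum))
toy_sequence = spell(toy_path)
print(toy_path, toy_sequence)
```

When a graph admits several Eulerian paths, each one spells a different sequence with the same spectrum, which is why the spectrum question asks for all of them.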