Homework Assignment #3
Due in class 12/6
Part 1: Hidden Markov Models
- What is the distribution of sequence lengths, P(l),
generated by the Markov chain below?
- For the HMM shown below, determine the probability,
P(x), of the sequence x = CGTCAG using the Forward
algorithm. Show your work by providing all values in the dynamic
programming matrix.

- Using the same HMM and sequence, determine the probability
P(x) using the Backward algorithm. Show your work by
providing all values in the dynamic programming matrix.
- Using the same HMM and sequence, determine the most likely path through the model using
the Viterbi algorithm. Show your work by
providing all values in the dynamic programming matrix.
- Using the values you calculated with the Forward and Backward
algorithms, determine the posterior probabilities P(π_i = 1 | x)
for i = 1...6 (i.e., the probability that the i-th character of
the sequence was emitted by state 1).
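Hand computations for the Forward, Backward, Viterbi, and posterior questions can be checked with a short script. The Python sketch below assumes a two-state HMM with a begin state and no end state. STATES, START, TRANS, and EMIT are hypothetical placeholders, not the parameters from the assignment's figure, and should be replaced with them. If the figure's HMM has an explicit end state, P(x) picks up the end transitions after the last column and the Backward initialization becomes b_L(k) = a_k0. The posterior uses P(π_i = k | x) = f_k(i) b_k(i) / P(x).

```python
from math import isclose

# Hypothetical placeholder parameters (NOT the assignment's HMM): replace
# START, TRANS, and EMIT with the values from the figure.
STATES = [1, 2]
START = {1: 0.5, 2: 0.5}                              # begin -> state k
TRANS = {1: {1: 0.8, 2: 0.2}, 2: {1: 0.3, 2: 0.7}}    # a_kl
EMIT = {1: {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},  # e_k(b)
        2: {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1}}


def forward(x):
    """f[i][k] = P(x_1..x_i, pi_i = k); returns (matrix, P(x))."""
    f = [{k: START[k] * EMIT[k][x[0]] for k in STATES}]
    for ch in x[1:]:
        prev = f[-1]
        f.append({l: EMIT[l][ch] * sum(prev[k] * TRANS[k][l] for k in STATES)
                  for l in STATES})
    return f, sum(f[-1].values())  # no end state: P(x) = sum_k f_L(k)


def backward(x):
    """b[i][k] = P(x_{i+1}..x_L | pi_i = k); returns (matrix, P(x))."""
    b = [{k: 1.0 for k in STATES}]  # no end state: b_L(k) = 1
    for ch in reversed(x[1:]):
        nxt = b[0]
        b.insert(0, {k: sum(TRANS[k][l] * EMIT[l][ch] * nxt[l] for l in STATES)
                     for k in STATES})
    return b, sum(START[l] * EMIT[l][x[0]] * b[0][l] for l in STATES)


def viterbi(x):
    """Returns (matrix v, most likely state path, its joint probability)."""
    v = [{k: START[k] * EMIT[k][x[0]] for k in STATES}]
    pointers = []
    for ch in x[1:]:
        prev, row, back = v[-1], {}, {}
        for l in STATES:
            best = max(STATES, key=lambda k: prev[k] * TRANS[k][l])
            row[l] = EMIT[l][ch] * prev[best] * TRANS[best][l]
            back[l] = best  # traceback pointer for state l at this column
        v.append(row)
        pointers.append(back)
    path = [max(STATES, key=lambda k: v[-1][k])]
    for back in reversed(pointers):
        path.insert(0, back[path[0]])
    return v, path, v[-1][path[-1]]


def posterior(x, state):
    """P(pi_i = state | x) = f_i(state) * b_i(state) / P(x) for each i."""
    f, px = forward(x)
    b, _ = backward(x)
    return [f[i][state] * b[i][state] / px for i in range(len(x))]


if __name__ == "__main__":
    x = "CGTCAG"
    f, p_fwd = forward(x)
    b, p_bwd = backward(x)
    assert isclose(p_fwd, p_bwd)  # both algorithms must give the same P(x)
    v, path, p_best = viterbi(x)
    print("P(x) =", p_fwd)
    print("Viterbi path:", path, "with probability", p_best)
    print("P(pi_i = 1 | x):", [round(p, 4) for p in posterior(x, 1)])
```

Each column of f, b, and v is a dict keyed by state, so printing the lists column by column reproduces the dynamic programming matrices the questions ask for. Getting the same P(x) from Forward and Backward is a useful consistency check on the hand calculations.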
Part 2: Distance-Based Phylogeny
- Given the following distance data for five species, show how
UPGMA would produce a phylogenetic tree for these species. Show the
partial tree at each step of the algorithm and indicate the distances
represented by edges in the final tree.

- Given the following distance data for five species, show how
neighbor joining would produce a phylogenetic tree for these species. Show the
partial tree at each step of the algorithm and indicate the distances
represented by edges in the final tree.

- Is the data set from question 7 (the neighbor-joining question) ultrametric? Is it additive? Justify your answers.
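For the ultrametric and additive question, the standard tests are the three-point condition (a matrix is ultrametric if, in every triple of species, the two largest distances are equal) and the four-point condition (a matrix is additive if, for every four species i, j, k, l, the two largest of d(i,j)+d(k,l), d(i,k)+d(j,l), d(i,l)+d(j,k) are equal). The Python sketch below checks both; the matrix D is a hypothetical example built from a made-up five-leaf tree, not the assignment's data.

```python
from itertools import combinations, combinations_with_replacement


def make_matrix(pairs):
    """Symmetric dict-of-dicts distance matrix from {(a, b): d}; no diagonal."""
    D = {}
    for (a, b), d in pairs.items():
        D.setdefault(a, {})[b] = d
        D.setdefault(b, {})[a] = d
    return D


def dist(D, a, b):
    """Distance lookup with d(a, a) = 0, since D stores no diagonal."""
    return 0.0 if a == b else D[a][b]


def is_ultrametric(D, tol=1e-9):
    """Three-point condition: in every triple, the two largest distances are equal."""
    for i, j, k in combinations(sorted(D), 3):
        _, mid, top = sorted((D[i][j], D[i][k], D[j][k]))
        if top - mid > tol:
            return False
    return True


def is_additive(D, tol=1e-9):
    """Four-point condition on every quartet. Allowing repeated points
    (combinations_with_replacement) also enforces the triangle inequality."""
    for i, j, k, l in combinations_with_replacement(sorted(D), 4):
        _, mid, top = sorted((dist(D, i, j) + dist(D, k, l),
                              dist(D, i, k) + dist(D, j, l),
                              dist(D, i, l) + dist(D, j, k)))
        if top - mid > tol:
            return False
    return True


# Hypothetical example data (NOT the assignment's matrix), read off a made-up tree.
D = make_matrix({
    ("A", "B"): 5, ("A", "C"): 7, ("A", "D"): 7, ("A", "E"): 8,
    ("B", "C"): 8, ("B", "D"): 8, ("B", "E"): 9,
    ("C", "D"): 8, ("C", "E"): 9, ("D", "E"): 3,
})

if __name__ == "__main__":
    print("ultrametric:", is_ultrametric(D))  # False for this example
    print("additive:", is_additive(D))        # True for this example
```

Every ultrametric matrix is also additive, so a matrix can fail the first test and still pass the second, as this example does (the triple A, B, C has distances 5, 7, 8). A written justification should name the specific triple or quartet that violates a condition, or argue that none does.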