Homework Assignment #3

Due in class 12/6

Part 1: Hidden Markov Models

  1. What is the distribution of sequence lengths, P(l), generated by the Markov chain below?

    [Figure: Markov chain]
  2. For the HMM shown below, determine the probability, P(x), of the sequence x = CGTCAG using the Forward algorithm. Show your work by providing all values in the dynamic programming matrix.
    [Figure: HMM]
  3. Using the same HMM and sequence, determine the probability P(x) using the Backward algorithm. Show your work by providing all values in the dynamic programming matrix.

  4. Using the same HMM and sequence, determine the most likely path through the model using the Viterbi algorithm. Show your work by providing all values in the dynamic programming matrix.

  5. Using the values you calculated via the Forward and Backward algorithms, determine the probabilities P(πi = 1 | x) for i = 1...6 (i.e., the probability that state 1 emitted the character at each position in the sequence).
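
One way to check hand-computed dynamic programming matrices is a short script. The sketch below is only an illustration under stated assumptions: the two-state model, its START, TRANS, and EMIT tables, and all function names are hypothetical and are not the HMM from the figure. It also assumes an initial state distribution and no explicit end state, so the initialization and termination steps would need adjusting for a model with begin and end states.

```python
# Hypothetical two-state HMM over DNA. Every number here is made up for
# illustration and is NOT the model from the assignment figure.
STATES = (1, 2)
START = {1: 0.5, 2: 0.5}                       # assumed initial distribution
TRANS = {1: {1: 0.8, 2: 0.2},                  # assumed transition probabilities
         2: {1: 0.3, 2: 0.7}}
EMIT = {1: {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},   # assumed emissions
        2: {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1}}


def forward(x):
    """Return (f, P(x)) where f[i][k] = P(x_1..x_{i+1}, state k at position i+1)."""
    f = [{k: START[k] * EMIT[k][x[0]] for k in STATES}]
    for c in x[1:]:
        prev = f[-1]
        f.append({k: EMIT[k][c] * sum(prev[j] * TRANS[j][k] for j in STATES)
                  for k in STATES})
    return f, sum(f[-1].values())


def backward(x):
    """Return (b, P(x)) where b[i][k] = P(rest of x after position i+1 | state k there)."""
    b = [{k: 1.0 for k in STATES}]             # last column: no end state assumed
    for c in reversed(x[1:]):                  # fill columns right to left
        nxt = b[0]
        b.insert(0, {k: sum(TRANS[k][j] * EMIT[j][c] * nxt[j] for j in STATES)
                     for k in STATES})
    px = sum(START[k] * EMIT[k][x[0]] * b[0][k] for k in STATES)
    return b, px


def posterior(x, state):
    """Return the list of P(pi_i = state | x) for each position i."""
    f, px = forward(x)
    b, _ = backward(x)
    return [f[i][state] * b[i][state] / px for i in range(len(x))]


def viterbi(x):
    """Return (most likely state path, joint probability of that path and x)."""
    v = [{k: START[k] * EMIT[k][x[0]] for k in STATES}]
    pointers = []
    for c in x[1:]:
        prev = v[-1]
        # best[k] is the predecessor state that maximizes the score into k
        best = {k: max(STATES, key=lambda j: prev[j] * TRANS[j][k])
                for k in STATES}
        pointers.append(best)
        v.append({k: EMIT[k][c] * prev[best[k]] * TRANS[best[k]][k]
                  for k in STATES})
    last = max(STATES, key=lambda k: v[-1][k])
    path = [last]
    for best in reversed(pointers):            # trace back through the pointers
        path.insert(0, best[path[0]])
    return path, v[-1][last]


if __name__ == "__main__":
    seq = "CGTCAG"
    print("P(x) forward :", forward(seq)[1])
    print("P(x) backward:", backward(seq)[1])
    print("P(pi_i = 1 | x):", posterior(seq, 1))
    print("Viterbi path:", viterbi(seq)[0])
```

A useful sanity check is that the Forward and Backward values of P(x) agree, and that the posterior probabilities over all states sum to 1 at every position.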

Part 2: Distance-Based Phylogeny

  1. Given the following distance data for five species, show how UPGMA would produce a phylogenetic tree for these species. Show the partial tree at each step of the algorithm and indicate the distances represented by edges in the final tree.
    [Figure: distance data for five species]
  2. Given the following distance data for five species, show how neighbor joining would produce a phylogenetic tree for these species. Show the partial tree at each step of the algorithm and indicate the distances represented by edges in the final tree.
    [Figure: distance data for five species]
  3. Is the data set from question 2 of this part ultrametric? Is it additive? Justify your answers.
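
The ultrametric and additive properties can be tested mechanically with the three-point and four-point conditions. The sketch below is a minimal illustration, not a substitute for a written justification: the function names and both example matrices are hypothetical and are unrelated to the distance data in the assignment. It assumes a symmetric matrix given as a list of lists.

```python
from itertools import combinations


def is_ultrametric(d, tol=1e-9):
    """Three-point condition: in every triple, the two largest distances are equal."""
    for i, j, k in combinations(range(len(d)), 3):
        dists = sorted((d[i][j], d[i][k], d[j][k]))
        if abs(dists[2] - dists[1]) > tol:
            return False
    return True


def is_additive(d, tol=1e-9):
    """Four-point condition: of the three pairings of any quartet,
    the two largest distance sums are equal."""
    for i, j, k, l in combinations(range(len(d)), 4):
        sums = sorted((d[i][j] + d[k][l],
                       d[i][k] + d[j][l],
                       d[i][l] + d[j][k]))
        if abs(sums[2] - sums[1]) > tol:
            return False
    return True


# Hypothetical example 1: distances from a clock-like tree ((A,B),(C,D)).
clock_like = [[0, 2, 6, 6],
              [2, 0, 6, 6],
              [6, 6, 0, 4],
              [6, 6, 4, 0]]

# Hypothetical example 2: distances from the unrooted tree ((A,B),(C,D)) with
# leaf edges 1, 2, 3, 4 and an internal edge of length 1.
tree_like = [[0, 3, 5, 6],
             [3, 0, 6, 7],
             [5, 6, 0, 7],
             [6, 7, 7, 0]]

if __name__ == "__main__":
    for name, m in (("clock_like", clock_like), ("tree_like", tree_like)):
        print(name, "ultrametric:", is_ultrametric(m), "additive:", is_additive(m))
```

The first matrix satisfies both conditions, while the second is additive without being ultrametric, which illustrates that every ultrametric distance matrix is additive but not the other way around.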