Some Project Ideas

  1. Implement a version of BLAST. Empirically evaluate the value of some aspect of the algorithm (e.g. the two-hit method).
  2. Implement and empirically compare several heuristic methods for multiple sequence alignment.
  3. Investigate a novel application of an optimization method to the task of multiple sequence alignment. Empirically compare the novel method to a standard one.
  4. Implement an interpolated Markov model for gene finding. Compare it to fixed-order Markov chain models.
  5. Implement profile HMMs and experiment with using them to model protein families.
  6. Implement and empirically compare several clustering algorithms on a gene expression data set.
  7. Implement and empirically compare several classification algorithms on a gene expression data set.
  8. Implement and evaluate an algorithm for inferring regulatory networks from gene expression data.
  9. Implement and experiment with an algorithm for protein threading.
  10. Implement and evaluate an algorithm for phylogenetic tree construction.
  11. Implement and experiment with using stochastic context-free grammars to model some class of RNAs.
  12. Extend MUMmer's longest increasing subsequence (LIS) algorithm to multiple sequence alignment.
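For idea 12, it may help to start from the two-genome case. There, MUMmer orders maximal unique matches (MUMs) by taking a longest increasing subsequence, so the chain it keeps appears in the same order in both genomes. The following is only a minimal sketch of that step, not MUMmer's code. It assumes each MUM is a hypothetical (position in genome A, position in genome B, length) tuple. It is a plain, unweighted O(n log n) LIS that ignores MUM lengths and overlaps, which a real implementation would likely need to handle.

```python
from bisect import bisect_left
from typing import List, Tuple

# Hypothetical MUM representation: (position in genome A, position in genome B, length).
Mum = Tuple[int, int, int]


def lis_chain(mums: List[Mum]) -> List[Mum]:
    """Return a longest chain of MUMs whose positions increase in both genomes.

    MUMs are sorted by their position in genome A, then a longest strictly
    increasing subsequence is taken over their positions in genome B.
    MUM lengths and overlaps are ignored (simplifying assumption).
    """
    order = sorted(mums, key=lambda m: (m[0], m[1]))
    tails: List[int] = []     # tails[k]: smallest B-position ending a chain of length k+1
    tail_idx: List[int] = []  # tail_idx[k]: index in `order` of that chain's last MUM
    prev: List[int] = [-1] * len(order)  # back-pointers used to rebuild the chain

    for i, (_, b, _) in enumerate(order):
        k = bisect_left(tails, b)  # bisect_left gives a strictly increasing chain in B
        if k == len(tails):
            tails.append(b)
            tail_idx.append(i)
        else:
            tails[k] = b
            tail_idx[k] = i
        prev[i] = tail_idx[k - 1] if k > 0 else -1

    # Walk the back-pointers from the end of the longest chain found.
    chain: List[Mum] = []
    i = tail_idx[-1] if tail_idx else -1
    while i != -1:
        chain.append(order[i])
        i = prev[i]
    return chain[::-1]


mums = [(10, 5, 20), (40, 60, 15), (70, 30, 25), (100, 90, 30), (150, 120, 10)]
print(lis_chain(mums))
```

In this example, the B-positions in A order are 5, 60, 30, 90, 120. The chain keeps four MUMs and drops the one at A-position 40. Extending the idea to more than two sequences means the MUMs must be consistently ordered in every sequence at once, which is what makes the project non-trivial.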
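Idea 1 mentions the two-hit method. In gapped BLAST, an ungapped extension is triggered only when two non-overlapping word hits lie on the same diagonal within some distance A of each other. The sketch below isolates just that trigger rule so that it can be compared with a one-hit baseline, which would treat every hit as a trigger. It assumes word hits are given as hypothetical (query position, subject position) pairs. The names `word_size` and `window` and their defaults are illustrative assumptions, not BLAST's actual options, and the real program also skips hits that fall inside a region already extended.

```python
from typing import Dict, List, Tuple


def two_hit_triggers(hits: List[Tuple[int, int]], word_size: int = 3,
                     window: int = 40) -> List[Tuple[int, int]]:
    """Return the word hits that would trigger an extension under a two-hit rule.

    hits: (query_pos, subject_pos) start positions of word matches (hypothetical format).
    A hit triggers when an earlier, non-overlapping hit on the same diagonal
    (subject_pos - query_pos) starts at most `window` positions before it.
    """
    last_hit: Dict[int, int] = {}  # diagonal -> query position of the most recent kept hit
    triggers: List[Tuple[int, int]] = []
    for q, s in sorted(hits):
        diag = s - q
        prev = last_hit.get(diag)
        if prev is not None and q - prev < word_size:
            continue  # overlaps the previous hit: keep the earlier hit as the anchor
        if prev is not None and q - prev <= window:
            triggers.append((q, s))
        last_hit[diag] = q
    return triggers


hits = [(0, 10), (2, 12), (20, 30), (100, 110), (5, 50)]
print(two_hit_triggers(hits))
```

Here (2, 12) overlaps (0, 10) and is ignored. (20, 30) triggers because it lies on the same diagonal 20 positions later, and (100, 110) is too far away to pair with (20, 30). Counting triggers and the alignments they eventually find, with and without the rule, is one way to measure its value empirically.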

Some Available Data Sets

  1. Gene expression data sets from Whitehead/MIT Center for Genome Research
  2. Gene expression data sets from Stanford Microarray Database
  3. Paul Horton's protein localization data sets
  4. Pfam database of protein families
  5. Human gene data set
  6. TRANSFAC database of transcription factor binding sites
  7. SCPD database of transcription factor binding sites
  8. Extensive list of molecular biology databases
Last modified: Fri Apr 5 11:03:19 CST 2002