In the course of my research in computational biology, I have written a couple of larger programs and numerous small utilities. All of my code is available under the GNU General Public License and can be downloaded from this page. It includes libraries written in C++ and Python that may be useful to others developing computational biology software. Many of the utilities are inspired by Jim Kent's collection of programs written in C. Among the programs available here are:
- RSEM (RNA-Seq expression estimation by Expectation-Maximization): Estimates gene and isoform expression levels from RNA-Seq data with a statistical model that takes into account reads that map to multiple positions.
- CSEM (ChIP-Seq multi-read allocation using Expectation-Maximization): The ChIP-Seq sibling to RSEM. Using an EM-inspired heuristic, CSEM allocates the reads in a ChIP-Seq data set, assigning reads that map to multiple positions fractionally across those positions.
- Mercator: Constructs multiple whole-genome orthology maps using exon anchors.
- Parametric alignment: Given two sequences to be globally aligned, these programs output all possible optimal alignments for all parameter values.
- Parametric inference of recombination: Given a multiple alignment of a putative recombinant with possible parental sequences, these programs output all possible parental annotations for all parameter values of a recombination HMM.
- FSA (Fast Statistical Alignment): A multiple alignment program for protein, RNA, and DNA sequences created by Robert Bradley. FSA aims to produce the alignment with the highest expected accuracy. I am on the development team and am working with Jae Young Do to parallelize aspects of the method. The FSA web server is available for aligning small sets of sequences. My development efforts can be followed through my FSA git repository.
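
To give a feel for what "allocating multi-reads fractionally" means in the RSEM and CSEM entries above, here is a minimal toy sketch in Python. It is not the RSEM or CSEM implementation: the function name `em_allocate`, the input format, and the simplified model (no transcript lengths, alignment scores, or sequencing-error terms) are all assumptions made for illustration.

```python
from collections import defaultdict


def em_allocate(read_hits, n_iter=100):
    """Toy EM for fractional allocation of multi-mapping reads.

    read_hits: list with one entry per read; each entry is the list of
    target names (e.g. isoforms or genomic positions) the read maps to.
    Returns a dict mapping each target to its estimated share of reads.

    Hypothetical simplification: a read's weight on a target depends only
    on the target's current abundance estimate.
    """
    # Reads without any hit carry no information in this toy model.
    read_hits = [hits for hits in read_hits if hits]
    if not read_hits:
        return {}

    targets = sorted({t for hits in read_hits for t in hits})
    # Start from a uniform abundance estimate.
    theta = {t: 1.0 / len(targets) for t in targets}

    for _ in range(n_iter):
        # E-step: split each read across its hits in proportion to theta.
        counts = defaultdict(float)
        for hits in read_hits:
            total = sum(theta[t] for t in hits)
            for t in hits:
                counts[t] += theta[t] / total
        # M-step: the new abundance is the fraction of reads allocated.
        theta = {t: counts[t] / len(read_hits) for t in targets}

    return theta


if __name__ == "__main__":
    # Three reads unique to A, one unique to B, one shared by both.
    reads = [["A"], ["A"], ["A"], ["B"], ["A", "B"]]
    for target, share in em_allocate(reads).items():
        print(target, round(share, 3))
```

In this example the shared read ends up split roughly 3:1 in favor of A, because the uniquely mapped reads already suggest that A is about three times as abundant as B. A real tool would replace the proportional weights with probabilities from a full statistical model of the data.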