Software
During my research in computational biology, I have written a fair amount of code, comprising a couple of larger programs and numerous small utility programs. All of my code is available under the GNU General Public License and can be downloaded from this page. It includes libraries written in C++ and Python that may be useful to others developing computational biology software. Many of the utility programs are inspired by Jim Kent's collection of programs written in C. Other software originating from my group or my collaborators is listed below.
- CellO (Cell Ontology-based classification): Classifies human cells assayed by scRNA-seq using the Cell Ontology and a training dataset of nearly all human, primary cell, bulk RNA-seq data in the Sequence Read Archive.
- DETONATE (DE novo TranscriptOme rNa-seq Assembly with or without the Truth Evaluation): Computes both reference-free and reference-based accuracy measures for de novo transcriptome assemblies.
- PSGInfer (Probabilistic Splice Graph Inference): Analyzes RNA-Seq data with probabilistic splice graph models of alternative RNA processing (transcription initiation, splicing, and polyadenylation).
- RSEM (RNA-Seq expression estimation by Expectation-Maximization): Estimates gene and isoform expression levels from RNA-Seq data with a statistical model that takes into account reads that map to multiple positions.
- CSEM (ChIP-Seq multi-read allocation using Expectation-Maximization): The ChIP-Seq sibling to RSEM. Using an EM-inspired heuristic, CSEM allocates reads from ChIP-Seq data sets, assigning reads that map to multiple positions fractionally among those positions.
- Mercator: Constructs multiple whole-genome orthology maps using exon anchors.
- Parametric alignment: Given two sequences to be globally aligned, these programs output all possible optimal alignments for all parameter values.
- Parametric inference of recombination: Given a multiple alignment of a putative recombinant with possible parental sequences, these programs output all possible parental annotations for all parameter values of a recombination HMM.
- FSA (Fast Statistical Alignment): A multiple alignment program for protein, RNA, and DNA sequences created by Robert Bradley. FSA produces alignments with the highest expected accuracies. I am on the development team and am working with Jae Young Do to parallelize aspects of the method. The FSA web server is available for aligning small sets of sequences. My development efforts may be tracked through my FSA git repository.
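
The idea shared by RSEM and CSEM in the list above, allocating reads that map to multiple positions fractionally, can be illustrated with a toy expectation-maximization loop. This is only a minimal sketch and not the code or statistical model of either program: it ignores transcript lengths, sequencing errors, and alignment quality, and the function name `em_allocate` and its input format are hypothetical choices made for illustration.

```python
from collections import defaultdict


def em_allocate(read_hits, n_iter=100):
    """Toy EM for fractional multi-read allocation (illustrative only).

    read_hits: list of lists; each inner list names the targets
               (e.g. transcripts) that one read aligns to.
    Returns (theta, allocations), where theta maps each target to its
    estimated share of reads and allocations[i] maps each candidate
    target of read i to the fraction of that read assigned to it.
    """
    targets = sorted({t for hits in read_hits for t in hits})
    # Start from a uniform abundance estimate.
    theta = {t: 1.0 / len(targets) for t in targets}
    n_reads = len(read_hits)

    for _ in range(n_iter):
        # E-step: split each read among its candidate targets in
        # proportion to the current abundance estimates.
        counts = defaultdict(float)
        for hits in read_hits:
            total = sum(theta[t] for t in hits)
            for t in hits:
                counts[t] += theta[t] / total
        # M-step: re-estimate abundances from the fractional counts.
        theta = {t: counts[t] / n_reads for t in targets}

    # Final fractional allocation of every read.
    allocations = []
    for hits in read_hits:
        total = sum(theta[t] for t in hits)
        allocations.append({t: theta[t] / total for t in hits})
    return theta, allocations


# Three reads unique to A, one unique to B, and one multi-read.
reads = [["A"], ["A"], ["A"], ["B"], ["A", "B"]]
theta, allocations = em_allocate(reads)
# theta approaches A: 0.75, B: 0.25, so the multi-read is split 3:1.
```

In this example the multi-read is not discarded or assigned arbitrarily; it is divided between A and B according to the abundances implied by the uniquely mapped reads, which is the intuition behind handling multi-reads with a statistical model.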


