Identifying Essential Genes in M. tuberculosis
by Random Transposon Mutagenesis

Karl Broman, PhD, Department of Biostatistics, Johns Hopkins University

Wednesday, November 12, 2003, 12:00 - 1:00 p.m.

1221 Computer Sciences and Statistics Center (1210 W. Dayton St.)


Mycobacterium tuberculosis (Mtb) is the organism that causes tuberculosis. Its circular genome of 4.4 Mbp has been completely sequenced and contains 4250 genes. In random transposon mutagenesis, one creates a library of mutants, each of which contains a single insertion of a transposon. Here we consider the Himar1 transposon, which inserts at random at TA dinucleotides. The Mtb genome contains 74,403 such TA sites. We consider data on a library of 1425 transposon insertion mutants; for each mutant, the particular TA site at which insertion occurred has been determined. If a mutant with a transposon insertion within a particular gene is viable, then that gene is not essential for the viability of the organism. Conversely, genes that are essential for the viability of the organism will never show up in such a library of insertion mutants.
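
The absence of a gene from the library is not by itself proof that the gene is essential: with 1425 insertions spread over 74,403 TA sites, many non-essential genes are missed by chance. The following is a rough sketch of that calculation, not part of the talk. It assumes each insertion lands independently and uniformly on any of the 74,403 TA sites, which ignores the fact that sites in essential genes cannot appear in the library; the function name and its parameters are hypothetical.

```python
def prob_no_insertion(k_sites, n_mutants=1425, total_sites=74403):
    """Chance that a non-essential gene with k_sites TA sites is never hit.

    Simplifying assumption (ours): every insertion is independent and
    uniform over all TA sites in the genome.
    """
    # Each insertion misses the gene with probability 1 - k/total.
    return (1.0 - k_sites / total_sites) ** n_mutants


if __name__ == "__main__":
    for k in (1, 5, 10, 50, 100):
        print(k, round(prob_no_insertion(k), 3))
```

Under this simplification, a non-essential gene with only a handful of TA sites will usually be missing from the library, which is why a model-based estimate is needed rather than a simple count of genes without insertions.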

We describe a Bayesian method for estimating the proportion of essential genes in the Mtb genome and for identifying genes likely to be essential, on the basis of such data. The prior distribution for the number of essential genes was taken to be uniform. A Gibbs sampler was used to estimate the posterior distribution.
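
To make the idea concrete, here is a minimal Gibbs sampler for a simplified version of the problem. It is an illustrative sketch under our own assumptions, not the speaker's implementation: each gene is either essential or not, insertions fall uniformly on the TA sites of non-essential genes (intergenic sites and any finer structure within genes are ignored), and the prior puts equal weight on every possible number of essential genes, with all sets of that size equally likely. The function name, arguments, and toy data are hypothetical.

```python
import math
import random


def gibbs_essential(ta_sites, counts, n_iter=2000, burn_in=500, seed=0):
    """Return (per-gene posterior P(essential), posterior mean proportion).

    ta_sites[i]: number of TA sites in gene i
    counts[i]:   number of observed insertions in gene i
    """
    rng = random.Random(seed)
    G = len(ta_sites)
    n = sum(counts)
    if any(c > 0 and t == 0 for t, c in zip(ta_sites, counts)):
        raise ValueError("insertion observed in a gene with no TA sites")

    essential = [False] * G   # start with every gene non-essential
    m = 0                     # current number of essential genes
    T = sum(ta_sites)         # TA sites in currently non-essential genes
    hits = [0] * G
    m_sum = 0
    kept = 0

    for it in range(n_iter):
        for i in range(G):
            if counts[i] > 0:
                continue      # a gene with an insertion cannot be essential
            t = ta_sites[i]
            # Remove gene i from the current state.
            m_rest = m - 1 if essential[i] else m
            T_rest = T if essential[i] else T - t
            # Prior odds for a uniform prior on the number of essential
            # genes: C(G, m) / C(G, m + 1) = (m + 1) / (G - m).
            log_odds = math.log(m_rest + 1) - math.log(G - m_rest)
            # Likelihood is proportional to T ** (-n), so calling gene i
            # essential multiplies it by ((T_rest + t) / T_rest) ** n.
            if n > 0:
                log_odds += n * (math.log(T_rest + t) - math.log(T_rest))
            p = 1.0 / (1.0 + math.exp(-log_odds))
            essential[i] = rng.random() < p
            m = m_rest + (1 if essential[i] else 0)
            T = T_rest + (0 if essential[i] else t)
        if it >= burn_in:
            kept += 1
            m_sum += m
            for i in range(G):
                hits[i] += essential[i]

    return [h / kept for h in hits], m_sum / (kept * G)


if __name__ == "__main__":
    # Toy data: genes 1 and 2 have no insertions; gene 1 has many TA sites.
    post, prop = gibbs_essential([50, 50, 1, 50], [5, 0, 0, 4])
    print(post, prop)
```

In the toy data, the unhit gene with 50 TA sites should receive a higher posterior probability of being essential than the unhit gene with a single TA site, because missing a large target by chance is less likely than missing a small one.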

