MADGiC: a Model-based approach for identifying Driver Genes in Cancer


Cancer arises from the accumulation of genetic alterations that provide a selective advantage to a cancer cell (drivers), but identifying which changes provide that advantage is a difficult and open problem. Alterations that are irrelevant to the disease process (passengers) also occur by chance, and the key challenge is separating these two classes of alterations. We have developed a statistical method to address this problem.

As we detail in the paper below, existing statistical methods for identifying driver genes in cancer rely primarily on frequency-based criteria (i.e., identifying driver genes as those showing higher mutation rates than expected by chance). However, recent studies have identified other properties of drivers that statistical approaches have not yet used: increased functional impact, enrichment for specific mutations, and highly structured spatial patterns. Our approach incorporates all three of these criteria and, in doing so, shows substantially increased power (with a well-controlled false discovery rate) over competing methods in simulation studies.

The R package 'MADGiC' fits an empirical Bayesian hierarchical model to obtain the posterior probability that each gene is a driver. The model accounts for (1) frequency of mutation relative to a background model that incorporates gene-specific factors in addition to mutation type and nucleotide context, (2) predicted functional impact (in the form of SIFT scores) of each specific change, and (3) positional patterns in mutations that have been deposited into the COSMIC (Catalogue of Somatic Mutations in Cancer) database. Example data from The Cancer Genome Atlas (TCGA) project ovarian cohort is provided.
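To make the idea of a posterior driver probability concrete, here is a minimal toy sketch in Python. It is not the MADGiC model and does not use the package's API. It assumes a generic two-group mixture in which a gene's mutation count follows a Poisson distribution with either a background (passenger) rate or an elevated (driver) rate, and it applies Bayes' rule to the two likelihoods. The function and parameter names (posterior_driver_prob, background_rate, driver_rate, prior_driver) are hypothetical, and the sketch ignores functional impact and positional patterns entirely.

```python
import math


def poisson_pmf(k, lam):
    """Probability of observing k events under a Poisson(lam) distribution."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def posterior_driver_prob(count, background_rate, driver_rate, prior_driver):
    """Toy two-group posterior probability that a gene is a driver.

    Assumption (illustrative only): mutation counts are Poisson with mean
    background_rate for passenger genes and driver_rate for driver genes.
    prior_driver is the prior probability that a gene is a driver.
    """
    f0 = poisson_pmf(count, background_rate)  # likelihood under the passenger model
    f1 = poisson_pmf(count, driver_rate)      # likelihood under the driver model
    numerator = prior_driver * f1
    return numerator / (numerator + (1.0 - prior_driver) * f0)


if __name__ == "__main__":
    # Hypothetical numbers: 2 mutations expected by chance, 10 if the gene is a driver.
    for observed in (0, 5, 15):
        p = posterior_driver_prob(observed, background_rate=2.0,
                                  driver_rate=10.0, prior_driver=0.01)
        print(observed, p)
```

In this toy setting the posterior probability rises with the observed count once it exceeds what the background rate explains. This is the frequency-based part of the reasoning only; the model described above additionally weighs functional impact and mutation position.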

R Package:

Latest version: MADGiC_0.2.tar.gz and manual. Accommodates user-specified expression and replication timing data. See the 'get.post.probs' function for details.

Previous versions: MADGiC_0.1.tar.gz and manual.

Contact: kdkorthauer at wisc dot edu; kendzior at biostat dot wisc dot edu

Details of the approach can be found in the paper: Korthauer, K. and Kendziorski, C. MADGiC: a model-based approach for identifying driver genes in cancer. Submitted.

Last Modified September 2014