Statistical Methods for Network-based Analysis of Genomic Data

Hongzhe Li
Department of Biostatistics and Epidemiology
University of Pennsylvania School of Medicine

Friday, April 20
12:00 pm
5275 MSC



A central problem in genomic research is the identification of
genes and pathways involved in diseases and other biological
processes. Many methods have been developed for identifying genes
in a regression framework.  The genes identified are often linked
to known biological pathways through gene set enrichment analysis
in order to identify the pathways involved. However, most of the
procedures for identifying the biologically relevant genes do not
utilize the known pathway information. In this talk, I present two
network-based approaches for genomic data analysis: a
pathway-based regression analysis using a group gradient descent
boosting procedure for identifying pathways, and a Markov random
field (MRF)-based method for identifying genes and subnetworks
that are related to disease. Simulation studies indicate that the
MRF-based method is effective in identifying disease-related genes
and subnetworks, with higher sensitivity and lower false discovery
rates than commonly used procedures that do not use pathway
structure information. Applications to two breast cancer
microarray gene expression datasets identified several subnetworks
within several KEGG transcriptional pathways that are related to
breast cancer recurrence or to breast cancer-specific survival.
Extensions to the analysis of time-course gene expression data
will also be discussed.

