Model selection for regression on right censored outcomes

Sunduz Keles, Ph.D. Candidate, UC Berkeley
Wednesday, February 26, 2003, 4-5 p.m.

1221 Computer Science and Statistics Center, 1210 W. Dayton St.


Model selection is a central component of high-dimensional statistical modeling. The recent use of microarray technology in cancer research is generating data sets that pair patient survival information with expression measurements for thousands of genes, drawing increasing attention to this statistical topic. These data sets are often subject to right censoring, which makes selecting a good predictive model more challenging. In this talk, I will present a new cross-validation-based method for selecting among a given set of predictors of a right-censored survival outcome. The central idea is to treat the risk of a given predictor as a full-data parameter and to estimate it from the observed data using cross-validation. I will show that, under appropriate conditions, the proposed method performs as well as the oracle procedure in model selection; that is, it is asymptotically equivalent to selecting the best predictor using the true data-generating distribution of the full data. I will also illustrate this result with simulations in the context of histogram regression with right-censored data.

