Incorporating Multiple cDNA Microarray Slide Scans into Statistical
Analyses - Application to Somatic Embryogenesis in Maize
Tanzy Love, Dept. of Statistics, Carnegie Mellon University
Alicia Carriquiry, Dept. of Statistics, Iowa State University
Wednesday, December 7, 2005, 4:00pm
140 Bardeen
ABSTRACT
Microarray data are subject to multiple sources of measurement error. One potentially significant source is the settings of the instruments (laser and sensor) used to measure gene expression. Because the "optimal" settings may vary from slide to slide, operators typically scan each slide multiple times and then choose the reading with the fewest over-exposed and under-exposed spots. We discuss a somatic embryogenesis experiment carried out on Zea mays at Iowa State University. The main objective of the study was to identify the set of genes in maize that actively participate in embryo development; to that end, embryo tissue was sampled and analyzed at various time periods and under different light conditions. We propose a hierarchical modeling approach to estimating gene expression that combines all available readings on each spot. The basic premise is that every reading contributes some information about gene expression and that, after appropriate re-scaling, all readings can be combined into a single estimate. We assess the statistical properties of the proposed expression estimates using a simulation experiment. As expected, combining all available scans using a reasonable approach yields expression estimates with noticeably lower bias and root mean squared error than other approaches that have been proposed in the literature. We then revisit the maize experiment and present results obtained using both a standard approach and the proposed approach. We argue that using all available scans on each spot in the statistical analyses gives more precise inferences on gene expression patterns and thus increases the power of tests.
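
The premise of re-scaling several scans of the same slide and combining them can be illustrated with a deliberately simplified sketch. This is not the hierarchical model presented in the talk; it is a naive stand-in that rescales each scan to the first scan by a median ratio and averages the unsaturated readings for each spot. The function name combine_scans, the saturation ceiling of 65535, and the example intensities are all assumptions made for illustration.

```python
from statistics import mean, median

SATURATION = 65535  # assumed 16-bit scanner ceiling (illustrative, not from the talk)


def combine_scans(scans, saturation=SATURATION):
    """Combine several scans of one slide into one estimate per spot.

    scans: list of scans, each a list of spot intensities in the same spot order.
    Returns estimates on the scale of the first scan (None if no usable reading).
    Naive median-ratio rescaling, NOT the hierarchical model from the abstract.
    """
    reference = scans[0]
    rescaled = []
    for scan in scans:
        # Scale factor: median ratio to the reference scan, using only spots
        # that are neither zero nor saturated in both scans.
        # median() raises StatisticsError if no such spot exists.
        ratios = [
            r / s
            for r, s in zip(reference, scan)
            if 0 < r < saturation and 0 < s < saturation
        ]
        factor = median(ratios)
        # Saturated readings carry no usable intensity, so mark them missing.
        rescaled.append([s * factor if s < saturation else None for s in scan])

    estimates = []
    for readings in zip(*rescaled):
        usable = [v for v in readings if v is not None]
        estimates.append(mean(usable) if usable else None)
    return estimates


# Hypothetical data: a low-gain scan and a high-gain scan of the same three spots.
low_gain = [100, 1000, 40000]
high_gain = [200, 2000, 65535]  # third spot is over-exposed at high gain
estimates = combine_scans([low_gain, high_gain])
```

In this toy example the over-exposed third spot is estimated from the low-gain scan alone, while the first two spots use both scans. A model-based approach such as the one described in the abstract would also account for scan-specific noise rather than weighting all readings equally.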