Summary by Lisa Chung 11/19/2008

********************************
* Data and Analysis Methods *
********************************

- Data: HPV dataset with 5 tissue types (NORMAL, CIN1, CIN2, CIN3, CANCER).
  Here, we combine CIN1 and CIN2 into a single EARLY tissue type.

  -------------------------------
  | NORMAL  EARLY  CIN3  CANCER |
  |   24     36     40     28   |  : total of 128 arrays
  -------------------------------

- Feature selection:
  1) A two-way ANOVA (tissue type and batch effect, with no interaction) is performed.
  2) The Benjamini-Hochberg correction is used for p-value adjustment, with cutoff = 0.05.
  3) Out of 54675 probesets, *6663 probesets* show a significant difference in gene expression across tissue types.

- EB calculation:
  1) Using all 51 possible patterns (without the null pattern), apply emfit with 20 iterations to estimate the model parameters.
  2) Using all 540 possible ordered structures (without the null structure), apply emfit with 0 and with 100 iterations.
  3) After the iterations, compute for every gene the probability of each structure given the expression data, Pr(structure|data).

- Structure assignment rule:
  1) Maximum assignment rule:
     Assign the structure with the maximum Pr(structure|data).
  2) Cutoff = 0.9:
     Assign a structure only if the maximum Pr(structure|data) is greater than 0.9 (more stringent).
  3) After structure assignment based on the EB probabilities, I removed a few probesets whose average log2 expression is not consistent with the assigned structure.

- Gene enrichment calculation: 4-set analysis
  set1: Normal < (Early, CIN3, CANCER) or Normal > (Early, CIN3, CANCER)
        i.e. (1,2,2,2) or (2,1,1,1)
  set2: (Normal, Early) < (CIN3, CANCER) or (Normal, Early) > (CIN3, CANCER)
        i.e. (1,1,2,2) or (2,2,1,1)
  set3: (Normal, Early, CIN3) < CANCER or (Normal, Early, CIN3) > CANCER
        i.e. (1,1,1,2) or (2,2,2,1)
  set4: Normal < Early < CIN3 < CANCER or Normal > Early > CIN3 > CANCER
        i.e. (1,2,3,4) or (4,3,2,1)
  set1, set2, and set3 are groups of genes that change *only* at a single transition point (Normal to Early, Early to CIN3, and CIN3 to CANCER, respectively).
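The pattern counts and assignment rules above can be sketched in Python. This is a minimal illustration under stated assumptions, not the emfit code: structures are written as rank vectors over the conditions, as in the 4-set definitions, and Pr(structure|data) is assumed to be already computed. Enumerating the structures shows that 51 patterns and 540 ordered structures (excluding the null) are the counts for five conditions; for the four conditions NORMAL, EARLY, CIN3, CANCER the counts are 14 and 74. The helper names (`patterns`, `ordered_structures`, `assign`, `which_set`, `is_consistent`) are hypothetical, and `is_consistent` is only one possible reading of the rule used to drop probesets.

```python
from itertools import product

# Condition order assumed throughout: (NORMAL, EARLY, CIN3, CANCER).

def patterns(k):
    """Unordered patterns (set partitions of k conditions) as restricted growth strings."""
    return [labels for labels in product(range(k), repeat=k)
            if labels[0] == 0
            and all(labels[i] <= max(labels[:i]) + 1 for i in range(1, k))]

def ordered_structures(k):
    """Ordered structures as rank vectors: levels 1..m, every level used at least once."""
    return [ranks for ranks in product(range(1, k + 1), repeat=k)
            if set(ranks) == set(range(1, max(ranks) + 1))]

def assign(posteriors, cutoff=None):
    """Maximum assignment rule; with a cutoff, also require max Pr(structure|data) > cutoff.

    posteriors: dict mapping structure -> Pr(structure | data) for one probeset.
    """
    best = max(posteriors, key=posteriors.get)
    if cutoff is not None and posteriors[best] <= cutoff:
        return "Unassigned"
    return best

# The four sets of the 4-set analysis, as rank vectors.
SETS = {
    "set1": {(1, 2, 2, 2), (2, 1, 1, 1)},
    "set2": {(1, 1, 2, 2), (2, 2, 1, 1)},
    "set3": {(1, 1, 1, 2), (2, 2, 2, 1)},
    "set4": {(1, 2, 3, 4), (4, 3, 2, 1)},
}

def which_set(structure):
    """Return the name of the set containing the structure, or None."""
    for name, members in SETS.items():
        if structure in members:
            return name
    return None

def is_consistent(structure, means):
    """Hypothetical check: a lower level must have a strictly lower mean log2 expression."""
    k = len(structure)
    return all(means[i] < means[j]
               for i in range(k) for j in range(k)
               if structure[i] < structure[j])

print(len(patterns(5)) - 1, len(ordered_structures(5)) - 1)  # 51 540
print(len(patterns(4)) - 1, len(ordered_structures(4)) - 1)  # 14 74

toy = {(1, 1, 2, 2): 0.6, (1, 1, 1, 2): 0.4}  # made-up probabilities, not from the data
print(assign(toy), assign(toy, cutoff=0.9), which_set(assign(toy)))  # (1, 1, 2, 2) Unassigned set2
```

The toy case shows why the 0.9 cutoff is more stringent: the same probeset is assigned to set2 under the maximum rule but left unassigned under the cutoff.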
  I collected the probesets for each structure using the *maximum assignment rule after 100 EM iterations*, and removed a few probesets whose mean expression levels are not consistent with their structure assignment. For each set, a probeset's score is 1 if the probeset belongs to the set and 0 otherwise; this binary score is used for the gene enrichment calculation with the allez package. I listed the interesting GO/KEGG pathways with z.score > 4 and number of genes > 4.

*************
* Files: *
*************

- Average log2 expression and structure assignment for all 6663 probesets: All.dataTable.txt
  (* In the structure assignment, "Unassigned": unassigned due to low probabilities;
     "Unassigned.by.fc": unassigned due to inconsistency between the structure and the average expression.)
- Gene enrichment calculation result (shown at the last meeting, 11/06): GeneEnrichment-4sets.xls
- List of genes in each set (shown at the last meeting, 11/06): geneList-4sets.xls
- Trajectory plots (on the raw, unlogged scale):
  trajectory-4-set-max.png, trajectory-4-set-9.png: trajectory plots for the 4-set analysis with the maximum rule and with cutoff = 0.9, respectively
  trajectory-top20-max.png, trajectory-top20-9.png: trajectory plots for the top 20 structures with the maximum rule and with cutoff = 0.9, respectively
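The binary scoring used for the gene enrichment calculation can be illustrated with a random-set z-score, which compares the mean score of the genes in a GO/KEGG category with the mean over all scored genes. The sketch below is written from the standard random-set formula (the mean of m scores drawn without replacement from G scores) and is not the allez source; allez's handling of several probesets mapping to one gene, its category filters, and its other options are not reproduced. The function name `random_set_z` and the toy data are hypothetical.

```python
from math import sqrt

def random_set_z(scores, category):
    """Random-set z-score for one GO/KEGG category (sketch, not the allez code).

    scores   : dict gene -> score; here 1 if the gene is in the set of interest, else 0
    category : iterable of genes annotated to the category
    """
    members = [g for g in set(category) if g in scores]  # keep only genes that have a score
    G, m = len(scores), len(members)
    if not 0 < m < G:
        raise ValueError("category must contain between 1 and G-1 scored genes")
    values = list(scores.values())
    mu = sum(values) / G                          # mean score over all genes
    var = sum((v - mu) ** 2 for v in values) / G  # population variance of the scores
    if var == 0:
        raise ValueError("all scores are equal; z-score undefined")
    set_mean = sum(scores[g] for g in members) / m
    # SD of the mean of m scores drawn without replacement from the G scores
    sd = sqrt(var / m * (G - m) / (G - 1))
    return (set_mean - mu) / sd

# Toy example: 10 genes, 3 of which fall in the set of interest (made-up data).
scores = {f"g{i}": (1 if i < 3 else 0) for i in range(10)}
print(round(random_set_z(scores, ["g0", "g1", "g2", "g3"]), 3))  # 2.405
```

With binary scores, the overall mean is simply the fraction of genes in the set, so a large positive z-score means the category holds more set members than a random category of the same size would; the z.score > 4 threshold is then applied to these values.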