R: The function to run a single-study GSCA, differential co-expression (DC) analysis

singleDC {GSCA}

R Documentation

The function to run a single-study GSCA, differential co-expression (DC) analysis

Description

This function runs a single-study GSCA differential co-expression (DC) analysis, as described in Choi and Kendziorski (2009). Condition-specific gene-gene pairwise correlations are calculated first; then, for each gene set defined in `GSdefList`, a dispersion index (DI) is calculated across the condition-specific correlations.

Samples are randomly permuted across conditions `nperm` times. Permutation-based p-values are then calculated from the rank of the observed DI among the permuted DI values.

Usage

singleDC(data, group, GSdefList, nperm, permDI = FALSE)

Arguments

`data`	A data matrix with rows representing genes and columns representing arrays. `rownames(data)` is used to extract a sub-matrix of `data` for each gene set, so rows must be named by the gene IDs used in `GSdefList`. For example, if `GSdefList` defines gene sets by Entrez Gene IDs, `rownames(data)` should be Entrez Gene IDs.
`group`	A numeric vector that specifies the number of arrays (columns) in each condition. For example, if `c(10, 5)` is provided, the first 10 columns of `data` are used for one condition and the next 5 columns for the other.
`GSdefList`	A list of character vectors that define gene sets. Each entry of this list is a gene set.
`nperm`	The desired number of permutations.
`permDI`	Logical. If `TRUE`, the dispersion index values from the permutations are saved and returned; if `FALSE`, they are not returned. Default is `FALSE`.
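As an illustration of how `group` and `rownames(data)` work together, here is a minimal sketch on toy data. The matrix `toy`, the gene IDs, and the gene set `gs` are made up for this example, and the code is not the internal implementation of `singleDC`.

```r
# Toy expression matrix: 4 genes (rows) x 5 arrays (columns); values are made up.
set.seed(1)
toy <- matrix(rnorm(20), nrow = 4,
              dimnames = list(c("g1", "g2", "g3", "g4"), NULL))
group <- c(3, 2)   # first 3 columns = condition 1, next 2 = condition 2

# Column indices for each condition, as implied by `group`
cond <- rep(seq_along(group), times = group)
cols <- split(seq_len(ncol(toy)), cond)

# A hypothetical gene set, given as a character vector of gene IDs
gs <- c("g1", "g3", "g4")
sub <- toy[rownames(toy) %in% gs, , drop = FALSE]

# Condition-specific sub-matrices for this gene set
sub1 <- sub[, cols[[1]], drop = FALSE]
sub2 <- sub[, cols[[2]], drop = FALSE]
```

Because the split is purely positional, the columns of `data` need to be ordered by condition before calling the function.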

Details

Samples (columns) are permuted across conditions. For each permutation, the condition-specific correlations are recalculated from the permuted samples, and dispersion indices (DIs) are calculated from those permutation-based correlations. Because the focus is on difference, the p-value for each gene set is one-sided and is calculated as:

p = sum(permutation DIs >= observed DI) / nperm .
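The permutation scheme can be sketched roughly as follows for two conditions. This is an illustration only: `toy_di` is a hypothetical stand-in statistic (the root-mean-square difference between the two conditions' pairwise correlations), assumed here because this page does not give the DI formula, and the toy data are random.

```r
set.seed(42)
# Toy data: 5 genes x 12 arrays, 6 arrays per condition (values are made up)
x <- matrix(rnorm(60), nrow = 5)
group <- c(6, 6)
nperm <- 200

# Stand-in dispersion statistic (an assumption, not necessarily the GSCA DI):
# root-mean-square difference between the conditions' pairwise correlations.
toy_di <- function(m, group) {
  i1 <- seq_len(group[1])
  i2 <- group[1] + seq_len(group[2])
  r1 <- cor(t(m[, i1]))   # gene-gene correlations, condition 1
  r2 <- cor(t(m[, i2]))   # gene-gene correlations, condition 2
  ut <- upper.tri(r1)     # each gene pair counted once
  sqrt(mean((r1[ut] - r2[ut])^2))
}

observed <- toy_di(x, group)

# Permute the columns (samples) across conditions and recompute the statistic
perm <- replicate(nperm, toy_di(x[, sample(ncol(x))], group))

# One-sided p-value, following the formula above
p <- sum(perm >= observed) / nperm
```

A small `p` means that few permuted statistics reach the observed one, which is the sense in which the test focuses on difference.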

Value

`DI`	A vector of dispersion indices, one for each gene set.
`pvalue`	A vector of permutation-based p-values, one for each gene set.
`permv`	The permutation-based DI matrix, with `nperm` columns, returned when `permDI = TRUE`. The first column is identical to `DI`.

Note

Currently, `singleDC` implements DC analysis for two conditions (e.g., tumor vs. normal) and for three conditions (e.g., AA, AB, and BB genotypes). For three conditions, the pairwise DIs are calculated first and then averaged internally.
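The three-condition averaging can be pictured with the sketch below. It assumes a hypothetical stand-in statistic, `pair_di` (root-mean-square difference of pairwise correlations), since this page does not state the DI formula, and it uses random toy data rather than the package internals.

```r
set.seed(7)
x <- matrix(rnorm(4 * 15), nrow = 4)   # toy data: 4 genes x 15 arrays
group <- c(5, 5, 5)                    # three conditions, 5 arrays each
cond <- rep(seq_along(group), times = group)

# Condition-specific gene-gene correlations (upper triangle only)
cors <- lapply(split(seq_len(ncol(x)), cond), function(j) {
  r <- cor(t(x[, j]))
  r[upper.tri(r)]
})

# Stand-in pairwise statistic (an assumption, not necessarily the GSCA DI)
pair_di <- function(a, b) sqrt(mean((a - b)^2))

# All condition pairs: 1-2, 1-3, 2-3
pairs <- combn(length(cors), 2)
pairwise <- apply(pairs, 2, function(k) pair_di(cors[[k[1]]], cors[[k[2]]]))

# Average of the three pairwise values
di3 <- mean(pairwise)
```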

Author(s)

YounJeong Choi

References

Choi and Kendziorski, submitted.

Examples

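library(GSCA)
data(LungCancer3)
GS <- LungCancer3$info$GSdef      # gene set definitions
GSdesc <- LungCancer3$info$Name   # gene set names
# 86 arrays in the first condition, 10 in the second;
# nperm = 3 only keeps the example fast
dc.M <- singleDC(data = LungCancer3$data$Michigan, group = c(86, 10),
                 GSdefList = GS, nperm = 3, permDI = TRUE)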

[Package GSCA version 1.1.0 Index]