Skip to contents

Performs genome-wide differential expression analysis across all probes using the limma linear modeling framework. This is the recommended approach for microarray data from GEO, as it borrows variance information across genes to produce more stable estimates than gene-by-gene t-tests.

Usage

run.limma.de(geo, condition.col = "disease.state", adjust.method = "BH")

Arguments

geo

Named list as returned by extract.expression(), containing at minimum $expression (probe x sample matrix) and $phenotype (sample metadata data frame).

condition.col

Character. Name of the column in geo$phenotype to use as the grouping variable. Default is "disease.state".

adjust.method

Character. P-value correction method passed to limma::topTable(). Default is "BH" (Benjamini-Hochberg FDR).

Value

A data frame (TopTable) with one row per probe, sorted by evidence of differential expression, containing:

logFC

Log2 fold-change between conditions.

AveExpr

Average log2 expression across all samples.

t

Moderated t-statistic.

P.Value

Raw p-value.

adj.P.Val

FDR-adjusted p-value (BH by default).

B

Log-odds that the gene is differentially expressed.

Details

Unlike running analyze.gene() on each probe individually, limma fits a linear model to all probes simultaneously and applies empirical Bayes shrinkage to the variance estimates. This stabilizes results for genes with few observations and is the standard method for microarray DE analysis.

The returned TopTable is the primary input for downstream functions such as volcano.plot() and run.gsea(), and can be filtered by adj.P.Val and logFC thresholds to define a gene signature.

Examples

# \donttest{
geo <- extract.expression(load.geo.soft(accession = "GDS3268", log.transform = TRUE))
#> GDS3268 not found locally, downloading from NCBI GEO...
#> Using locally cached version of GDS3268 found here:
#> /tmp/RtmpxRZSjV/GDS3268.soft.gz 
#> Warning: NaNs produced
#> Using locally cached version of GPL1708 found here:
#> /tmp/RtmpxRZSjV/GPL1708.annot.gz 
de.results <- run.limma.de(geo, condition.col = "disease.state")
#> Warning: Partial NA coefficients for 2141 probe(s)
#> Removing intercept from test coefficients
head(de.results)
#>            logFC   AveExpr         t      P.Value    adj.P.Val         B
#> 16982  0.5350351 -1.152244  6.134758 4.381867e-09 0.0001027521 10.435447
#> 13087 -1.5601595 -3.766461 -6.318251 5.868863e-09 0.0001027521 10.059365
#> 5334   0.8320762 -1.946530  6.013908 1.160081e-08 0.0001354047  9.512088
#> 13277  0.9150156 -1.580899  5.582388 7.648712e-08 0.0006695683  7.737907
#> 6416   0.9928177 -2.532741  5.493257 1.281664e-07 0.0007491880  7.258928
#> 14638  1.4593609 -3.346868  5.530253 1.283735e-07 0.0007491880  7.253982
# }