Skip to contents

Tests whether predefined gene sets (e.g., pathways or GO terms) are systematically enriched among genes ranked by differential expression, using the fgseaMultilevel algorithm from fgsea.

Usage

run.gsea(de.results, genes, pathways, min.size = 15, max.size = 500)

Arguments

de.results

Data frame as returned by run.limma.de(). Must contain a logFC column. Row names are probe IDs.

genes

Data frame of gene annotations as returned by extract.expression()$gene. Must contain columns "ID" for probe identifiers and "Gene symbol" for gene symbols matching the format used by pathway databases such as MSigDB.

pathways

Named list of character vectors, where each element is a gene set and names are pathway labels. MSigDB or GO gene sets in this format can be loaded via msigdbr.

min.size

Integer. Minimum number of genes required in a gene set for it to be tested. Default is 15.

max.size

Integer. Maximum gene set size. Default is 500.

Value

A data frame with one row per tested pathway, containing:

pathway

Gene set name.

pval

Nominal p-value.

padj

BH-adjusted p-value across all tested pathways.

NES

Normalized enrichment score. Positive values indicate enrichment among upregulated genes; negative values indicate enrichment among downregulated genes.

size

Number of genes from the pathway present in the ranked list.

leadingEdge

List-column of genes driving the enrichment score.

Details

Uses the fgseaMultilevel algorithm, which estimates p-values using an adaptive multilevel splitting Monte Carlo approach. This is more accurate than the original permutation-based fgseaSimple and does not require a fixed permutation count.

GSEA operates on the full ranked gene list from run.limma.de() rather than a filtered subset, preserving information from all genes. Probe IDs from the TopTable row names are resolved to gene symbols via the genes annotation data frame before matching against pathway gene sets. Probes with no gene annotation are silently dropped from the ranked list, as they cannot be matched to any pathway. The leadingEdge genes are strong candidates for follow-up with analyze.gene() or gene.analysis.plot().

Examples

# \donttest{
library(msigdbr)
geo <- extract.expression(load.geo.soft(accession = "GDS507",
                                                 log.transform = TRUE))
#> GDS507 not found locally, downloading from NCBI GEO...
#> Using locally cached version of GDS507 found here:
#> /tmp/RtmpxRZSjV/GDS507.soft.gz 
#> Using locally cached version of GPL97 found here:
#> /tmp/RtmpxRZSjV/GPL97.annot.gz 
hallmark.df <- msigdbr(species = "Homo sapiens", category = "H")
#> Warning: The `category` argument of `msigdbr()` is deprecated as of msigdbr 10.0.0.
#>  Please use the `collection` argument instead.
pathways <- split(hallmark.df$gene_symbol, hallmark.df$gs_name)
de.results <- run.limma.de(geo)
#> Removing intercept from test coefficients
gsea.results <- run.gsea(de.results, geo$gene, pathways)
head(gsea.results[order(gsea.results$padj), ])
#>                                       pathway         pval         padj
#>                                        <char>        <num>        <num>
#> 1:         HALLMARK_INTERFERON_GAMMA_RESPONSE 8.505567e-11 3.827505e-09
#> 2:               HALLMARK_ALLOGRAFT_REJECTION 2.282585e-07 5.135817e-06
#> 3:           HALLMARK_TNFA_SIGNALING_VIA_NFKB 2.844309e-06 4.266464e-05
#> 4:         HALLMARK_INTERFERON_ALPHA_RESPONSE 1.862513e-04 2.095328e-03
#> 5: HALLMARK_EPITHELIAL_MESENCHYMAL_TRANSITION 9.521288e-04 8.569159e-03
#> 6:                           HALLMARK_HYPOXIA 1.516218e-03 8.952530e-03
#>      log2err         ES       NES  size
#>        <num>      <num>     <num> <int>
#> 1: 0.8390889 -0.6451016 -2.551362    67
#> 2: 0.6901325 -0.6559719 -2.299225    38
#> 3: 0.6272567 -0.5518351 -2.126387    59
#> 4: 0.5188481 -0.5941171 -1.983752    32
#> 5: 0.4772708 -0.4610456 -1.814634    64
#> 6: 0.4550599 -0.4486926 -1.759598    63
#>                                     leadingEdge
#>                                          <list>
#> 1: ST8SIA4,IRF1,SAMHD1,SSPN,GBP4,SLAMF7,...[42]
#> 2:    ST8SIA4,TLR3,CTSS,EGFR,GBP2,BCAT1,...[15]
#> 3: IRF1,DUSP4,INHBA,EDN1,MARCKS,SLC16A6,...[36]
#> 4:    GBP2,IRF1,GBP4,RSAD2,CMPK2,SAMD9L,...[18]
#> 5:  NNMT,CTHRC1,ACTA2,EDIL3,INHBA,PRRX1,...[21]
#> 6:     ANGPTL4,CP,PDK1,NR3C1,ALDOA,EGFR,...[22]
# }