Performs a comprehensive statistical analysis comparing gene expression between two groups. Integrates parametric and nonparametric testing, effect size estimation, bootstrapped confidence intervals, and a heuristic biological relevance assessment.
Arguments
- df
A data frame as produced by build.analysis.df(), containing at least two columns:
- expression
Numeric vector of gene expression values.
- group
Character or factor vector with exactly two group labels.
When the data frame contains multiple genes (i.e., a gene column with more than one unique value), expression values from all genes are pooled together. Subset the data frame to a single gene before calling this function for per-gene results.
- alpha
Numeric. Significance threshold. Default is 0.05.
- n.boot
Integer. Number of bootstrap resamples for the confidence interval. Default is 1000.
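The pooling behavior described for df can be avoided by subsetting before the call. The following is a minimal sketch on a hypothetical data frame: the values and gene labels are invented for illustration, and the gene column name is taken from the description above.

```r
# Hypothetical input shaped like the data frame described above,
# with a `gene` column holding two distinct genes.
analysis.df <- data.frame(
  expression = c(5.1, 4.8, 5.3, 6.0, 6.2, 5.9,
                 2.1, 2.4, 2.0, 2.2, 2.3, 1.9),
  group = rep(rep(c("control", "treated"), each = 3), times = 2),
  gene = rep(c("MUC20", "TP53"), each = 6),
  stringsAsFactors = FALSE
)

# Keep only the rows for one gene so that values from other genes
# are not pooled into the comparison.
single.gene.df <- analysis.df[analysis.df$gene == "MUC20", ]

# Alternatively, split into one data frame per gene.
per.gene <- split(analysis.df, analysis.df$gene)
```

Each element of per.gene could then be passed to analyze.gene() on its own, for example with lapply(per.gene, analyze.gene).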
Value
A named list with nine elements:
- p.value
Numeric. P-value from the adaptive t-test.
- effect.size
Numeric. Cohen's d.
- effect.size.class
Character. Qualitative magnitude label from classify.effect.size().
- confidence.interval
Named list with lower and upper bounds for the bootstrapped CI around Cohen's d.
Numeric. P-value from the Wilcoxon rank-sum test.
- robustness
Character. Agreement assessment between parametric and nonparametric tests.
- biological.relevance
Character. Heuristic relevance label from flag.biological.relevance().
- interpretation
Character. Human-readable summary combining all of the above.
- raw
List. Raw output from the parametric and nonparametric tests for further inspection.
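To make effect.size and confidence.interval more concrete, the sketch below computes Cohen's d and a bootstrapped interval around it. It is an illustration under stated assumptions, not the package's implementation: it assumes the pooled-standard-deviation form of Cohen's d and a percentile bootstrap that resamples within each group. The helper names cohens.d and bootstrap.d.ci are hypothetical.

```r
# Cohen's d using the pooled standard deviation (assumed definition).
cohens.d <- function(x, y) {
  nx <- length(x)
  ny <- length(y)
  pooled.sd <- sqrt(((nx - 1) * var(x) + (ny - 1) * var(y)) / (nx + ny - 2))
  (mean(x) - mean(y)) / pooled.sd
}

# Percentile bootstrap CI: resample each group with replacement,
# recompute d each time, and take the empirical quantiles.
bootstrap.d.ci <- function(x, y, n.boot = 1000, alpha = 0.05) {
  boot.d <- replicate(
    n.boot,
    cohens.d(sample(x, replace = TRUE), sample(y, replace = TRUE))
  )
  bounds <- quantile(boot.d, c(alpha / 2, 1 - alpha / 2),
                     na.rm = TRUE, names = FALSE)
  list(lower = bounds[1], upper = bounds[2])
}

# Simulated data for illustration only.
set.seed(1)
x <- rnorm(20, mean = 1)
y <- rnorm(20, mean = 0)
d <- cohens.d(x, y)
ci <- bootstrap.d.ci(x, y, n.boot = 200)
```

A smaller n.boot runs faster but gives a noisier interval, which is the trade-off behind passing n.boot=100 in a quick example versus keeping the default of 1000.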
Details
This function integrates multiple statistical perspectives to provide a
more complete picture of group differences than a p-value alone. The
result is designed to feed directly into plot.gene.analysis() for
visualization.
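As a rough illustration of comparing a parametric and a nonparametric perspective, the sketch below runs both kinds of test with base R and checks whether they reach the same conclusion at a given alpha. It assumes Welch's t-test (the default of t.test()) as a stand-in for the adaptive t-test, and the agreement rule shown is a hypothetical simplification rather than the logic this function uses.

```r
# Simulated expression values for two groups (illustration only).
set.seed(42)
control <- rnorm(15, mean = 5)
treated <- rnorm(15, mean = 7)
alpha <- 0.05

# Parametric test: Welch's t-test, the default of t.test().
parametric.p <- t.test(control, treated)$p.value

# Nonparametric test: Wilcoxon rank-sum test.
nonparametric.p <- wilcox.test(control, treated)$p.value

# Hypothetical agreement rule: both tests fall on the same side of alpha.
both.agree <- (parametric.p < alpha) == (nonparametric.p < alpha)
```

When the two p-values fall on opposite sides of alpha, the conclusion depends on distributional assumptions, which is the kind of situation the robustness element is meant to flag.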
Examples
# \donttest{
geo <- extract.expression(load.geo.soft(accession = "GDS3268", log.transform = TRUE))
#> GDS3268 not found locally, downloading from NCBI GEO...
#> Warning: NaNs produced
probe <- find.probe.by.gene(geo$gene, "mucin 20, cell surface associated")
expr <- get.gene.expression(geo$expression, probe)
analysis.df <- build.analysis.df(expr, geo$phenotype, geo$gene)
result <- analyze.gene(analysis.df, n.boot=100)
cat(result$interpretation)
#> The difference is not statistically significant with a negligible effect size.
#> Result is no consistent evidence of difference.
#> No strong evidence of biological relevance.
# }