Combines a multi-gene expression matrix with sample metadata into a long-format, analysis-ready data frame. Each row represents one gene-sample observation, with human-readable gene names resolved from the annotation data frame.
Arguments
- expr.matrix
Numeric matrix of gene expression values as returned by
get.gene.expression(). Rows are probe IDs, columns are sample IDs.
- phenotype
Data frame of sample metadata as returned by
extract.expression()$phenotype. Row names must correspond to the column names of expr.matrix.
- genes
Data frame of gene annotations as returned by
extract.expression()$gene. Row indices must correspond to the probe IDs (row names) of expr.matrix. Used to resolve probe IDs to the human-readable gene names stored in the "Gene symbol" column.
- group.col
Character. Name of the column in
phenotype to use as the grouping variable. Default is "disease.state".
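The alignment requirements above can be checked before calling the function. The following is only a sketch on toy inputs: the sample IDs, probe IDs, and expression values are hypothetical, and it assumes that "row indices" of genes means its row names. It does not reproduce any validation the function itself may perform.

```r
# Toy inputs with hypothetical IDs and values (not taken from GDS3268).
expr.matrix <- matrix(
  c(1.2, 3.4, 2.2, 0.5),
  nrow = 2,
  dimnames = list(c("101", "102"), c("S1", "S2"))
)
phenotype <- data.frame(
  disease.state = c("control", "tumor"),
  row.names = c("S1", "S2")
)
genes <- data.frame(
  `Gene symbol` = c("MUC20", "ADH1A"),
  row.names = c("101", "102"),
  check.names = FALSE  # keep the space in "Gene symbol"
)
group.col <- "disease.state"

# Every sample column of the matrix needs a metadata row.
samples.ok <- all(colnames(expr.matrix) %in% rownames(phenotype))
# Every probe row of the matrix needs an annotation row
# (assumption: probe IDs are stored as row names of genes).
probes.ok <- all(rownames(expr.matrix) %in% rownames(genes))
# The grouping column must exist in the metadata.
group.ok <- group.col %in% colnames(phenotype)

stopifnot(samples.ok, probes.ok, group.ok)
```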
Value
A long-format data frame with one row per gene-sample pair and three columns:
- gene
Character. Human-readable gene name resolved from the gene annotation data frame.
- expression
Numeric. Expression value for that gene-sample pair.
- group
Character or factor. Group label for each sample, renamed from the
group.col column in phenotype for consistency with downstream functions.
Rows with NaN or NA expression values are removed.
Details
The function pivots expr.matrix from wide format (probes x samples)
to long format, merges sample metadata by sample ID, resolves probe IDs to
gene names using the annotation data frame, and selects the three columns
needed for analysis. The output is the standard input for
analyze.gene(), gene.analysis.plot(), and fit.lasso().
Requires tidyr for the pivot step. Ensure tidyr is listed
under Imports in the package DESCRIPTION.
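The steps described above can be sketched on toy data as follows. This is an illustration under stated assumptions, not the package's actual implementation: the input values and IDs are hypothetical, the intermediate column names probe and sample are invented for the sketch, and the metadata "merge" is done here as a row-name lookup rather than a call to merge().

```r
library(tidyr)

# Toy inputs with hypothetical IDs and values (not taken from GDS3268).
expr.matrix <- matrix(
  c(1.2, 3.4, NaN,
    2.2, 0.5, 1.9),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("101", "102"), c("S1", "S2", "S3"))
)
phenotype <- data.frame(
  disease.state = c("control", "tumor", "tumor"),
  row.names = c("S1", "S2", "S3"),
  stringsAsFactors = FALSE
)
genes <- data.frame(
  `Gene symbol` = c("MUC20", "ADH1A"),
  row.names = c("101", "102"),  # assumption: probe IDs are the row names
  check.names = FALSE,
  stringsAsFactors = FALSE
)
group.col <- "disease.state"

# 1. Pivot from wide (probes x samples) to long (one row per probe-sample pair).
wide <- as.data.frame(expr.matrix)
wide$probe <- rownames(expr.matrix)
long <- pivot_longer(wide, cols = -probe,
                     names_to = "sample", values_to = "expression")

# 2. Attach the group label for each sample via a row-name lookup.
long$group <- phenotype[long$sample, group.col]

# 3. Resolve probe IDs to gene names from the "Gene symbol" column.
long$gene <- genes[long$probe, "Gene symbol"]

# 4. Drop NA/NaN expression values and keep the three output columns.
keep <- !is.na(long$expression)  # is.na() is TRUE for NaN as well
df <- as.data.frame(long[keep, c("gene", "expression", "group")])
```

In this toy case the single NaN observation is dropped, so df holds five of the six gene-sample pairs.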
Examples
# \donttest{
geo <- extract.expression(load.geo.soft(accession = "GDS3268", log.transform = TRUE))
#> GDS3268 not found locally, downloading from NCBI GEO...
#> Using locally cached version of GDS3268 found here:
#> /tmp/RtmpxRZSjV/GDS3268.soft.gz
#> Warning: NaNs produced
#> Using locally cached version of GPL1708 found here:
#> /tmp/RtmpxRZSjV/GPL1708.annot.gz
probe <- find.probe.by.gene(geo$gene, c("MUC20", "ADH1A"))
expr <- get.gene.expression(geo$expression, probe)
df <- build.analysis.df(expr, geo$phenotype, geo$gene)
head(df)
#> [1] gene expression group
#> <0 rows> (or 0-length row.names)
# }