The prior parameters incorporate information from all genes which allows us to shrink/nudge the gene-specific variances toward a common consensus
they are estimated from the data - the formulas for \((s_0, d_0)\) and their derivations are beyond the scope of the course, but limma takes care of the details for us
note that \((s_0, d_0)\)do not depend on\(g\)
Preview: limma workflow
Functions that make your life easier
Function
Description
model.matrix
Takes in your data frame and makes a design matrix
limma::lmFit
Fits the linear model to each gene separately – replace gene with “feature” depending on your data (‘industrial scale’ lm)
limma::eBayes
Use output of linear regression to compute moderated t statistics
limma::topTable
Query your results; sort your p-values; sort genes; Adjust for multiple comparisons
limma::decideTests
Identify which genes are significantly differentially expressed for each contrast
Within each gene observations are iid / constant variance
lmFit() carries out multiple linear regression on each gene
Usage: lmFit(object, design)
object is a data.frame or matrix with features in rows and samples in columns (G genes by N samples), or other Bioconductor object such as ExpressionSet
design is a design matrix (output of model.matrix(y ~ x); N samples by \(p\) parameters)
title geo_accession
GSM92610 Nrl-ko-Gfp 4 weeks retina replicate 1 GSM92610
GSM92611 Nrl-ko-Gfp 4 weeks replicate 2 GSM92611
GSM92612 Nrl-ko-Gfp 4 weeks replicate 3 GSM92612
GSM92613 Nrl-ko-Gfp 4 weeks replicate 4 GSM92613
GSM92614 Nrl-ko-Gfp E16 replicate 1 GSM92614
GSM92615 Nrl-ko-Gfp E16 replicate 2 GSM92615
status submission_date last_update_date type
GSM92610 Public on Feb 28 2006 Jan 16 2006 Aug 28 2018 RNA
GSM92611 Public on Feb 28 2006 Jan 16 2006 Aug 28 2018 RNA
GSM92612 Public on Feb 28 2006 Jan 16 2006 Aug 28 2018 RNA
GSM92613 Public on Feb 28 2006 Jan 16 2006 Aug 28 2018 RNA
GSM92614 Public on Feb 28 2006 Jan 16 2006 Aug 28 2018 RNA
GSM92615 Public on Feb 28 2006 Jan 16 2006 Aug 28 2018 RNA
channel_count source_name_ch1 organism_ch1
GSM92610 1 Gfp purified cell Mus musculus
GSM92611 1 purified Gfp+ photoreceptor cells Mus musculus
GSM92612 1 Gfp+ photoreceptor cells Mus musculus
GSM92613 1 Gfp+ photoreceptor cells Mus musculus
GSM92614 1 Gfp+ photoreceptor cells Mus musculus
GSM92615 1 Gfp+ photoreceptor cells Mus musculus
characteristics_ch1 molecule_ch1
GSM92610 Nrl-ko, purified Gfp+ photoreceptor cell total RNA
GSM92611 Nrl-ko-Gfp, Gfp+ photoreceptor cells total RNA
GSM92612 Nrl-ko-Gfp, Gfp+ photoreceptor cells total RNA
GSM92613 Nrl-ko-Gfp, Gfp+ photoreceptor cells total RNA
GSM92614 Nrl-ko-Gfp, Gfp+ photoreceptor cells total RNA
GSM92615 Nrl-ko-Gfp, Gfp+ photoreceptor cells total RNA
extract_protocol_ch1 label_ch1 label_protocol_ch1
GSM92610 mRNA was amplified using Nugene kit Biotin Nugene kit protocol
GSM92611 mRNA was amplified using Nugene kit Biotin Nugene kit protocol
GSM92612 mRNA was amplified using Nugene kit Biotin Nugene kit protocol
GSM92613 mRNA was amplified using Nugene kit Biotin Nugene kit protocol
GSM92614 mRNA was amplified using Nugene kit Biotin Nugene kit protocol
GSM92615 mRNA was amplified using Nugene kit Biotin Nugene kit protocol
taxid_ch1 hyb_protocol scan_protocol
GSM92610 10090 Affymetrix standard Affymetrix standard
GSM92611 10090 Affymetrix standard Affymetrix standard
GSM92612 10090 Affymetrix standard Affymetrix standard
GSM92613 10090 Affymetrix standard Affymetrix standard
GSM92614 10090 Affymetrix standard Affymetrix standard
GSM92615 10090 Affymetrix standard Affymetrix standard
description
GSM92610 Nrl-ko mice were mated with wt-Gfp mice in which Gfp expression is driven by Nrl promoter. Gfp+ photoreceptors were purified by FACS and RNA was extracted and amplified by Nugene kit.
GSM92611 Nrl-ko mice were mated with wt-Gfp mice in which Gfp expression is driven by Nrl promoter. Gfp+ photoreceptors were purified by FACS and RNA was extracted and amplified by Nugene kit.
GSM92612 Nrl-ko mice were mated with wt-Gfp mice in which Gfp expression is driven by Nrl promoter. Gfp+ photoreceptors were purified by FACS and RNA was extracted and amplified by Nugene kit.
GSM92613 Nrl-ko mice were mated with wt-Gfp mice in which Gfp expression is driven by Nrl promoter. Gfp+ photoreceptors were purified by FACS and RNA was extracted and amplified by Nugene kit.
GSM92614 Nrl-ko mice were mated with wt-Gfp mice in which Gfp expression is driven by Nrl promoter. Gfp+ photoreceptors were purified by FACS and RNA was extracted and amplified by Nugene kit.
GSM92615 Nrl-ko mice were mated with wt-Gfp mice in which Gfp expression is driven by Nrl promoter. Gfp+ photoreceptors were purified by FACS and RNA was extracted and amplified by Nugene kit.
description.1 description.2 data_processing
GSM92610 This is Nrl-ko-Gfp 4 weeks sample. RMA
GSM92611 This is Nrl-ko-Gfp 4 weeks replicate 2 RMA
GSM92612 This is Nrl-ko-Gfp 4 weeks replicate 3 RMA
GSM92613 This is Nrl-ko-Gfp 4 weeks replicate 4. RMA
GSM92614 This is Nrl-ko-Gfp E16 replicate 1. RMA
GSM92615 This is Nrl-ko-Gfp E16 replicate 2. RMA
platform_id contact_name contact_email contact_phone
GSM92610 GPL1261 Swaroop,,Anand swaroop@umich.edu 734-615 2246
GSM92611 GPL1261 Swaroop,,Anand swaroop@umich.edu 734-615 2246
GSM92612 GPL1261 Swaroop,,Anand swaroop@umich.edu 734-615 2246
GSM92613 GPL1261 Swaroop,,Anand swaroop@umich.edu 734-615 2246
GSM92614 GPL1261 Swaroop,,Anand swaroop@umich.edu 734-615 2246
GSM92615 GPL1261 Swaroop,,Anand swaroop@umich.edu 734-615 2246
contact_department contact_institute contact_address
GSM92610 Ophthalmology & Visual Sciences University of Michigan 1000 Wall St.
GSM92611 Ophthalmology & Visual Sciences University of Michigan 1000 Wall St.
GSM92612 Ophthalmology & Visual Sciences University of Michigan 1000 Wall St.
GSM92613 Ophthalmology & Visual Sciences University of Michigan 1000 Wall St.
GSM92614 Ophthalmology & Visual Sciences University of Michigan 1000 Wall St.
GSM92615 Ophthalmology & Visual Sciences University of Michigan 1000 Wall St.
contact_city contact_state contact_zip/postal_code contact_country
GSM92610 Ann Arbor MI 48105 USA
GSM92611 Ann Arbor MI 48105 USA
GSM92612 Ann Arbor MI 48105 USA
GSM92613 Ann Arbor MI 48105 USA
GSM92614 Ann Arbor MI 48105 USA
GSM92615 Ann Arbor MI 48105 USA
contact_web_link
GSM92610 http://www.umich.edu/~retina
GSM92611 http://www.umich.edu/~retina
GSM92612 http://www.umich.edu/~retina
GSM92613 http://www.umich.edu/~retina
GSM92614 http://www.umich.edu/~retina
GSM92615 http://www.umich.edu/~retina
supplementary_file
GSM92610 ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM92nnn/GSM92610/suppl/GSM92610.CEL.gz
GSM92611 ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM92nnn/GSM92611/suppl/GSM92611.CEL.gz
GSM92612 ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM92nnn/GSM92612/suppl/GSM92612.CEL.gz
GSM92613 ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM92nnn/GSM92613/suppl/GSM92613.CEL.gz
GSM92614 ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM92nnn/GSM92614/suppl/GSM92614.CEL.gz
GSM92615 ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM92nnn/GSM92615/suppl/GSM92615.CEL.gz
data_row_count relation sample_id dev_stage genotype
GSM92610 45101 Reanalyzed by: GSE119085 GSM92610 P28 NrlKO
GSM92611 45101 Reanalyzed by: GSE119085 GSM92611 P28 NrlKO
GSM92612 45101 Reanalyzed by: GSE119085 GSM92612 P28 NrlKO
GSM92613 45101 Reanalyzed by: GSE119085 GSM92613 P28 NrlKO
GSM92614 45101 Reanalyzed by: GSE119085 GSM92614 E16 NrlKO
GSM92615 45101 Reanalyzed by: GSE119085 GSM92615 E16 NrlKO
age
GSM92610 28
GSM92611 28
GSM92612 28
GSM92613 28
GSM92614 -4
GSM92615 -4
Formulating lmFit input: Design Matrix
Bioconductor way:
desMat <-model.matrix(~ genotype*age, data =pData(eset))
Equivalently, if using the separate way:
desMat <-model.matrix(~ genotype*age, data = myMeta)
topTable(ebFit, coef = 2) is equivalent here, but much less informative!!
What is the null hypothesis here?
Plot the top 6 probes for genotypeNrlKO
Code
# grab the gene names of the top 6 genes from topTablekeep <-topTable(ebFit, coef ="genotypeNrlKO", number =6) %>%rownames()# extract the data for all 6 genes in tidy formattopSixGenotype <-toLongerMeta(eset) %>%filter(gene %in% keep) %>%mutate(gene =factor(gene, levels = keep))ggplot(topSixGenotype, aes(x = age, y = expression, color = genotype)) +geom_point() +xlab("age (days)") +facet_wrap( ~ gene) +stat_smooth(method="lm", se=FALSE, cex=0.5)
topTable(ebFit, coef = 3) is equivalent here, but much less informative!!
What is the null hypothesis here?
Plot the top 6 probes for age
Code
# grab the gene names of the top 6 genes from topTablekeep <-topTable(ebFit, coef ="age", number =6) %>%rownames()# extract the data for all 6 genes in tidy formattopSixAge <-toLongerMeta(eset) %>%filter(gene %in% keep) %>%mutate(gene =factor(gene, levels = keep))ggplot(topSixAge, aes( x = age, y = expression, color = genotype)) +geom_point() +xlab("age (days)") +facet_wrap( ~ gene) +stat_smooth(method="lm", se=FALSE, cex=0.5)
topTable(ebFit, coef = 4) is equivalent here, but much less informative!!
What is the null hypothesis here?
Plot the top 6 probes for genotypeNrlKO:age
Code
# grab the gene names of the top 6 genes from topTablekeep <-topTable(ebFit, coef ="genotypeNrlKO:age", number =6) %>%rownames()# extract the data for all 6 genes in tidy formattopSixItx <-toLongerMeta(eset) %>%filter(gene %in% keep) %>%mutate(gene =factor(gene, levels = keep))ggplot(topSixItx, aes( x = age, y = expression, color = genotype)) +geom_point() +xlab("age (days)") +facet_wrap( ~ gene) +stat_smooth(method="lm", se=FALSE, cex=0.5)
topTable(ebFit, coef = c(2,4)) is equivalent here, but much less informative!!
What is the null hypothesis here?
Plot the top 6 probes for any effect of genotype
Code
# grab the gene names of the top 6 genes from topTablekeep <-topTable(ebFit, coef =c("genotypeNrlKO", "genotypeNrlKO:age"), number =6) %>%rownames()# extract the data for all 6 genes in tidy formattopSixGenotypeMarginal <-toLongerMeta(eset) %>%filter(gene %in% keep) %>%mutate(gene =factor(gene, levels = keep))ggplot(topSixGenotypeMarginal, aes( x = age, y = expression, color = genotype)) +geom_point() +xlab("age (days)") +facet_wrap( ~ gene) +stat_smooth(method="lm", se=FALSE, cex=0.5)
Comparison of \(s_g^2\) and \(\tilde{s}_g^2\) (shrinkage!)
Fill in the blank (increases or decreases):
For large variances, limma ___________ the gene-specific variance estimates.
For small variances, limma ___________ the gene-specific variance estimates.
FDR estimates can be invalid (assumptions are violated)
Possible solution: nonparametric test
Limitation: lack of flexibility to adjust for multiple covariates
Another possible solution: estimate sampling distribution or p-values “empirically” using resampling techniques
Permutation: construct a simulated version of your dataset that satisfies the null hypothesis and compute statistic (e.g. shuffle group labels for a two-group comparison); repeat many times and use permutation statistics as your sampling distribution rather than a t, Normal, F, \(\chi^2\), etc
Limitation: often computationally intensive for genomics; not optimal for small sample sizes