A common end goal of an RNA-Seq experiment is to identify which genes have responded to a treatment. For example, has a newly developed drug increased the expression of a target, or has a knockdown successfully decreased expression? To answer these questions, we perform a differential expression (DE) analysis.
A DE analysis is a statistical procedure that identifies genes that are up- or downregulated between two or more conditions. It involves comparing the expression level of each gene in one group of samples (e.g., disease samples) to its expression level in another (e.g., healthy samples) to identify genes that have changed across conditions. DE analysis can have a significant impact by identifying disease-associated genes (which can serve as potential drug development targets) and biomarkers that can be used for diagnosis, prognosis, or monitoring of disease progression.
DE analysis is typically performed using specialized software that maximizes our ability to identify differences when the number of replicates per condition is small (e.g., 2 or 3 replicates), while accounting for differences in library size and controlling the false discovery rate across the many tests performed at once. At Eclipsebio, we use a powerful tool called DESeq2 to identify differentially expressed genes. One way that we use DESeq2 is with our eRibo service, where we can detect changes in ribosome-associated and total transcriptome counts between different conditions.
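To make the two corrections mentioned above more concrete, here is a minimal Python sketch of a library-size adjustment and a false discovery rate adjustment. It assumes the median-of-ratios approach for size factors and the Benjamini-Hochberg procedure for adjusted p-values, which are the defaults documented for DESeq2. The function names and input layout are hypothetical, chosen only for illustration, and this is not DESeq2's actual code.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors (illustrative sketch).

    counts: list of genes, each a list of raw counts, one per sample.
    Returns one scaling factor per sample.
    """
    n_samples = len(counts[0])
    # Only genes with a nonzero count in every sample have a defined log.
    log_rows = [[math.log(c) for c in row]
                for row in counts if all(c > 0 for c in row)]
    # Log of each gene's geometric mean across samples.
    log_geo_means = [sum(row) / n_samples for row in log_rows]
    factors = []
    for j in range(n_samples):
        log_ratios = [row[j] - g for row, g in zip(log_rows, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the
    # adjusted values never decrease with rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical toy data: sample 2 was sequenced twice as deeply as sample 1.
toy_counts = [[10, 20], [100, 200], [50, 100]]
factors = size_factors(toy_counts)
padj = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

Dividing each sample's counts by its size factor puts the samples on a comparable scale, and a gene is usually called significant when its adjusted p-value falls below a chosen threshold.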
DESeq2 uses information across all genes in the experiment to produce a robust estimate of the variability (dispersion) between samples for each gene, in a way that accounts for the discrete, overdispersed nature of read count data. It then uses these dispersions to estimate the standard error of each gene's log2 fold change between conditions. Dividing the log2 fold change by its standard error gives the statistic for a test called the “Wald” test. This test helps us determine whether our observed differences are likely real or just due to chance, and provides robust lists of DE genes to support answering specific scientific questions.
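The arithmetic behind the Wald test can be sketched in a few lines of Python. The sketch assumes that a log2 fold change and its standard error have already been estimated for a gene. It then computes the Wald statistic and a two-sided p-value from the standard normal distribution. The helper `nb_variance` shows how a dispersion value inflates the variance of a count under the negative binomial model that DESeq2 is documented to use. Both function names are hypothetical, and the fitting steps that produce the estimates are not shown.

```python
import math


def nb_variance(mean, dispersion):
    """Variance of a negative binomial count: mean + dispersion * mean**2."""
    return mean + dispersion * mean ** 2


def wald_test(log2_fold_change, standard_error):
    """Return the Wald statistic and its two-sided p-value.

    The statistic is the estimate divided by its standard error and is
    compared against a standard normal distribution.
    """
    z = log2_fold_change / standard_error
    # P(|Z| > |z|) for a standard normal equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical gene: log2 fold change of 1.5 with a standard error of 0.5.
z, p = wald_test(1.5, 0.5)
```

With the same fold change, a gene with a larger dispersion has a larger standard error, so its Wald statistic is smaller and its p-value is larger.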
The same framework that is used to identify differentially expressed genes can also be applied to different data modalities. For example, a similar analysis can be performed with eCLIP peaks to determine whether a region has differential enrichment following a treatment. In the case of eCLIP, to account for the presence of an input, we compare the ratio of fold changes rather than the observed counts in the immunoprecipitated libraries alone.
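As a rough illustration of the ratio-of-fold-changes idea, the Python sketch below computes the immunoprecipitation (IP) over input enrichment within each condition and then compares the two enrichments on a log2 scale. It assumes counts that have already been normalized for library size, and it adds a pseudocount to avoid dividing by zero. The function name, the arguments, and the pseudocount are assumptions made for illustration. The sketch gives only a point estimate and does not include the statistical modeling that a full differential analysis would add.

```python
import math


def log2_ratio_of_fold_changes(ip_treated, input_treated,
                               ip_control, input_control,
                               pseudocount=1.0):
    """log2 of (IP/input in treated) divided by (IP/input in control).

    All four arguments are normalized counts for a single peak.
    The pseudocount is an illustrative choice, not a recommended value.
    """
    enrich_treated = (ip_treated + pseudocount) / (input_treated + pseudocount)
    enrich_control = (ip_control + pseudocount) / (input_control + pseudocount)
    return math.log2(enrich_treated / enrich_control)


# Hypothetical peak: IP signal rises fourfold while the input stays flat.
changed = log2_ratio_of_fold_changes(80, 10, 20, 10, pseudocount=0.0)

# Hypothetical peak: IP and input both double, so enrichment is unchanged.
unchanged = log2_ratio_of_fold_changes(40, 20, 20, 10, pseudocount=0.0)
```

The second case shows why the input matters: looking at the IP counts alone would suggest a twofold increase, but the change is fully explained by a shift in the underlying transcript abundance.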
Creating the right framework for an accurate differential analysis can take a lot of effort. At Eclipsebio we have experts with extensive experience in statistical methods to take out the guesswork of identifying differential genes or peaks. Contact us today to see how we can help you examine differentials in your experiment.