The DiffSegR
R package takes the bam files from an RNA-Seq dataset
with two biological conditions and returns transcriptome-wide expression
differences between these two conditions without using pre-existing
annotations (typically genes).
All the motivations for this method and a comparison with the state-of-the-art are described in Liehrmann et al. (2023). Briefly, gaining a global picture of gene and RNA regulations requires detailed knowledge of the transcriptome along with the enzymatic and RNA-binding activities that shape it. Hence, several RNA-Seq-based strategies have recently been developed to decipher its complexity. However, most of the tools developed only count the abundance of sequencing reads along annotated regions. These annotations are typically incomplete and often lead to errors in the differential expression analysis.
In the context of differential expression analysis, the regulations of
genes and RNAs result in local changes of the log2 fold change (log2-FC)
along the genome, i.e the differential transcription profile. Therefore,
one can use a multiple changepoint detection algorithm such as fpop
(Maidstone et al. 2017) to delineate the boundaries of the
differentially expressed regions (DERs) without relying on pre-existing
annotations. The DERs can then be assessed using, for example, the
negative binomial generalized linear model of DESeq2
(Love et
al. 2014).
The DiffSegR
R package, an identify then annotate tool, implements
this strategy.
Make sure that remotes
is installed by running
install.packages("remotes")
, then type
remotes::install_github("aLiehrmann/DiffSegR")
See the introductory
tutorial
for an introduction to our framework on real data. See the advanced
tutorial
for examples of more advanced uses of DiffSegR
. If you’re interested
in analyzing multiple loci, including entire chromosomes, after reading
the introduction, go take a look at our multi-loci
analysis
vignette.
Liehrmann, A., Delannoy, E., Castandet, B. and Rigaill, G. DiffSegR: an RNA-Seq data driven method for differential expression analysis using changepoint detection (2023) doi:10.1093/nargab/lqad098.
Maidstone R, Hocking T, Rigaill G, Fearnhead P. On optimal multiple changepoint algorithms for large data. Stat Comput 27(2), 519-533 (2017) doi:10.1007/s11222-016-9636-3.
Love, M.I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol 15, 550 (2014) doi:10.1186/s13059-014-0550-8.