XueyiDong/LongReadBenchmark

Benchmarking long-read RNA-seq analysis tools

RMIT

Benchmarking long-read RNA-sequencing analysis tools using in in silico mixtures

This repository contains the code used to perform the analysis and generate the figures in this paper:

Benchmarking long-read RNA-sequencing analysis tools using in silico mixtures Xueyi Dong, Mei R. M. Du, Quentin Gouil, Luyi Tian, Jafar S. Jabbari, Rory Bowden, Pedro L. Baldoni, Yunshun Chen, Gordon K. Smyth, Shanika L. Amarasinghe, Charity W. Law, Matthew E. Ritchie Nature Methods 2023; doi: https://doi.org/10.1038/s41592-023-02026-3

Data Availability

Our RNA-seq data are available from Gene Expression Omnibus (GEO) under accession number GSE172421 (main benchmarking dataset) and GSE227000 (lab-based mixture of replicate 1).

Please cite our paper if you use our data and/or scripts in your studies.

Index

Pilot in silico mixture experiment

All scripts are available at pilot (Illumina) and pilot_ONT/scripts (ONT)

Generate in silico mixture samples and preprocess

ONT: ONT/mix_prepare

Illumina: illumina/mix_prepare

Generate downsampled samples and preprocess

Scripts are available at downsample

Mapping and quantification

ONT: ONT/preprocess

Illumina: illumina/salmon_map.sh

Quality control

ONT-specific: ONT/QC

General: longvsshort/overdisp.R, longvsshort/qc.R and longvsshort/sequinCPMvsAbundance.R

Isoform detection

Scripts to run softwares: ONT/isoform_detection/methods

Analysis of results: ONT/isoform_detection/analysis

Differential transcript expression

ONT: ONT/DE_mix.Rmd

Illumina: illumina/DE_mix.Rmd

Results comparison: longvsshort/DEmixres.R

Differential transcript usage

ONT: ONT/DTU_mix.Rmd

Illumina: illumina/DTU_mix.Rmd

Results comparison: longvsshort/DTUmixres.R and longvsshort/DTUmixrestx.R