rmappet

Introduction

rmappet is a nextflow pipeline for parallel alternative splicing analysis of bulk, short-read RNA sequencing data using both rMATS and Whippet. Splicing events reported by each tool are then overlapped by location to identify shared events, providing additional confidence when interpreting the results. Both single- and paired-end data is supported.

Pipeline summary

Raw read quality control and trimming (Fastp)
rMATS - Alternative splicing analysis
1. Build STAR genome index (STAR)
2. Align trimmed reads (STAR)
3. Sort and index alignments (Samtools)
4. Alternative splicing analysis using rMATS (rMATS)
5. Standardize rMATS results
Whippet - Alternative splicing analysis
1. Build whippet genome index (Whippet)
2. Quantify splicing events (Whippet)
3. Alternative splicing analysis using Whippet (Whippet)
Overlap splicing coordinates

Get started

Install Nextflow (>=22.10.3)
Install Docker for local execution
Install Singularity for cluster execution

Download and test rmappet in stub mode:

nextflow run didrikolofsson/rmappet -profile test,docker -stub

Pipeline execution

The pipeline can currently be executed locally using docker or on distributed computing clusters using SLURM and singularity. Software dependencies are resolved using pre-built docker and singularity images, removing the need for users to manage their own dependenices. This section provides information about the required pipeline inputs and how to execute the pipeline in the supported environments.

Required inputs

Parameter file

A parameter file with necessary settings and file paths must be supplied when executing the pipeline. The parameter file should be in YAML format and contain the following information:

dev - Run the pipeline in development mode using a single sample for testing
samplesheet - Path to sample sheet in csv format
genome - Path to genome fasta
annotation - Path to genome annotation in GTF format
outputdir - Path to output directory
readlen - Read length
libtype - Library type

Example parameter files for both single and paired end experiments can be found in the /data folder.

Sample sheet

A sample sheet with information about the experimental design should be included together with the parameter file. The sample sheet should be in CSV format and contain the following columns and information:

sample_id	read1	read2	condition
sample1	path/to/sample1_1.fastq.gz	path/to/sample1_2.fastq.gz	condition_a
sample2	path/to/sample2_1.fastq.gz	path/to/sample2_2.fastq.gz	condition_a
sample3	path/to/sample3_1.fastq.gz	path/to/sample3_2.fastq.gz	condition_a
sample4	path/to/sample4_1.fastq.gz	path/to/sample4_2.fastq.gz	condition_b
sample5	path/to/sample5_1.fastq.gz	path/to/sample5_2.fastq.gz	condition_b
sample6	path/to/sample6_1.fastq.gz	path/to/sample6_2.fastq.gz	condition_b

Examples of sample sheets can be found in the /data folder.

Local execution

Execute the pipeline on a local computer using docker by running the following command. Make sure that the docker daemon is running before launch to avoid errors.

nextflow run didrikolofsson/rmappet -profile docker -params-file path/to/params.yaml

Cluster execution

Execute the pipeline on a distributed computing cluster by running the following command. Make sure that the singularity command is accessible on the head node before launch to avoid errors, e.g call module load singularity on clusters with a module system.

nextflow run didrikolofsson/rmappet -profile slurm,singularity -params-file path/to/params.yaml

Pipeline output

The rmappet pipeline generates a set of output folders and files containing results from the various processing steps. The pipelines output is structured as follows:

outputdir
├── fastp
│   ├── sample1.fastp.html
│   └── sample1.fastp.json
├── overlap
│   ├── condition_a_vs_condition_b.rmats_only.csv
│   ├── condition_a_vs_condition_b.rw_overlap.csv
│   ├── condition_a_vs_condition_b.whippet_only.csv
│   ├── condition_a_vs_condition_b.significant.rmats_only.csv
│   ├── condition_a_vs_condition_b.significant.rw_overlap.csv
│   └── condition_a_vs_condition_b.significant.whippet_only.csv
├── rmats
│   ├── results
│   │   ├── condition_a_vs_condition_b.jc.tsv
│   │   ├── condition_a_vs_condition_b.jcec.tsv
│   │   ├── condition_a_vs_condition_b.significant.jc.tsv
│   │   └── condition_a_vs_condition_b.significant.jcec.tsv
│   └── run
│       └── condition_a_vs_condition_b
│           └── condition_a_vs_condition_b.txt
├── samtools
│   └── sort
│       ├── sample1.sortedByCoord.bam
│       └── sample1.sortedByCoord.bam.bai
├── star
│   └── alignments
│       ├── sample1.Log.final.out
│       ├── sample1.ReadsPerGene.out.tab
│       └── sample1.SJ.out.tab
└── whippet
    ├── delta
    │   ├── condition_a_vs_condition_b.diff.gz
    │   └── condition_a_vs_condition_b.significant.tsv
    └── quant
        ├── sample1.gene.tpm.gz
        ├── sample1.isoform.tpm.gz
        ├── sample1.jnc.gz
        ├── sample1.map.gz
        └── sample1.psi.gz

Troubleshooting

Please note that rmappet is currently under active development, and we are still working to fix bugs and add features. If you have any questions, suggestions, or issues, please feel free to contact us or open an issue.

Contact

Didrik Olofsson (didrik.olofsson@omiqa.bio)

Dr. Alexander Neumann (alexander.neumann@omiqa.bio)

Prof. Dr. Florian Heyd (florian.heyd@fu-berlin.de)

maurya-anand/nf-rmappet