rmappet is a nextflow pipeline for parallel alternative splicing analysis of bulk, short-read RNA sequencing data using both rMATS and Whippet. Splicing events reported by each tool are then overlapped by location to identify shared events, providing additional confidence when interpreting the results. Both single- and paired-end data is supported.
- Raw read quality control and trimming (Fastp)
- rMATS - Alternative splicing analysis
- Whippet - Alternative splicing analysis
- Overlap splicing coordinates
-
Install Nextflow (>=22.10.3)
-
Install Docker for local execution
-
Install Singularity for cluster execution
-
Download and test rmappet in stub mode:
nextflow run didrikolofsson/rmappet -profile test,docker -stub
The pipeline can currently be executed locally using docker or on distributed computing clusters using SLURM and singularity. Software dependencies are resolved using pre-built docker and singularity images, removing the need for users to manage their own dependenices. This section provides information about the required pipeline inputs and how to execute the pipeline in the supported environments.
A parameter file with necessary settings and file paths must be supplied when executing the pipeline. The parameter file should be in YAML format and contain the following information:
- dev - Run the pipeline in development mode using a single sample for testing
- samplesheet - Path to sample sheet in csv format
- genome - Path to genome fasta
- annotation - Path to genome annotation in GTF format
- outputdir - Path to output directory
- readlen - Read length
- libtype - Library type
Example parameter files for both single and paired end experiments can be found in the /data
folder.
A sample sheet with information about the experimental design should be included together with the parameter file. The sample sheet should be in CSV format and contain the following columns and information:
sample_id | read1 | read2 | condition |
---|---|---|---|
sample1 | path/to/sample1_1.fastq.gz | path/to/sample1_2.fastq.gz | condition_a |
sample2 | path/to/sample2_1.fastq.gz | path/to/sample2_2.fastq.gz | condition_a |
sample3 | path/to/sample3_1.fastq.gz | path/to/sample3_2.fastq.gz | condition_a |
sample4 | path/to/sample4_1.fastq.gz | path/to/sample4_2.fastq.gz | condition_b |
sample5 | path/to/sample5_1.fastq.gz | path/to/sample5_2.fastq.gz | condition_b |
sample6 | path/to/sample6_1.fastq.gz | path/to/sample6_2.fastq.gz | condition_b |
Examples of sample sheets can be found in the /data
folder.
Execute the pipeline on a local computer using docker by running the following command. Make sure that the docker daemon is running before launch to avoid errors.
nextflow run didrikolofsson/rmappet -profile docker -params-file path/to/params.yaml
Execute the pipeline on a distributed computing cluster by running the following command. Make sure that the singularity command is accessible on the head node before launch to avoid errors, e.g call module load singularity
on clusters with a module system.
nextflow run didrikolofsson/rmappet -profile slurm,singularity -params-file path/to/params.yaml
The rmappet pipeline generates a set of output folders and files containing results from the various processing steps. The pipelines output is structured as follows:
outputdir
├── fastp
│ ├── sample1.fastp.html
│ └── sample1.fastp.json
├── overlap
│ ├── condition_a_vs_condition_b.rmats_only.csv
│ ├── condition_a_vs_condition_b.rw_overlap.csv
│ ├── condition_a_vs_condition_b.whippet_only.csv
│ ├── condition_a_vs_condition_b.significant.rmats_only.csv
│ ├── condition_a_vs_condition_b.significant.rw_overlap.csv
│ └── condition_a_vs_condition_b.significant.whippet_only.csv
├── rmats
│ ├── results
│ │ ├── condition_a_vs_condition_b.jc.tsv
│ │ ├── condition_a_vs_condition_b.jcec.tsv
│ │ ├── condition_a_vs_condition_b.significant.jc.tsv
│ │ └── condition_a_vs_condition_b.significant.jcec.tsv
│ └── run
│ └── condition_a_vs_condition_b
│ └── condition_a_vs_condition_b.txt
├── samtools
│ └── sort
│ ├── sample1.sortedByCoord.bam
│ └── sample1.sortedByCoord.bam.bai
├── star
│ └── alignments
│ ├── sample1.Log.final.out
│ ├── sample1.ReadsPerGene.out.tab
│ └── sample1.SJ.out.tab
└── whippet
├── delta
│ ├── condition_a_vs_condition_b.diff.gz
│ └── condition_a_vs_condition_b.significant.tsv
└── quant
├── sample1.gene.tpm.gz
├── sample1.isoform.tpm.gz
├── sample1.jnc.gz
├── sample1.map.gz
└── sample1.psi.gz
Please note that rmappet is currently under active development, and we are still working to fix bugs and add features. If you have any questions, suggestions, or issues, please feel free to contact us or open an issue.
Didrik Olofsson (didrik.olofsson@omiqa.bio)
Dr. Alexander Neumann (alexander.neumann@omiqa.bio)
Prof. Dr. Florian Heyd (florian.heyd@fu-berlin.de)