nf-core/rnavar is a bioinformatics pipeline for RNA variant calling analysis following GATK4 best practices.
- Merge re-sequenced FastQ files (cat)
- Read QC (FastQC)
- Align reads to reference genome (STAR)
- Sort and index alignments (SAMtools)
- Mark duplicate reads (GATK4 MarkDuplicates)
- Split reads that contain Ns in their CIGAR string (GATK4 SplitNCigarReads)
- Estimate and correct systematic bias using base quality score recalibration (GATK4 BaseRecalibrator, GATK4 ApplyBQSR)
- Convert a BED file to a Picard Interval List (GATK4 BedToIntervalList)
- Scatter one interval-list into many interval-files (GATK4 IntervalListTools)
- Call SNPs and indels (GATK4 HaplotypeCaller)
- Merge multiple VCF files into one VCF (GATK4 MergeVCFs)
- Index the VCF (Tabix)
- Filter variant calls based on certain criteria (GATK4 VariantFiltration)
- Annotate variants (snpEff, Ensembl VEP)
- Present QC for raw read, alignment, gene biotype, sample similarity, and strand-specificity checks (MultiQC, R)
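To make the GATK4 part of the steps above more concrete, here is a rough dry-run sketch of how those tools chain together for one sample. It is not the pipeline's own module code: all file names are hypothetical, the single `FS > 30.0` filter is only an illustrative criterion, and the pipeline itself sets additional options. The script prints each command rather than running it, so it works without GATK installed.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical inputs, for illustration only.
REF=genome.fasta
KNOWN=known_sites.vcf.gz
BAM=sample.sorted.bam

# Dry run: print each command instead of executing it.
# Replace the body with "$@" to run for real (needs GATK4 on the PATH).
run() { echo "$@"; }

run gatk MarkDuplicates -I "$BAM" -O sample.md.bam -M sample.md.metrics.txt
run gatk SplitNCigarReads -R "$REF" -I sample.md.bam -O sample.split.bam
run gatk BaseRecalibrator -R "$REF" -I sample.split.bam --known-sites "$KNOWN" -O sample.recal.table
run gatk ApplyBQSR -R "$REF" -I sample.split.bam --bqsr-recal-file sample.recal.table -O sample.recal.bam
run gatk HaplotypeCaller -R "$REF" -I sample.recal.bam -O sample.vcf.gz
run gatk VariantFiltration -R "$REF" -V sample.vcf.gz -O sample.filtered.vcf.gz \
  --filter-name FS30 --filter-expression "FS > 30.0"
```

Each step consumes the output of the previous one, which is why the intermediate BAM and VCF names are threaded through the calls.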
Tool | Version |
---|---|
FastQC | 0.11.9 |
STAR | 2.7.9a |
Samtools | 1.15.1 |
GATK | 4.2.6.1 |
Tabix | 1.11 |
SnpEff | 5.0 |
Ensembl VEP | 104.3 |
MultiQC | 1.12 |
Note
If you are new to Nextflow and nf-core, please refer to this page on how to set up Nextflow. Make sure to test your setup with `-profile test` before running the workflow on actual data.
First, prepare a samplesheet with your input data that looks as follows:
samplesheet.csv:
sample,fastq_1,fastq_2
CONTROL_REP1,AEG588A1_S1_L002_R1_001.fastq.gz,AEG588A1_S1_L002_R2_001.fastq.gz
Each row represents a FASTQ file (single-end) or a pair of FASTQ files (paired-end).
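As a minimal sketch, the samplesheet can be written and sanity-checked from the shell before launching the pipeline. The second row is a hypothetical single-end sample, and leaving its `fastq_2` column empty is an assumption based on common nf-core convention; check the usage documentation for the exact rules. The `awk` check is a convenience added here, not part of the pipeline.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Write the samplesheet. The TREATED_REP1 row is hypothetical and shows the
# assumed single-end layout (empty fastq_2 column).
cat > samplesheet.csv <<'EOF'
sample,fastq_1,fastq_2
CONTROL_REP1,AEG588A1_S1_L002_R1_001.fastq.gz,AEG588A1_S1_L002_R2_001.fastq.gz
TREATED_REP1,AEG588A2_S2_L002_R1_001.fastq.gz,
EOF

# Check that every data row has three columns and that the sample name and
# fastq_1 are filled in. awk exits non-zero on the first kind of problem found.
awk -F',' '
  BEGIN { bad = 0 }
  NR == 1 { next }
  NF != 3 { printf "line %d: expected 3 columns, got %d\n", NR, NF; bad = 1 }
  $1 == "" || $2 == "" { printf "line %d: sample and fastq_1 are required\n", NR; bad = 1 }
  END { exit bad }
' samplesheet.csv

echo "samplesheet.csv looks well-formed"
```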
Now, you can run the pipeline using:
nextflow run nf-core/rnavar -profile <docker/singularity/podman/shifter/charliecloud/conda/institute> --input samplesheet.csv --outdir <OUTDIR> --genome GRCh38
Warning
Please provide pipeline parameters via the CLI or Nextflow `-params-file` option. Custom config files, including those provided by the `-c` Nextflow option, can be used to provide any configuration except for parameters; see docs.
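For illustration, the parameters from the run command above could be moved into a params file along these lines. The file name `params.yaml` and the output directory `results` are placeholders. The script only prints the launch command, so it runs without Nextflow installed.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Pipeline parameters only; "results" is a placeholder output directory.
cat > params.yaml <<'EOF'
input: samplesheet.csv
outdir: results
genome: GRCh38
EOF

# Print the equivalent launch command. Remove `echo` to actually run it
# (needs Nextflow and, for this profile, Docker).
echo nextflow run nf-core/rnavar -profile docker -params-file params.yaml
```

Keeping parameters in a file like this makes a run easier to reproduce than a long command line, while `-c` remains available for non-parameter configuration such as resource limits.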
For more details and further functionality, please refer to the usage documentation and the parameter documentation.
To see the results of an example test run with a full-size dataset, refer to the results tab on the nf-core website pipeline page. For more details about the output files and reports, please refer to the output documentation.
nf-core/rnavar was originally written in Nextflow DSL2 for use at the Barntumörbanken, Karolinska Institutet, by Praveen Raj (@praveenraj2018) and Maxime U Garcia (@maxulysse).
nf-core/rnavar was originally written by Praveen Raj at The Swedish Childhood Tumor Biobank (Barntumörbanken). Maxime U Garcia at The Swedish Childhood Tumor Biobank (Barntumörbanken) helped with development.
Maintenance is now led by Maxime U Garcia (now at Seqera Labs).
Main developers:
We thank the following people for their extensive assistance in the development of this pipeline:
If you would like to contribute to this pipeline, please see the contributing guidelines.
For further information or help, don't hesitate to get in touch on the Slack `#rnavar` channel (you can join with this invite).
If you use nf-core/rnavar for your analysis, please cite it using the following doi: 10.5281/zenodo.6669636
An extensive list of references for the tools used by the pipeline can be found in the `CITATIONS.md` file.
You can cite the `nf-core` publication as follows:
The nf-core framework for community-curated bioinformatics pipelines.
Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.
Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.