Snakemake pipeline for variant calling from raw sample sequences, with lots of bells and whistles.
Advantages:
- One command to run the whole pipeline!
- Many tools to choose from for each step
- Simple configuration via a single file
- Automatic download of tool dependencies
- Resuming after failed jobs
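Resuming works because Snakemake decides, job by job, whether the existing outputs are still valid, so finished steps are not recomputed after a failure. The sketch below shows only the classic make-style timestamp rule: a job is skipped if all its outputs exist and are not older than its newest input. This is a simplified illustration, not Snakemake's actual implementation, which applies further rerun triggers and tracks incomplete outputs. The function names and file names are hypothetical.

```python
import os
import tempfile


def needs_rerun(outputs, inputs):
    """Return True if a job must (re)run under a make-style timestamp rule.

    A job is considered done, and is skipped, only if all of its outputs
    exist and none of them is older than the newest input.
    """
    # A missing output means the job never ran or did not finish.
    if any(not os.path.exists(path) for path in outputs):
        return True
    # Without inputs, existing outputs cannot become stale.
    if not inputs:
        return False
    newest_input = max(os.path.getmtime(path) for path in inputs)
    oldest_output = min(os.path.getmtime(path) for path in outputs)
    return oldest_output < newest_input


def demo():
    """Walk through three cases using throwaway files with fixed mtimes."""
    with tempfile.TemporaryDirectory() as tmp:
        fastq = os.path.join(tmp, "sample.fastq")  # hypothetical input
        bam = os.path.join(tmp, "sample.bam")      # hypothetical output
        open(fastq, "w").close()
        missing = needs_rerun([bam], [fastq])      # no output yet
        open(bam, "w").close()
        os.utime(fastq, (1000, 1000))
        os.utime(bam, (2000, 2000))
        up_to_date = needs_rerun([bam], [fastq])   # output newer than input
        os.utime(fastq, (3000, 3000))
        stale = needs_rerun([bam], [fastq])        # input changed afterwards
        return missing, up_to_date, stale
```

Calling `demo()` returns `(True, False, True)`: the job runs when its output is missing, is skipped when the output is up to date, and runs again once an input changes.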
See --> the Wiki pages <-- for setup and documentation.
For bug reports and feature requests, please open an issue on our GitHub page.
Minimal input:
- Reference genome fasta file
- Per-sample fastq files
- Optionally, a vcf file of known variants to restrict the variant calling process
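The minimal input above can be pictured as one reference, a mapping from sample names to their fastq files (one file for single end, two for paired end), and an optional vcf. The sketch below checks such a set for obvious mistakes before a run. The function name, the dict layout and the accepted file extensions are assumptions for illustration only; they are not grenepipe's configuration format or its own input checks, which are described in the Wiki.

```python
# Hypothetical set of accepted extensions (plain or gzip-compressed).
FASTA_SUFFIXES = (".fa", ".fasta", ".fna", ".fa.gz", ".fasta.gz", ".fna.gz")
FASTQ_SUFFIXES = (".fq", ".fastq", ".fq.gz", ".fastq.gz")
VCF_SUFFIXES = (".vcf", ".vcf.gz")


def has_suffix(path, suffixes):
    """Case-insensitive check of a file name against a tuple of suffixes."""
    return str(path).lower().endswith(suffixes)


def check_minimal_input(reference, samples, known_variants=None):
    """Return a list of problems with the minimal input set (empty if OK).

    reference: path to the reference genome fasta file.
    samples: dict mapping sample name -> list of one (single end)
             or two (paired end) fastq paths.
    known_variants: optional path to a vcf file of known variants.
    """
    problems = []
    if not has_suffix(reference, FASTA_SUFFIXES):
        problems.append(f"reference is not a fasta file: {reference}")
    if not samples:
        problems.append("no samples given")
    for name, fastqs in samples.items():
        if len(fastqs) not in (1, 2):
            problems.append(f"{name}: expected 1 or 2 fastq files, got {len(fastqs)}")
        for fq in fastqs:
            if not has_suffix(fq, FASTQ_SUFFIXES):
                problems.append(f"{name}: not a fastq file: {fq}")
    if known_variants is not None and not has_suffix(known_variants, VCF_SUFFIXES):
        problems.append(f"known variants file is not a vcf: {known_variants}")
    return problems
```

For example, `check_minimal_input("ref.fa", {"S1": ["S1_R1.fastq.gz", "S1_R2.fastq.gz"]})` returns an empty list, while passing three fastq files for one sample reports a problem.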
Process and available tools:
- Read trimming (single or paired end)
- Read mapping
- Optional read filtering, clipping, duplication removal, and quality score recalibration
- Damage profiling (optional; e.g., for ancient DNA)
- Variant calling and genotyping
- Variant filtering
- Frequency calling (for pool sequencing data, as an alternative to variant calling)
- Quality control, statistics, SNP annotation, reporting
  - FastQC
  - samtools stats
  - samtools flagstat
  - QualiMap
  - Picard CollectMultipleMetrics
  - bcftools stats
  - snpEff
  - VEP (Ensembl Variant Effect Predictor)
  - MultiQC
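The processing steps listed above run in a fixed order, with some of them switched on or off per run. The sketch below builds that ordered stage list from a few options, assuming the order follows the list above and assuming that frequency calling takes the place of variant calling and filtering for pool sequencing data. The stage names and option names are hypothetical and do not correspond to grenepipe's rule or config names.

```python
def plan_steps(pool_seq=False, damage_profiling=False, read_filtering=False):
    """Return the ordered list of processing stages for a hypothetical run."""
    # Trimming and mapping are always part of the run.
    steps = ["trimming", "mapping"]
    if read_filtering:
        # Stands in for the optional filtering, clipping, duplicate removal
        # and base quality score recalibration steps.
        steps.append("read_filtering")
    if damage_profiling:
        steps.append("damage_profiling")
    if pool_seq:
        # Assumption: frequency calling replaces calling and filtering.
        steps.append("frequency_calling")
    else:
        steps += ["variant_calling", "variant_filtering"]
    # QC, statistics, annotation and reporting close the run.
    steps.append("qc_and_reporting")
    return steps
```

With default options this yields `["trimming", "mapping", "variant_calling", "variant_filtering", "qc_and_reporting"]`; keeping the optional steps as flags mirrors how each can be enabled independently without changing the order of the others.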
Typical output:
- Variant calls vcf, raw and filtered, and potentially with annotations
- MultiQC report (includes summaries of most other tools, and of the final vcf)
- Snakemake report (optional)
Intermediate output files such as bam files are also kept by default, and mpileup files can optionally be created if needed.
In addition to the tools above, some further tools are used as glue between the steps.
If you are interested in the details, have a look at the Snakemake rules for each step.
When using grenepipe, please cite:
grenepipe: A flexible, scalable, and reproducible pipeline
to automate variant calling from sequence reads.
Lucas Czech and Moises Exposito-Alonso. Bioinformatics. 2022.
doi:10.1093/bioinformatics/btac600
Furthermore, please do not forget to cite all tools that you selected to be run for your analysis. See our Wiki for their references.