
Germline Variant Calling Pipeline built in Snakemake


Selma

About Selma

Selma is a whole-genome (germline) variant calling workflow developed at the University of Bergen and based on the GATK suite of tools. The guiding philosophy behind it is that it should be easy to set up, easy to use, and efficient in its use of system resources. This is achieved by adopting a user-centric frame of mind that aims to simplify complex tasks without sacrificing functionality. The workflow itself is based on Snakemake, and all dependencies are handled with Docker and Singularity container technology. The current intended platform is TSD, but support for HUNT-cloud as well as local execution is planned for future releases.
Selma is named after the mythical Norwegian sea serpent that supposedly lives in Lake Seljord.

The workflow development is currently supported by Elixir2, NorSeq and Tryggve2, and in the past also by BioBank Norway.

Graphical visualization of the workflow steps

This is a simplified graph portraying the key steps that the workflow goes through; a complete overview that includes every single step is also available. The steps that have been left out only perform "administrative" functions and don't add to the data analysis per se.

Documentation

Tools

bwa version 0.7.15-2+deb9u1 - Maps FASTQ files to the reference genome
samtools version 1.3.1-3 - Receives the piped output of bwa and writes it as a BAM file
The following tools are all from GATK version 4.1.2.0
SplitIntervals - Splits the interval list for scatter-gather parallelization
FastqToSam - Converts FASTQ files to unmapped BAM files
MergeBamAlignment - Merges the aligned BAM file from bwa with the unmapped BAM file from FastqToSam
MarkDuplicates - Identifies duplicate reads
BaseRecalibrator - Generates a recalibration table for Base Quality Score Recalibration (BQSR)
GatherBQSRReports - Gathers the base recalibration files from BaseRecalibrator
ApplyBQSR - Applies the base recalibration from BaseRecalibrator
GatherBamFiles - Efficiently concatenates the BAM files from ApplyBQSR
HaplotypeCaller - Calls germline SNPs and indels via local re-assembly of haplotypes
GenotypeGVCFs - Performs genotyping on one pre-called sample from HaplotypeCaller
VariantRecalibrator - Builds a recalibration model to score variant quality for filtering purposes
ApplyVQSR - Applies a score cutoff to filter variants based on a recalibration table
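
The scatter-gather pattern behind the BQSR tools listed above can be sketched as follows. This is only an illustration and not Selma's actual Snakemake rules: the function name `scatter_gather_bqsr`, every file name, and the assumption that the interval files already exist (for example as output of SplitIntervals) are hypothetical. The sketch only builds the command lines and does not run GATK.

```python
import shlex
from typing import List


def scatter_gather_bqsr(
    bam: str,
    reference: str,
    known_sites: str,
    intervals: List[str],
) -> List[List[str]]:
    """Build GATK command lines for scatter-gather BQSR, in execution order.

    All output file names below are hypothetical placeholders.
    """
    commands: List[List[str]] = []

    # Scatter: one BaseRecalibrator job per interval file.
    tables = []
    for i, interval in enumerate(intervals):
        table = f"recal_{i:04d}.table"
        tables.append(table)
        commands.append([
            "gatk", "BaseRecalibrator",
            "-R", reference, "-I", bam,
            "--known-sites", known_sites,
            "-L", interval, "-O", table,
        ])

    # Gather: merge the per-interval recalibration tables into one report.
    merged_table = "recal_merged.table"
    gather_reports = ["gatk", "GatherBQSRReports"]
    for table in tables:
        gather_reports += ["-I", table]
    gather_reports += ["-O", merged_table]
    commands.append(gather_reports)

    # Scatter again: apply the merged table to each interval separately.
    shards = []
    for i, interval in enumerate(intervals):
        shard = f"recal_{i:04d}.bam"
        shards.append(shard)
        commands.append([
            "gatk", "ApplyBQSR",
            "-R", reference, "-I", bam,
            "--bqsr-recal-file", merged_table,
            "-L", interval, "-O", shard,
        ])

    # Gather: concatenate the per-interval BAM shards in interval order.
    gather_bams = ["gatk", "GatherBamFiles"]
    for shard in shards:
        gather_bams += ["-I", shard]
    gather_bams += ["-O", "recalibrated.bam"]
    commands.append(gather_bams)

    return commands


if __name__ == "__main__":
    # Hypothetical inputs, used only to print the resulting command lines.
    for cmd in scatter_gather_bqsr(
        "sample.dedup.bam",
        "reference.fasta",
        "known_sites.vcf.gz",
        ["part_0.interval_list", "part_1.interval_list"],
    ):
        print(shlex.join(cmd))
```

With N interval files the sketch yields 2N + 2 commands. The N jobs in each scatter phase are independent of one another, which is what allows a workflow manager such as Snakemake to run them in parallel.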

Credits

Supervisor
Kjell Petersen

Main developer
Oskar Vidarsson