GATK Best Practices workflow: pipeline summary
Snakemake workflow for human germline short variants (SNPs and INDELs)
- Reference genome files and GATK resource bundle files (GATK)
- VEP variant annotation files (VEP)
- Adapter trimming (fastp)
- Read alignment (BWA-MEM2)
- Mark duplicates (samblaster)
- Generate a recalibration table for Base Quality Score Recalibration (BaseRecalibrator)
- Apply base quality score recalibration (ApplyBQSR)
- fastp trimming report (MultiQC)
- Alignment report (MultiQC)
- Call germline SNPs and indels via local re-assembly of haplotypes (HaplotypeCaller)
- Import per-sample GVCFs into a GenomicsDB workspace (GenomicsDBImport)
- Perform joint genotyping on one or more samples pre-called with HaplotypeCaller (GenotypeGVCFs)
- Select SNPs or INDELs from a VCF file (SelectVariants)
- Build a recalibration model to score variant quality for filtering purposes (VariantRecalibrator)
- Apply a score cutoff to filter variants based on a recalibration table (ApplyVQSR)
- Merge the filtered SNP and INDEL VCF files (Picard)
- Annotate variant calls (VEP)
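Snakemake decides execution order from the dependencies between rule inputs and outputs, which is what `dag.svg` visualizes. The sketch below models the steps listed above as a dependency graph and derives one valid run order with Python's standard `graphlib`. The step names and edges are hypothetical approximations of the list; the actual rule names and wiring live in `workflow/rules/` and may differ.

```python
from graphlib import TopologicalSorter

# Hypothetical step names; each maps to the set of steps whose outputs it consumes.
# The edges are an assumed reading of the pipeline summary, not the real Snakefile.
PIPELINE = {
    "reference": set(),
    "vep_resources": set(),
    "fastp": set(),
    "bwa_mem2": {"fastp", "reference"},
    "samblaster": {"bwa_mem2"},
    "base_recalibrator": {"samblaster", "reference"},
    "apply_bqsr": {"base_recalibrator"},
    "multiqc_fastp": {"fastp"},
    "multiqc_alignment": {"apply_bqsr"},
    "haplotype_caller": {"apply_bqsr"},
    "genomicsdb_import": {"haplotype_caller"},
    "genotype_gvcfs": {"genomicsdb_import"},
    "select_snps": {"genotype_gvcfs"},
    "select_indels": {"genotype_gvcfs"},
    "variant_recalibrator_snps": {"select_snps"},
    "variant_recalibrator_indels": {"select_indels"},
    "apply_vqsr_snps": {"variant_recalibrator_snps"},
    "apply_vqsr_indels": {"variant_recalibrator_indels"},
    "merge_vcfs": {"apply_vqsr_snps", "apply_vqsr_indels"},
    "vep": {"merge_vcfs", "vep_resources"},
}


def execution_order(graph):
    """Return one valid order in which every step runs after its dependencies."""
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    for step in execution_order(PIPELINE):
        print(step)
```

Any topological order is valid; independent branches (for example, the SNP and INDEL recalibration paths) can run in parallel, which is how Snakemake uses multiple cores.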
├── config
│ ├── captured_regions.bed
│ ├── config.yaml
│ └── samples.tsv
├── dag.svg
├── logs
│ ├── annotate
│ ├── call
│ ├── filter
│ ├── prepare
│ ├── qc
│ ├── ref
│ └── trim
├── raw
│ ├── SRR24443168.fastq.gz
│ └── SRR24443169.fastq.gz
├── README.md
├── report
│ ├── fastp_multiqc_data
│ ├── fastp_multiqc.html
│ ├── prepare_multiqc_data
│ ├── prepare_multiqc.html
│ └── vep_report.html
├── results
│ ├── called
│ ├── filtered
│ ├── prepared
│ ├── trimmed
│ └── vep_annotated.vcf.gz
├── workflow
│ ├── envs
│ ├── report
│ ├── rules
│ ├── schemas
│ ├── scripts
│ └── Snakefile
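In the layout above, each sample appears as `raw/<sample>.fastq.gz`. A minimal sketch, assuming that naming convention, of how sample IDs could be inferred from those files and expanded into per-sample targets is shown below. The per-sample file names under `results/` are hypothetical placeholders; the real patterns are defined in `workflow/rules/`, and the workflow may read samples from `config/samples.tsv` instead.

```python
from pathlib import Path

RAW_SUFFIX = ".fastq.gz"


def sample_ids(filenames):
    """Return sorted sample IDs from names like 'SRR24443168.fastq.gz'."""
    return sorted(
        name[: -len(RAW_SUFFIX)] for name in filenames if name.endswith(RAW_SUFFIX)
    )


def expected_targets(samples):
    """Build a list of target paths; per-sample patterns are hypothetical."""
    targets = []
    for s in samples:
        targets.append(f"results/trimmed/{s}.fastq.gz")  # hypothetical pattern
        targets.append(f"results/prepared/{s}.bam")  # hypothetical pattern
        targets.append(f"results/called/{s}.g.vcf.gz")  # hypothetical pattern
    # Cohort-level outputs that appear in the directory tree.
    targets.append("results/vep_annotated.vcf.gz")
    targets.append("report/vep_report.html")
    return targets


if __name__ == "__main__":
    raw_dir = Path("raw")
    if raw_dir.is_dir():
        ids = sample_ids(p.name for p in raw_dir.iterdir())
        for target in expected_targets(ids):
            print(target)
```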