Primary language: Nextflow. License: GNU General Public License v3.0 (GPL-3.0).

gatk4-HaplotypeCaller-nf

GATK4 HaplotypeCaller step in gVCF mode. This is the first step before whole-cohort joint genotyping, following the GATK Best Practices (step "Call Variants Per-Sample").

Description

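Small pipeline that calls variants on recalibrated BAM files, one sample at a time, and stores the resulting gVCF files. The pipeline uses a scatter-gather strategy: the calling intervals are split into chunks that are processed in parallel, and the per-chunk results are then combined into one gVCF per sample. A subsequent pipeline performs the full cohort calling from all the gVCF files.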

Dependencies

  1. This pipeline is based on nextflow. As we have several nextflow pipelines, we have centralized the common information in the IARC-nf repository. Please read it carefully as it contains essential information for the installation, basic usage and configuration of nextflow and our pipelines.
  2. GATK4 executables
  3. Picard Tools

Input

  • --input : your input BAM file(s) (do not forget the quotes when passing multiple BAM files, e.g. --input "test_*.bam")
  • --output_dir : the folder that will contain your test_123.gVCF file or your test_001.gVCF, test_002.gVCF, ... files.
  • --ref_fasta : your reference genome in FASTA format. Make sure it is the same as, or compatible with, the reference used to align your BAM file(s).
  • --gatk_exec : the full path to your GATK4 binary file.
  • --picard_dir : directory that contains picard.jar
  • --interval_list : a file for the intervals to call on. More information on interval_list format.
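Before launching, it can help to check that the inputs above come with the companion files GATK tools generally expect: a FASTA index (.fai) and a sequence dictionary (.dict) next to the reference, and an index (.bai) next to each BAM. The function below is a minimal sketch of such a check. Its name is hypothetical, and whether this pipeline creates any of these files itself is not stated here.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical pre-flight check: first argument is the reference FASTA,
# remaining arguments are BAM files. Returns non-zero if anything is missing.
check_inputs() {
  local ref=$1; shift
  local missing=0 f
  # Reference, its index (ref.fasta.fai) and its dictionary (ref.dict).
  for f in "$ref" "$ref.fai" "${ref%.*}.dict"; do
    [ -e "$f" ] || { echo "missing: $f" >&2; missing=1; }
  done
  # Each BAM needs an index named either sample.bam.bai or sample.bai.
  for f in "$@"; do
    [ -e "$f" ] || { echo "missing: $f" >&2; missing=1; continue; }
    [ -e "$f.bai" ] || [ -e "${f%.bam}.bai" ] || { echo "missing index for: $f" >&2; missing=1; }
  done
  return "$missing"
}
```

A call such as check_inputs /ref/Homo_sapiens_assembly38.fasta /data/test_*.bam prints every missing file to standard error and returns a non-zero status if any is missing.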

A nextflow.config file is also included; modify it as needed to run outside our pre-configured clusters (see Nextflow configuration).

Usage for Cobalt cluster

nextflow run iarcbioinfo/gatk4-HaplotypeCaller-nf -profile cobalt --input "/data/test_*.bam" --output_dir myGVCFs --ref_fasta /ref/Homo_sapiens_assembly38.fasta --gatk_exec /bin/gatk-4.0.4.0/gatk --interval_list target.list