***The JAR package of Accelerated GATK4 HaplotypeCaller has been moved to the Release repository because of the file size limit of the Code repository.***
Create effective regions
Chop the whole genome into smaller regions so that the calculation can be parallelized. The command `ebed_creater` creates a BED file with the suffix `effective.bed` in the same directory as the reference FASTA file. This step only needs to be run once, and its result can be reused in the Accelerated HaplotypeCaller step.
ebed_creater reference.fa
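The idea of chopping a genome into regions can be sketched as follows. This is only an illustration of fixed-size windowing, not a description of how `ebed_creater` chooses its regions, which is not documented here. The function names, the `contig_lengths` input, and the `window` size are all hypothetical.

```python
def chop_regions(contig_lengths, window):
    """Split each contig into consecutive BED-style intervals.

    BED intervals are 0-based and half-open: (chrom, start, end).
    `contig_lengths` maps contig name -> length in bases (hypothetical input).
    `window` is an illustrative chunk size, not a value used by ebed_creater.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    regions = []
    for chrom, length in contig_lengths.items():
        for start in range(0, length, window):
            # The last window of a contig is clipped to the contig length.
            regions.append((chrom, start, min(start + window, length)))
    return regions


def to_bed(regions):
    """Render intervals as tab-separated BED lines."""
    return "".join(f"{chrom}\t{start}\t{end}\n" for chrom, start, end in regions)
```

Each resulting interval can be processed independently, which is what makes a parallel run possible.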
Run Accelerated GATK4 HaplotypeCaller
The usage of Accelerated HaplotypeCaller is almost the same as that of the original GATK4 HaplotypeCaller, except for the new parameter `--num-threads`, which sets the number of threads the tool uses. This parameter is optional. If it is not set, all available threads are used.
// VCF
java -jar acc_hc4_v2.0.4.jar HaplotypeCaller -R ucsc.hg19.fasta -I NA12878_dedup_merge.bam -O test_gvcf.vcf
// GVCF
java -jar acc_hc4_v2.0.4.jar HaplotypeCaller -R ucsc.hg19.fasta -I NA12878_dedup_merge.bam -O test_gvcf.g.vcf --emit-ref-confidence GVCF
// FPGA
java -jar acc_hc4_v2.0.4.jar HaplotypeCaller -R ucsc.hg19.fasta -I NA12878_dedup_merge.bam -O test_gvcf.vcf --use-fpga
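When the tool is driven from a script, the command lines above can be assembled programmatically. The helper below is a minimal sketch: `build_hc_command` is a hypothetical name, and it assumes that `--num-threads` takes an integer as a separate argument, which this README does not state explicitly. Only flags that appear in this README are used.

```python
def build_hc_command(jar, reference, bam, output,
                     num_threads=None, gvcf=False, use_fpga=False):
    """Build an argument list for Accelerated HaplotypeCaller (hypothetical helper)."""
    cmd = ["java", "-jar", jar, "HaplotypeCaller",
           "-R", reference, "-I", bam, "-O", output]
    if gvcf:
        cmd += ["--emit-ref-confidence", "GVCF"]
    if use_fpga:
        cmd.append("--use-fpga")
    if num_threads is not None:
        # Assumption: the value is passed as a separate integer argument.
        cmd += ["--num-threads", str(num_threads)]
    return cmd


cmd = build_hc_command("acc_hc4_v2.0.4.jar", "ucsc.hg19.fasta",
                       "NA12878_dedup_merge.bam", "test_gvcf.g.vcf",
                       num_threads=8, gvcf=True)
print(" ".join(cmd))
```

The returned list can be passed to `subprocess.run(cmd, check=True)`. Leaving `num_threads` as `None` omits the flag, so the tool falls back to using all available threads.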
- If downsampling is disabled, which can be done with `--max-reads-per-alignment-start 0`, the VCF result of this tool is exactly the same as that of the original GATK4 HaplotypeCaller.
- If downsampling is not disabled, on average 10 out of millions of variants differ from the original result on our WGS test data.
- This tool needs Java 1.8 to run properly.
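One way to check the first two notes on your own data is to compare the variant records of two VCF files. The sketch below is a simplified comparison, not the method used to obtain the numbers above: it keys each record on chromosome, position, reference allele, and alternate allele only, and ignores genotypes and annotations. The function names are hypothetical.

```python
def variant_keys(vcf_text):
    """Collect (CHROM, POS, REF, ALT) for every data line of a VCF."""
    keys = set()
    for line in vcf_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and header lines
        chrom, pos, _id, ref, alt = line.split("\t")[:5]
        keys.add((chrom, int(pos), ref, alt))
    return keys


def differing_variants(vcf_a, vcf_b):
    """Return the records that appear in exactly one of the two VCFs."""
    return variant_keys(vcf_a) ^ variant_keys(vcf_b)
```

An empty result means the two files contain the same set of variant sites and alleles under this simplified comparison.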