v0.3.9 (Docker: https://hub.docker.com/r/pkrusche/hap.py/)
Compares a query VCF to a truth VCF using hap.py and vcfeval, and calculates performance metrics including sensitivity and precision. It is equivalent to running the precisionFDA GA4GH benchmarking app in 'vcfeval-partialcredit' mode with all other options left as default. More information is available at the following links:
- https://precision.fda.gov/apps/app-F5YXbp80PBYFP059656gYxXQ
- https://github.com/ga4gh/benchmarking-tools/tree/master/doc/ref-impl
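
As a reminder of what the two headline metrics mean, the sketch below computes sensitivity (recall) and precision from true positive, false positive and false negative counts. The function name and the counts are illustrative only. hap.py itself applies further rules (for example partial credit, and calls outside the high confidence regions), so this is a simplified view rather than its actual implementation.

```python
def sensitivity_precision(tp, fp, fn):
    """Return (sensitivity, precision) from variant counts.

    sensitivity = TP / (TP + FN)
    precision   = TP / (TP + FP)
    A metric is returned as None when its denominator is zero.
    """
    sensitivity = tp / (tp + fn) if (tp + fn) else None
    precision = tp / (tp + fp) if (tp + fp) else None
    return sensitivity, precision


# Hypothetical counts, not taken from a real run
sens, prec = sensitivity_precision(tp=990, fp=5, fn=10)
print(f"sensitivity={sens:.4f} precision={prec:.4f}")
# prints: sensitivity=0.9900 precision=0.9950
```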
Validating an NGS workflow using the NA12878 (NIST Genome in a Bottle) benchmarking sample.
Input files:
- A query VCF (.vcf | .vcf.gz) - output from the workflow being validated
- A truth VCF (.vcf | .vcf.gz)
- A panel BED file (.bed) - regions covered by the query VCF
- A high confidence region BED file (.bed) - high confidence regions for the truth set
Parameters:
- Skip (default = false) - if set to true, the app exits without performing any analysis
- Output files prefix (required)
- Output folder (optional)
- Whether additional stratification for NA12878 samples should be performed (default = False)
  - If the truth set is NA12878, the results can be additionally stratified and output in the extended.csv file
  - However, the instance type must be upgraded to one with at least 7GB of RAM, and the app will take significantly longer to run
- Reference genome build - GRCh37 (default) or GRCh38
Note:
- The BED file names must not contain spaces or characters such as + and -
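
A file name can be checked against this rule before the job is launched. The sketch below is one possible pre-flight check. The note says "characters such as + and -" without giving the full list, so the allowed set used here (letters, digits, underscores and dots) is an assumption, and `bed_name_is_safe` is a hypothetical helper, not part of the app.

```python
import re
from pathlib import PurePath

# Assumption: only letters, digits, underscores and dots are treated as safe.
# The app's real list of forbidden characters is not documented here.
_SAFE_NAME = re.compile(r"[A-Za-z0-9_.]+")


def bed_name_is_safe(path):
    """Return True if the file name (not the directory) uses only safe characters."""
    name = PurePath(path).name
    return _SAFE_NAME.fullmatch(name) is not None


# Hypothetical file names
for candidate in ["panel_v2.bed", "my panel.bed", "panel+5bp.bed", "panel-v2.bed"]:
    print(candidate, bed_name_is_safe(candidate))
```

Only the final path component is checked, so a directory name containing a hyphen does not cause a failure.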
This app outputs:
- Summary csv file containing separate performance metrics for SNPs and Indels
- Summary report HTML (generated using ga4gh rep.py https://github.com/ga4gh/benchmarking-tools/tree/master/reporting/basic)
- Detailed results folder containing:
  - Extended CSV file - including results stratification and confidence intervals
  - VCF file - annotated VCF showing TP, FP and FN variants
  - runinfo JSON - detailed information about the hap.py run
  - version log - version numbers of the software used in the app
  - metrics JSON - JSON file containing all computed metrics and tables
- 'chr' is stripped from the chromosome field of the VCF and BED files (if hg19 format is used)
- The VCF files are compressed and indexed, then passed to hap.py:
  - The vcfeval comparison engine is used
  - If the sample is NA12878, additional stratification is performed using BED files found here: https://github.com/ga4gh/benchmarking-tools/tree/master/resources/stratification-bed-files
- The summary HTML report is generated
- The app only works with inputs mapped to GRCh37 or GRCh38
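
The 'chr' stripping step mentioned above can be illustrated with a short sketch. This is not the app's own code: `strip_chr_prefix` is a hypothetical helper that assumes tab-separated VCF or BED data lines with the chromosome in the first column. It leaves header lines (starting with '#') unchanged, so VCF `##contig` headers would need separate handling, and it does not rename `chrM` to `MT`.

```python
def strip_chr_prefix(line):
    """Remove a leading 'chr' from the first (chromosome) column of a data line."""
    # Header and blank lines are returned unchanged
    if line.startswith("#") or not line.strip():
        return line
    chrom, sep, rest = line.partition("\t")
    if chrom.startswith("chr"):
        chrom = chrom[3:]
    return chrom + sep + rest


# Hypothetical BED and VCF lines
print(strip_chr_prefix("chr1\t100\t200"))          # 1	100	200
print(strip_chr_prefix("chrX\t5\t.\tA\tG"))        # X	5	.	A	G
print(strip_chr_prefix("#CHROM\tPOS\tID"))         # unchanged
```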