GenotypeConcordance irregular behavior in counting after applying filters
Closed this issue · 3 comments
This issue tracker is for bug reports only. Before opening a new issue here, please make a post on our support forum so that our support team can look at your issue and determine whether it needs to be escalated into a bug report.
Instructions
- Use a concise yet descriptive title;
- Determine whether your issue is a bug report, a feature request, or a documentation request;
- Choose the corresponding template block below and fill it in, replacing or deleting text in italics (surrounded by _) as appropriate;
- Delete the other template blocks and this header.
Bug Report
Affected tool(s)
GenotypeConcordance outputs
Affected version(s)
- Latest public release version [GATK 4.4.0.0 wrapper]
Description
Describe the problem below. Provide screenshots, stacktrace, logs where appropriate.
vcf_snippet.txt
DP7.genotype_concordance_contingency_metrics.txt
DP7.genotype_concordance_detail_metrics.txt
DP7.genotype_concordance_summary_metrics.txt
Steps to reproduce
step 1. A whole-genome-amplified sample is validated against the same sample prepared from native DNA.
step 2. Variants are called independently for the two samples, and missing sites are removed.
step 3. Both samples are filtered with GATK's VariantFiltration using the filter expression DP < 7.0.
step 4. Concordance is calculated with GenotypeConcordance (the Picard tool wrapped in GATK).
Expected behavior
I expect genotype concordance to be the same as non-ref genotype concordance, because there should be zero cases of REF-REF matching: neither independent file contains REF sites.
Actual behavior
The results show 78% genotype concordance and 22% non-ref genotype concordance in the summary metrics output. I am not able to recreate these values from the contingency counts given in the detail metrics.
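To make the comparison concrete, here is a minimal Python sketch of how the two summary values could be recomputed from (TRUTH_STATE, CALL_STATE, COUNT) rows of the detail metrics. It follows my reading of the metric descriptions (exact state matches over all combinations, versus exact matches over combinations involving a variant state), not the tool's source code. The set of "variant" states, the function names, and the counts are all assumptions for illustration; the counts are not taken from the attached files.

```python
from typing import List, Tuple

# One row of the detail metrics: (TRUTH_STATE, CALL_STATE, COUNT).
Row = Tuple[str, str, int]

# Assumption: the states counted as "variant" genotypes for the non-ref metric.
VAR_STATES = {"HET_REF_VAR1", "HET_VAR1_VAR2", "HOM_VAR1"}


def genotype_concordance(rows: List[Row]) -> float:
    """Exact truth/call state matches divided by all truth/call combinations."""
    total = sum(count for _, _, count in rows)
    matches = sum(count for truth, call, count in rows if truth == call)
    return matches / total if total else float("nan")


def non_ref_genotype_concordance(rows: List[Row]) -> float:
    """Same ratio, restricted to rows where truth or call is a variant state."""
    var_rows = [r for r in rows if r[0] in VAR_STATES or r[1] in VAR_STATES]
    total = sum(count for _, _, count in var_rows)
    matches = sum(count for truth, call, count in var_rows if truth == call)
    return matches / total if total else float("nan")


# Hypothetical counts, invented for illustration only.
example_rows: List[Row] = [
    ("HET_REF_VAR1", "HET_REF_VAR1", 10),
    ("HOM_VAR1", "HOM_VAR1", 12),
    ("HET_REF_VAR1", "VC_FILTERED", 40),
    ("VC_FILTERED", "HOM_VAR1", 38),
    ("VC_FILTERED", "VC_FILTERED", 300),
]

print(genotype_concordance(example_rows))
print(non_ref_genotype_concordance(example_rows))
```

In this made-up table, rows where both samples are in the same non-variant state (here VC_FILTERED) count as matches in the all-state ratio but are excluded from the non-ref ratio, so the two values can differ even with no REF-REF rows. VariantFiltration marks records in the FILTER column rather than removing them, so this may or may not be what is happening in my data.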
Documentation request
Tool(s) involved
GenotypeConcordance Outputs
Description
Request a better description of how the Detail output maps to the Summary output, especially how genotype concordance and non-ref genotype concordance are calculated from the Detail output.
@rickymagner Is there any chance you could look at this?
Hi, I'd like to take a look and help you with this. In order to do that, can you post the full command line that you used to run GenotypeConcordance? If you have a snippet of the call and truth VCFs, that would be helpful. For example, the snippet you posted has samples named "call" and "truth", and it looks like there are missing/ref GT calls in it that I imagine would confound the stats if you used that VCF in your command.
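As a quick way to check for missing or hom-ref GT calls, something like the following Python sketch could tally genotype classes per sample in a VCF snippet. It is a plain-text parser written for illustration, not part of GATK or Picard; the function names and the example records are hypothetical, and only the sample names "call" and "truth" come from the posted snippet.

```python
import re
from collections import Counter
from typing import Dict, Iterable


def classify_gt(gt: str) -> str:
    """Label a GT string as 'missing', 'hom_ref', or 'variant'."""
    alleles = re.split(r"[/|]", gt)
    if any(allele == "." for allele in alleles):
        return "missing"
    if all(allele == "0" for allele in alleles):
        return "hom_ref"
    return "variant"


def count_gt_classes(vcf_lines: Iterable[str]) -> Dict[str, Counter]:
    """Count genotype classes for each sample column of a VCF."""
    samples = []
    counts: Dict[str, Counter] = {}
    for raw in vcf_lines:
        line = raw.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the FORMAT column
            counts = {name: Counter() for name in samples}
            continue
        fmt = fields[8].split(":")
        if "GT" not in fmt:
            continue  # record carries no genotype field
        gt_index = fmt.index("GT")
        for name, value in zip(samples, fields[9:]):
            counts[name][classify_gt(value.split(":")[gt_index])] += 1
    return counts


# Hypothetical records, invented for illustration only.
example_vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tcall\ttruth",
    "chr1\t100\t.\tA\tG\t50\tPASS\tDP=12\tGT:DP\t0/1:6\t0/1:6",
    "chr1\t200\t.\tC\tT\t50\tPASS\tDP=9\tGT:DP\t./.:0\t1/1:9",
    "chr1\t300\t.\tG\tA\t50\tPASS\tDP=15\tGT:DP\t0/0:8\t0/1:7",
]

for sample, tally in count_gt_classes(example_vcf).items():
    print(sample, dict(tally))
```

Nonzero "missing" or "hom_ref" tallies for either sample would suggest that the input still contains sites expected to have been removed before running the concordance.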
Please let us know if you still have further issues and would like to reopen this. For now we'll close due to inactivity.