Conversion not returning all variant entries
Closed this issue · 4 comments
Prerequisites
Please answer the following questions for yourself before submitting an issue.
- I am running the latest version
- I checked the documentation and found no answer
- I checked to make sure that this issue has not already been filed
Context
Hello, I have been using vcf2fhir on my test VCFs, and I have noticed that no INDEL type variants are included in the JSON output following conversion. Quite a few entries other than INDELs are missing as well, and I am not sure why. Is there a way to configure it so that all of the variant entries are converted, regardless of type? I don't believe any entry is missing information.
Expected Behavior
I am expecting all of the variant entries to be included in the final JSON output.
Current Behavior
Only a subset of the variants are included in the JSON output following the conversion.
Steps to Reproduce
The code I am using for the conversion is as follows:
import vcf2fhir
vcf_fhir_converter = vcf2fhir.Converter('/test_1000.vcf', ref_build='GRCh37', genomic_source_class='mixed', patient_id='patient_ID')
vcf_fhir_converter.convert(output_filename='/test_1000.json')
Failure Logs
I am unable to attach VCF or JSON files, but I would be more than happy to send them via email if you'd like to see them.
Hi @clake-deloitte, generally this happens because the VCF rows in question meet some exclusion criterion (described here). A nice way to see why a given row isn't converting is to enable and check the invalid record log (described here). Can you give that a try?
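For readers following along, here is a minimal sketch of one way to capture that log using only Python's standard `logging` module. It assumes the library emits these messages through a standard logger named `vcf2fhir.invalidrecord`, which is the name that appears in the log line quoted later in this thread. The helper name `enable_invalid_record_log` and the file path are hypothetical, and the linked documentation remains the authoritative description of how to enable the log.

```python
import logging


def enable_invalid_record_log(log_path, logger_name="vcf2fhir.invalidrecord"):
    """Send DEBUG messages from the named logger to a file.

    The default logger name is an assumption taken from the log output
    quoted in this thread; this helper is not part of vcf2fhir.
    """
    logger = logging.getLogger(logger_name)
    logger.setLevel(logging.DEBUG)

    handler = logging.FileHandler(log_path)
    handler.setFormatter(
        logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s")
    )
    logger.addHandler(handler)
    return logger
```

Calling the helper before constructing the converter, for example `enable_invalid_record_log('invalid_records.log')`, should then leave one line per skipped record in that file, each with its reason.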
Hi, yes this is actually very helpful. However, I'd like to have all the variants kept - and it seems that all of the variant entries have the same error (which is why they're being dropped):
2023-05-02 16:21:28,091 - vcf2fhir.invalidrecord - DEBUG - Reason: VCF FORMAT.GT is in ['0/0','0|0','0'], Record: Record(CHROM=1, POS=55416, REF=G, ALT=[A]), considered sample: CallData(GT=0|0, DS=0.05, GL=[-0.48, -0.48, -0.48])
Is there any way to avoid dropping these variants, short of modifying the VCFs directly? Thanks!
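To gauge how many rows are affected before converting, the VCF can be pre-scanned for the genotypes named in that log reason. The sketch below is a standalone, standard-library check and not part of vcf2fhir. It assumes an uncompressed, tab-delimited VCF and that the first sample column is the one considered. The function name `split_by_genotype` is hypothetical.

```python
# Genotypes listed in the invalid record log reason quoted above.
HOM_REF_GENOTYPES = {"0/0", "0|0", "0"}


def split_by_genotype(vcf_lines, sample_index=0):
    """Split VCF data rows into (kept, hom_ref) by the chosen sample's GT.

    Each returned record is a tuple (CHROM, POS, REF, ALT, GT).
    Rows without a FORMAT/GT entry for the sample are skipped entirely.
    """
    kept, hom_ref = [], []
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines, meta lines, and the header row
        fields = line.split("\t")
        if len(fields) <= 9 + sample_index:
            continue  # no FORMAT column or no such sample column
        format_keys = fields[8].split(":")
        if "GT" not in format_keys:
            continue
        sample_values = fields[9 + sample_index].split(":")
        gt = sample_values[format_keys.index("GT")]
        record = (fields[0], int(fields[1]), fields[3], fields[4], gt)
        if gt in HOM_REF_GENOTYPES:
            hom_ref.append(record)
        else:
            kept.append(record)
    return kept, hom_ref
```

For example, `kept, hom_ref = split_by_genotype(open('/test_1000.vcf'))` followed by `len(hom_ref)` gives the number of rows that match this particular exclusion reason. Rows excluded for other reasons would still need to be checked in the invalid record log.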
Hi @clake-deloitte, unfortunately there is no way to avoid dropping the variants without a code change (or changing the VCFs as you mention). If you do edit the code, there might be other consequences that need to be tested - for instance, calculating allelic state relies on genotype.
Ok, no problem, thank you for the information!