ref-transcript-mismatch-reporter does not work
Closed this issue · 6 comments
Installation Type
Standalone
pVACtools Version / Docker Image
3.1.1
Python Version
No response
Operating System
No response
Describe the bug
Hello,
I ran ref-transcript-mismatch-reporter (VAtools 5.1.0) on my test_vep.vcf as follows:
ref-transcript-mismatch-reporter test_vep.vcf --filter hard --output-vcf test.vcf
The error still occurs. The variant is:
chr12 48238361 . G GCCTCAATGAGGAGCACTCCAAGCAGTACCGCTGCCTCTCCTTCCAGCC . clustered_events AS_FilterStatus=SITE;AS_SB_TABLE=101,4|1,6;DP=118;ECNT=5;GERMQ=93;MBQ=37,34;MFRL=52,194;MMQ=60,60;MPOS=49;NALOD=1.48;NLOD=8.75;POPAF=6;TLOD=19.3;CSQ=CCTCAATGAGGAGCACTCCAAGCAGTACCGCTGCCTCTCCTTCCAGCC|stop_gained&protein_altering_variant|HIGH|VDR|7421|Transcript|NM_001364085.1|protein_coding|10/10||NM_001364085.1:c.1451_1452insGGCTGGAAGGAGAGGCAGCGGTACTGCTTGGAGTGCTCCTCATTGAGG|NP_001351014.1:p.Asn484delinsLysAlaGlyArgArgGlySerGlyThrAlaTrpSerAlaProHisTer|1611-1612|1451-1452|484|N/KAGRRGSGTAWSAPH*G|aac/aaGGCTGGAAGGAGAGGCAGCGGTACTGCTTGGAGTGCTCCTCATTGAGGc|||-1||EntrezGene|||rseq_mrna_nonmatch&rseq_5p_mismatch||||OK|||||||||||||||MEAMAASTSLPDPGDFDRNVPRICGVCGDRATGFHFNAMTCEGCKGFFRRSMKRKALFTCPFNGDCRITKDNRRHCQACRLKRCVDIGMMKEFILTDEEVQRKREMILKRKEEEALKDSLRPKLSEEQQRIIAILLDAHHKTYDPTYSDFCQFRPPVRVNDGGGSHPSRPNSRHTPSFSGDSSSSCSDHCITSSDMMDSSSFSNLDLSEEDSDDPSVTLELSQLSMLPHLADLVSYSIQKVIGFAKMIPGFRDLTSEDQIVLLKSSAIEVIMLRSNESFTMDDMSWTCGNQDYKYRVSDVTKAGHSLELIEPLIKFQVGLKKLNLHEEEHVLLMAICIVSPDRPGVQDAALIEAIQDRLSNTLQTYIRCRHPPPGSHLLYAKMIQKLADLRSLNEEHSKQYRCLSFQPECSMKLTPLVLEVFGNEISLGQPVAVPGWGCSSRATCQARGWRLLSSPPHPVWGSAPPLPPPLSTQPILSPVQPNPFPAGFSPVP GT:AD:AF:DP:F1R2:F2R1:SB 0/0:29,0:0.0318:29:17,0:12,0:29,0,0,0 0/1:76,7:0.0936:83:57,1:19,0:72,4,1,6
It was not filtered out. Please help.
Thanks!
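With `--filter hard`, a variant whose CSQ wildtype residue disagrees with the transcript's WildtypeProtein is expected to be removed from the output VCF. The sketch below contrasts that with soft filtering, assuming (as the option names suggest) that `soft` keeps the record and only flags it. The dict-based record layout, the `has_mismatch` key and the `REF_MISMATCH` label are hypothetical and do not reflect VAtools' actual internals or FILTER name.

```python
from typing import Iterable, Iterator


def apply_mismatch_filter(records: Iterable[dict], mode: str) -> Iterator[dict]:
    """Hypothetical sketch of hard vs. soft filtering of mismatched variants.

    Each record is a plain dict with a 'filter' list and a boolean
    'has_mismatch' key (both assumed for illustration only).
    """
    if mode not in ("hard", "soft"):
        raise ValueError(f"unknown filter mode: {mode!r}")
    for record in records:
        if not record["has_mismatch"]:
            yield record  # no mismatch: pass the record through unchanged
        elif mode == "soft":
            # Soft: keep the variant but add a (hypothetical) flag to FILTER.
            yield dict(record, filter=record["filter"] + ["REF_MISMATCH"])
        # Hard: a mismatched variant is dropped, so nothing is yielded.


# Toy records, not taken from the reported VCF.
records = [
    {"id": "a", "filter": ["clustered_events"], "has_mismatch": True},
    {"id": "b", "filter": ["PASS"], "has_mismatch": False},
]
print([r["id"] for r in apply_mismatch_filter(records, "hard")])
```

Under this reading, the reported variant surviving a hard filter means the mismatch check itself did not fire for it.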
How to reproduce this bug
ref-transcript-mismatch-reporter test_vep.vcf --filter hard --output-vcf test.vcf
Input files
No response
Log output
ERROR: There was a mismatch between the actual wildtype amino acid sequence (P) and the expected amino acid sequence (N). Did you use the same reference build version for VEP that you used for creating the VCF?
OrderedDict([('chromosome_name', 'chr12'), ('start', '48238361'),
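The message above presumably compares the reference residue from the CSQ `Amino_acids` field (the part before `/`, here `N` at `Protein_position` 484) with the residue at that position in `WildtypeProtein` (reported as `P`). A minimal sketch of that kind of comparison, assuming 1-based protein positions and that pure insertions (reference `-`) cannot be checked; `wildtype_matches` is a hypothetical helper, not VAtools' actual code, and the toy protein `MKNPL` stands in for the real sequence.

```python
def wildtype_matches(amino_acids: str, protein_position: str, wildtype_protein: str) -> bool:
    """Return True if the CSQ reference residue(s) agree with WildtypeProtein.

    amino_acids:      CSQ Amino_acids value, e.g. "N/K" (REF/ALT)
    protein_position: CSQ Protein_position value, e.g. "484" or "484-485"
    wildtype_protein: full wildtype protein sequence (1-based positions assumed)
    """
    ref = amino_acids.split("/")[0]
    if ref in ("", "-"):
        # Pure insertion: no reference residue to compare (assumption).
        return True
    first = protein_position.split("-")[0]
    if not first.isdigit():
        # Unknown start such as "?": cannot be checked, treat as no mismatch.
        return True
    start = int(first)
    observed = wildtype_protein[start - 1 : start - 1 + len(ref)]
    return observed == ref
```

For example, `wildtype_matches("N/K", "4", "MKNPL")` returns `False` because residue 4 of the toy protein is `P`, which mirrors the N-versus-P mismatch in the log.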
Output files
No response
@xmy1990 Thank you for your interest in pVACtools. I'm happy to investigate this issue. Can you please attach the problematic variant entry as a VCF file? Because VCF headers vary between files, I can't debug this issue without a proper VCF file. In particular, the VEP CSQ header changes depending on how you ran VEP, and I need the specific VEP header that matches the CSQ annotation field in this variant. I also need all of the metadata headers for the FORMAT fields in order to parse the VCF entry correctly.
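The CSQ value is just a `|`-separated list whose meaning comes entirely from the `Format:` list in VEP's `##INFO=<ID=CSQ,...>` header line, which is why the matching header is needed. A minimal sketch of that pairing using only the standard library; the function names are hypothetical, and it assumes multiple annotations are comma-separated within the CSQ value.

```python
import re
from typing import Dict, List


def csq_field_names(header_line: str) -> List[str]:
    """Extract CSQ subfield names from VEP's ##INFO=<ID=CSQ,...> header line."""
    match = re.search(r'Format: ([^"]+)"', header_line)
    if match is None:
        raise ValueError("line does not contain a VEP CSQ Format description")
    return match.group(1).split("|")


def parse_csq(csq_value: str, field_names: List[str]) -> List[Dict[str, str]]:
    """Split a CSQ INFO value into one dict per annotation (comma-separated)."""
    entries = []
    for annotation in csq_value.split(","):
        values = annotation.split("|")
        if len(values) != len(field_names):
            # A header from a different VEP run would not line up with the data.
            raise ValueError(
                f"expected {len(field_names)} CSQ fields, got {len(values)}"
            )
        entries.append(dict(zip(field_names, values)))
    return entries
```

The length check makes a header/data mismatch fail loudly instead of silently assigning values to the wrong field names.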
Thanks a lot, @susannasiebert.
I have attached the problematic variant entry as a VCF file:
Thanks!
##fileformat=VCFv4.2
##FILTER=<ID=PASS,Description="All filters passed">
##FILTER=<ID=FAIL,Description="Fail the site if all alleles fail but for different reasons.">
##FILTER=<ID=base_qual,Description="alt median base quality">
##FILTER=<ID=clustered_events,Description="Clustered events observed in the tumor">
##FILTER=<ID=contamination,Description="contamination">
##FILTER=<ID=duplicate,Description="evidence for alt allele is overrepresented by apparent duplicates">
##FILTER=<ID=fragment,Description="abs(ref - alt) median fragment length">
##FILTER=<ID=germline,Description="Evidence indicates this site is germline, not somatic">
##FILTER=<ID=haplotype,Description="Variant near filtered variant on same haplotype.">
##FILTER=<ID=low_allele_frac,Description="Allele fraction is below specified threshold">
##FILTER=<ID=map_qual,Description="ref - alt median mapping quality">
##FILTER=<ID=multiallelic,Description="Site filtered because too many alt alleles pass tumor LOD">
##FILTER=<ID=n_ratio,Description="Ratio of N to alt exceeds specified ratio">
##FILTER=<ID=normal_artifact,Description="artifact_in_normal">
##FILTER=<ID=orientation,Description="Orientation bias detected by the orientation bias mixture model">
##FILTER=<ID=panel_of_normals,Description="Blacklisted site in panel of normals">
##FILTER=<ID=position,Description="median distance of alt variants from end of reads">
##FILTER=<ID=slippage,Description="Variant near filtered variant on same haplotype.">
##FILTER=<ID=strand_bias,Description="Evidence for alt allele comes from one read direction only">
##FILTER=<ID=strict_strand,Description="Evidence for alt allele is not represented in both directions">
##FILTER=<ID=weak_evidence,Description="Mutation does not meet likelihood threshold">
##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Allelic depths for the ref and alt alleles in the order listed">
##FORMAT=<ID=AF,Number=A,Type=Float,Description="Allele fractions of alternate alleles in the tumor">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth (reads with MQ=255 or with bad mates are filtered)">
##FORMAT=<ID=F1R2,Number=R,Type=Integer,Description="Count of reads in F1R2 pair orientation supporting each allele">
##FORMAT=<ID=F2R1,Number=R,Type=Integer,Description="Count of reads in F2R1 pair orientation supporting each allele">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=PGT,Number=1,Type=String,Description="Physical phasing haplotype information, describing how the alternate alleles are phased in relation to one another">
##FORMAT=<ID=PID,Number=1,Type=String,Description="Physical phasing ID information, where each unique ID within a given sample (but not across samples) connects records within a phasing group">
##FORMAT=<ID=PS,Number=1,Type=Integer,Description="Phasing set (typically the position of the first variant in the set)">
##FORMAT=<ID=SB,Number=4,Type=Integer,Description="Per-sample component statistics which comprise the Fisher's Exact Test to detect strand bias.">
##INFO=<ID=AS_FilterStatus,Number=1,Type=String,Description="Filter status for each allele, as assessed by ApplyRecalibration. Note that the VCF filter field will reflect the most lenient/sensitive status across all alleles.">
##INFO=<ID=AS_SB_TABLE,Number=1,Type=String,Description="Allele-specific forward/reverse read counts for strand bias tests. Includes the reference and alleles separated by |.">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth; some reads may have been filtered">
##INFO=<ID=ECNT,Number=1,Type=Integer,Description="Number of events in this haplotype">
##INFO=<ID=GERMQ,Number=1,Type=Integer,Description="Phred-scaled quality that alt alleles are not germline variants">
##INFO=<ID=MBQ,Number=R,Type=Integer,Description="median base quality">
##INFO=<ID=MFRL,Number=R,Type=Integer,Description="median fragment length">
##INFO=<ID=MMQ,Number=R,Type=Integer,Description="median mapping quality">
##INFO=<ID=MPOS,Number=A,Type=Integer,Description="median distance from end of read">
##INFO=<ID=NALOD,Number=A,Type=Float,Description="Negative log 10 odds of artifact in normal with same allele fraction as tumor">
##INFO=<ID=NLOD,Number=A,Type=Float,Description="Normal log 10 likelihood ratio of diploid het or hom alt genotypes">
##INFO=<ID=PON,Number=0,Type=Flag,Description="site found in panel of normals">
##INFO=<ID=POPAF,Number=A,Type=Float,Description="negative log 10 population allele frequencies of alt alleles">
##INFO=<ID=ROQ,Number=1,Type=Float,Description="Phred-scaled qualities that alt allele are not due to read orientation artifact">
##INFO=<ID=RPA,Number=R,Type=Integer,Description="Number of times tandem repeat unit is repeated, for each allele (including reference)">
##INFO=<ID=RU,Number=1,Type=String,Description="Tandem repeat unit (bases)">
##INFO=<ID=STR,Number=0,Type=Flag,Description="Variant is a short tandem repeat">
##INFO=<ID=STRQ,Number=1,Type=Integer,Description="Phred-scaled quality that alt alleles in STRs are not polymerase slippage errors">
##INFO=<ID=TLOD,Number=A,Type=Float,Description="Log 10 likelihood ratio score of variant existing versus not existing">
##SentieonCommandLine.TNfilter=<ID=TNfilter,Version="sentieon-genomics-202112.05",Date="2024-01-31T08:17:32Z",CommandLine="/sga_dev/zb-liaowanjun/sentieon-genomics-202112.05/libexec/driver -r /data2/data_share/pzx/reference/hs37d5/hs37d5.fa --algo TNfilter --tumor_sample T-4032 --normal_sample PB-4032 -v /sga_dev/zb-liaowanjun/sample36_joint/T-4032_PB-4032/matched_tmp/T-4032_jointTMP.vcf /sga_dev/zb-liaowanjun/sample36_joint/T-4032_PB-4032/matched_tmp/T-4032_jointunfiltered.vcf">
##SentieonCommandLine.TNhaplotyper2=<ID=TNhaplotyper2,Version="sentieon-genomics-202112.05",Date="2024-01-31T06:48:28Z",CommandLine="/sga_dev/zb-liaowanjun/sentieon-genomics-202112.05/libexec/driver -t 15 -r /data2/data_share/pzx/reference/hs37d5/hs37d5.fa -i /data2/dev_projects/xmy/TNB/validation_data/test1/PRJNA298330/T-4032/realigned/T-4032_final.bam -i /data2/dev_projects/xmy/TNB/validation_data/test1/PRJNA298330/PB-4032/realigned/PB-4032_final.bam --interval /sga_dev/panel_validation/V710_panel/bed/sort_KST700_v3_pd100_merged.bed --algo TNhaplotyper2 --call_germline_sites --min_init_tumor_lod 0 --min_tumor_lod 0.5 --prune_factor -1 --min_normal_lod 0 --tumor_sample T-4032 --normal_sample PB-4032 /sga_dev/zb-liaowanjun/sample36_joint/T-4032_PB-4032/matched_tmp/T-4032_jointTMP.vcf">
##contig=<ID=chr1,length=249250621,assembly=b37>
##contig=<ID=chr2,length=243199373,assembly=b37>
##contig=<ID=chr3,length=198022430,assembly=b37>
##contig=<ID=chr4,length=191154276,assembly=b37>
##contig=<ID=chr5,length=180915260,assembly=b37>
##contig=<ID=chr6,length=171115067,assembly=b37>
##contig=<ID=chr7,length=159138663,assembly=b37>
##contig=<ID=chr8,length=146364022,assembly=b37>
##contig=<ID=chr9,length=141213431,assembly=b37>
##contig=<ID=chr10,length=135534747,assembly=b37>
##contig=<ID=chr11,length=135006516,assembly=b37>
##contig=<ID=chr12,length=133851895,assembly=b37>
##contig=<ID=chr13,length=115169878,assembly=b37>
##contig=<ID=chr14,length=107349540,assembly=b37>
##contig=<ID=chr15,length=102531392,assembly=b37>
##contig=<ID=chr16,length=90354753,assembly=b37>
##contig=<ID=chr17,length=81195210,assembly=b37>
##contig=<ID=chr18,length=78077248,assembly=b37>
##contig=<ID=chr19,length=59128983,assembly=b37>
##contig=<ID=chr20,length=63025520,assembly=b37>
##contig=<ID=chr21,length=48129895,assembly=b37>
##contig=<ID=chr22,length=51304566,assembly=b37>
##contig=<ID=chrX,length=155270560,assembly=b37>
##contig=<ID=chrY,length=59373566,assembly=b37>
##contig=<ID=chrM,length=16569,assembly=b37>
##contig=<ID=GL000207.1,length=4262,assembly=b37>
##contig=<ID=GL000226.1,length=15008,assembly=b37>
##contig=<ID=GL000229.1,length=19913,assembly=b37>
##contig=<ID=GL000231.1,length=27386,assembly=b37>
##contig=<ID=GL000210.1,length=27682,assembly=b37>
##contig=<ID=GL000239.1,length=33824,assembly=b37>
##contig=<ID=GL000235.1,length=34474,assembly=b37>
##contig=<ID=GL000201.1,length=36148,assembly=b37>
##contig=<ID=GL000247.1,length=36422,assembly=b37>
##contig=<ID=GL000245.1,length=36651,assembly=b37>
##contig=<ID=GL000197.1,length=37175,assembly=b37>
##contig=<ID=GL000203.1,length=37498,assembly=b37>
##contig=<ID=GL000246.1,length=38154,assembly=b37>
##contig=<ID=GL000249.1,length=38502,assembly=b37>
##contig=<ID=GL000196.1,length=38914,assembly=b37>
##contig=<ID=GL000248.1,length=39786,assembly=b37>
##contig=<ID=GL000244.1,length=39929,assembly=b37>
##contig=<ID=GL000238.1,length=39939,assembly=b37>
##contig=<ID=GL000202.1,length=40103,assembly=b37>
##contig=<ID=GL000234.1,length=40531,assembly=b37>
##contig=<ID=GL000232.1,length=40652,assembly=b37>
##contig=<ID=GL000206.1,length=41001,assembly=b37>
##contig=<ID=GL000240.1,length=41933,assembly=b37>
##contig=<ID=GL000236.1,length=41934,assembly=b37>
##contig=<ID=GL000241.1,length=42152,assembly=b37>
##contig=<ID=GL000243.1,length=43341,assembly=b37>
##contig=<ID=GL000242.1,length=43523,assembly=b37>
##contig=<ID=GL000230.1,length=43691,assembly=b37>
##contig=<ID=GL000237.1,length=45867,assembly=b37>
##contig=<ID=GL000233.1,length=45941,assembly=b37>
##contig=<ID=GL000204.1,length=81310,assembly=b37>
##contig=<ID=GL000198.1,length=90085,assembly=b37>
##contig=<ID=GL000208.1,length=92689,assembly=b37>
##contig=<ID=GL000191.1,length=106433,assembly=b37>
##contig=<ID=GL000227.1,length=128374,assembly=b37>
##contig=<ID=GL000228.1,length=129120,assembly=b37>
##contig=<ID=GL000214.1,length=137718,assembly=b37>
##contig=<ID=GL000221.1,length=155397,assembly=b37>
##contig=<ID=GL000209.1,length=159169,assembly=b37>
##contig=<ID=GL000218.1,length=161147,assembly=b37>
##contig=<ID=GL000220.1,length=161802,assembly=b37>
##contig=<ID=GL000213.1,length=164239,assembly=b37>
##contig=<ID=GL000211.1,length=166566,assembly=b37>
##contig=<ID=GL000199.1,length=169874,assembly=b37>
##contig=<ID=GL000217.1,length=172149,assembly=b37>
##contig=<ID=GL000216.1,length=172294,assembly=b37>
##contig=<ID=GL000215.1,length=172545,assembly=b37>
##contig=<ID=GL000205.1,length=174588,assembly=b37>
##contig=<ID=GL000219.1,length=179198,assembly=b37>
##contig=<ID=GL000224.1,length=179693,assembly=b37>
##contig=<ID=GL000223.1,length=180455,assembly=b37>
##contig=<ID=GL000195.1,length=182896,assembly=b37>
##contig=<ID=GL000212.1,length=186858,assembly=b37>
##contig=<ID=GL000222.1,length=186861,assembly=b37>
##contig=<ID=GL000200.1,length=187035,assembly=b37>
##contig=<ID=GL000193.1,length=189789,assembly=b37>
##contig=<ID=GL000194.1,length=191469,assembly=b37>
##contig=<ID=GL000225.1,length=211173,assembly=b37>
##contig=<ID=GL000192.1,length=547496,assembly=b37>
##contig=<ID=NC_007605,length=171823,assembly=b37>
##contig=<ID=hs37d5,length=35477943,assembly=b37>
##reference=/xx/hs37d5.fa
##tumor_sample=Tumor-666
##normal_sample=Normal-666
##bcftools_filterVersion=1.11+htslib-1.11
#VEP="v103" time="2024-02-01 13:52:04" cache=/xx/homo_sapiens_refseq/103_GRCh37" ensembl-variation=103.06320c4 ensembl=103.4c8d44a ensembl-io=103.353f93a ensembl-funcgen=103.b53bef4 1000genomes="phase3" COSMIC="90" ClinVar="201912" ESP="20141103" HGMD-PUBLIC="20194" assembly="GRCh37.p13" dbSNP="153" gencode="GENCODE 19" genebuild="2011-04" gnomAD="r2.1" polyphen="2.2.2" refseq="2019-10-24 23:10:14 - GCF_000001405.25_GRCh37.p13_genomic.gff" regbuild="1.0" sift="sift5.2.2"
##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence annotations from Ensembl VEP. Format: Allele|Consequence|IMPACT|SYMBOL|Gene|Feature_type|Feature|BIOTYPE|EXON|INTRON|HGVSc|HGVSp|cDNA_position|CDS_position|Protein_position|Amino_acids|Codons|Existing_variation|DISTANCE|STRAND|FLAGS|SYMBOL_SOURCE|HGNC_ID|TSL|REFSEQ_MATCH|REFSEQ_OFFSET|GIVEN_REF|USED_REF|BAM_EDIT|HGVS_OFFSET|gnomAD_AF|gnomAD_AFR_AF|gnomAD_AMR_AF|gnomAD_ASJ_AF|gnomAD_EAS_AF|gnomAD_FIN_AF|gnomAD_NFE_AF|gnomAD_OTH_AF|gnomAD_SAS_AF|CLIN_SIG|SOMATIC|PHENO|FrameshiftSequence|WildtypeProtein">
##FrameshiftSequence=Predicted sequence for frameshift mutations
##WildtypeProtein=The normal, non-mutated protein sequence
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT Normal-666 Tumor-666
chr12 48238361 . G GCCTCAATGAGGAGCACTCCAAGCAGTACCGCTGCCTCTCCTTCCAGCC . clustered_events AS_FilterStatus=SITE;AS_SB_TABLE=101,4|1,6;DP=118;ECNT=5;GERMQ=93;MBQ=37,34;MFRL=52,194;MMQ=60,60;MPOS=49;NALOD=1.48;NLOD=8.75;POPAF=6;TLOD=19.3;CSQ=CCTCAATGAGGAGCACTCCAAGCAGTACCGCTGCCTCTCCTTCCAGCC|stop_gained&protein_altering_variant|HIGH|VDR|7421|Transcript|NM_001364085.1|protein_coding|10/10||NM_001364085.1:c.1451_1452insGGCTGGAAGGAGAGGCAGCGGTACTGCTTGGAGTGCTCCTCATTGAGG|NP_001351014.1:p.Asn484delinsLysAlaGlyArgArgGlySerGlyThrAlaTrpSerAlaProHisTer|1611-1612|1451-1452|484|N/KAGRRGSGTAWSAPH*G|aac/aaGGCTGGAAGGAGAGGCAGCGGTACTGCTTGGAGTGCTCCTCATTGAGGc|||-1||EntrezGene|||rseq_mrna_nonmatch&rseq_5p_mismatch||||OK|||||||||||||||MEAMAASTSLPDPGDFDRNVPRICGVCGDRATGFHFNAMTCEGCKGFFRRSMKRKALFTCPFNGDCRITKDNRRHCQACRLKRCVDIGMMKEFILTDEEVQRKREMILKRKEEEALKDSLRPKLSEEQQRIIAILLDAHHKTYDPTYSDFCQFRPPVRVNDGGGSHPSRPNSRHTPSFSGDSSSSCSDHCITSSDMMDSSSFSNLDLSEEDSDDPSVTLELSQLSMLPHLADLVSYSIQKVIGFAKMIPGFRDLTSEDQIVLLKSSAIEVIMLRSNESFTMDDMSWTCGNQDYKYRVSDVTKAGHSLELIEPLIKFQVGLKKLNLHEEEHVLLMAICIVSPDRPGVQDAALIEAIQDRLSNTLQTYIRCRHPPPGSHLLYAKMIQKLADLRSLNEEHSKQYRCLSFQPECSMKLTPLVLEVFGNEISLGQPVAVPGWGCSSRATCQARGWRLLSSPPHPVWGSAPPLPPPLSTQPILSPVQPNPFPAGFSPVP GT:AD:AF:DP:F1R2:F2R1:SB 0/0:29,0:0.0318:29:17,0:12,0:29,0,0,0 0/1:76,7:0.0936:83:57,1:19,0:72,4,1,6
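The sample columns of the record above can only be read by pairing the colon-separated FORMAT keys (`GT:AD:AF:DP:...`) with each sample's values, which is why the `##FORMAT` header lines were requested. A minimal standard-library sketch; `parse_samples` is a hypothetical helper, and splitting on any whitespace is an assumption made because the pasted record lost its tab separators (real VCF columns are tab-delimited).

```python
from typing import Dict, List


def parse_samples(record_line: str, sample_names: List[str]) -> Dict[str, Dict[str, str]]:
    """Map each sample name to a {FORMAT key: value} dict for one VCF record."""
    # Real VCFs are tab-delimited; split() also tolerates the spaces in the
    # pasted record, but assumes no whitespace inside INFO values.
    columns = record_line.split()
    format_keys = columns[8].split(":")
    sample_values = columns[9:]
    if len(sample_values) != len(sample_names):
        raise ValueError("sample column count does not match the #CHROM header")
    return {
        name: dict(zip(format_keys, values.split(":")))
        for name, values in zip(sample_names, sample_values)
    }


# Shortened copy of the record above (ALT, INFO and FORMAT abbreviated).
line = "chr12 48238361 . G GCC . clustered_events DP=118 GT:AD:AF:DP 0/0:29,0:0.0318:29 0/1:76,7:0.0936:83"
samples = parse_samples(line, ["Normal-666", "Tumor-666"])
print(samples["Tumor-666"]["AD"])
```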
Hello @susannasiebert,
Is there any progress on this issue?
Thanks!
My apologies, I had only replied to your issue in the VAtools repository. This issue should be fixed in VAtools 5.1.1. With that version, this variant and others like it should now be filtered out.
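Since the fix is stated to be in VAtools 5.1.1, confirming the installed version before re-running can rule out a stale install. A minimal sketch using the standard library's `importlib.metadata`, assuming the package is installed under the distribution name `vatools` and uses plain `X.Y.Z` version strings; the helper names are hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_version(text: str) -> tuple:
    """Turn a plain 'X.Y.Z' version string into a comparable tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def has_mismatch_fix(installed: str, required: str = "5.1.1") -> bool:
    """Return True if `installed` is at least the version stated to contain the fix."""
    return parse_version(installed) >= parse_version(required)


if __name__ == "__main__":
    try:
        installed = version("vatools")  # distribution name is an assumption
    except PackageNotFoundError:
        print("vatools is not installed in this environment")
    else:
        print(installed, "OK" if has_mismatch_fix(installed) else "upgrade to >= 5.1.1")
```

Comparing integer tuples rather than raw strings avoids treating "5.10.0" as older than "5.9.0".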
Thanks for the quick response.
I filtered it following griffithlab/VAtools#74 (comment).
Thanks!