Incorrect classification of a duplication variant as an intron variant?
Hi all,
Thank you for creating this amazing tool.
I am analyzing variants from an old database and remapping them to hg38. I have what seems to be the same variant in two people, annotated in two different ways: once as an insertion and once as a duplication. Here is the VCF input:
X 41473864 . G <DUP> . . PtID=XXXXXX;SVTYPE=DUP;SVLEN=9;END=41473872 GT:DP 1:150
X 41473872 . T TGCGCCGCCT . . PtID=YYYYYY;SVTYPE=INS;END=41473873 GT:DP 1:150
Ensembl's VEP correctly classifies the variant as protein coding.
SnpEff incorrectly classifies the <DUP> record as an intron variant.
I am using the GRCh38.p14 build.
SnpEff version: 5.2a
The SnpEff command is very standard:
java -jar snpEff.jar -d GRCh38.p14 vcf_output_nyx2_sorted.vcf > vcf_output_nyx2_sorted_ann.vcf
For now, I am converting the short <DUP> entries into INS entries as a workaround, but I would like to know what is causing this issue and how it can be fixed.
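A minimal sketch of that workaround in Python, assuming the convention used in the records above: POS..END (1-based, inclusive) covers the duplicated bases themselves, and the copy is inserted directly after END. `dup_to_ins` and `toy_fetch` are hypothetical helpers, not part of SnpEff, and the toy reference sequence GCGCCGCCT is inferred from the ALT allele of the INS record rather than read from GRCh38.

```python
def dup_to_ins(chrom, pos, end, fetch):
    """Rewrite a short tandem <DUP> spanning pos..end (1-based, inclusive)
    as an explicit insertion placed after the last duplicated base.

    `fetch(chrom, start, end)` is a hypothetical callable that returns the
    reference bases for a 1-based inclusive interval.
    """
    dup_seq = fetch(chrom, pos, end)
    if len(dup_seq) != end - pos + 1:
        raise ValueError("reference slice does not match the DUP span")
    anchor = dup_seq[-1]  # reference base at END becomes REF
    # ALT is the anchor base followed by a second copy of the duplicated bases
    return chrom, end, anchor, anchor + dup_seq


# Toy reference covering only the duplicated span (assumed sequence).
TOY_START = 41473864
TOY_SEQ = "GCGCCGCCT"


def toy_fetch(chrom, start, end):
    # Convert 1-based inclusive coordinates into a Python slice
    return TOY_SEQ[start - TOY_START : end - TOY_START + 1]


record = dup_to_ins("X", 41473864, 41473872, toy_fetch)
print("\t".join(str(field) for field in record))
```

Under these assumptions the converted record has the same POS, REF and ALT as the INS line in the VCF input. For real data, `toy_fetch` would need to be replaced by a lookup against the actual reference FASTA.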
Thank you for your help
More detail:
Parsed output from SnpEff:
CHROM POS ID REF ALT QUAL FILTER FORMAT NA0001 INFO_PtID INFO_SVTYPE INFO_SVLEN INFO_END Allele Annotation Annotation_Impact Gene_Name Gene_ID Feature_Type Feature_ID Transcript_BioType Rank HGVS.c HGVS.p cDNA.pos / cDNA.length CDS.pos / CDS.length AA.pos / AA.length Distance ERRORS / WARNINGS / INFO INFO_LOF INFO_NMD
X 41473864 . G <DUP> . . GT:DP 1:150 XXXXXX DUP 9 41473872 <DUP> intron_variant MODIFIER NYX NYX transcript NM_022567.3 protein_coding 1/1 c. INFO_REALIGN_3_PRIME NA NA
X 41473864 . G <DUP> . . GT:DP 1:150 XXXXXX DUP 9 41473872 <DUP> intron_variant MODIFIER NYX NYX transcript NM_001378477.3 protein_coding 2/2 c. INFO_REALIGN_3_PRIME NA NA
X 41473872 . T TGCGCCGCCT . . GT:DP 1:150 YYYYYY INS NA 41473873 TGCGCCGCCT disruptive_inframe_insertion MODERATE NYX NYX transcript NM_001378477.3 protein_coding 3/3 c.396_404dupGCGCCGCCT p.Leu135_Asp136insArgArgLeu 635/2414 405/1431 135/476 NA NA
X 41473872 . T TGCGCCGCCT . . GT:DP 1:150 YYYYYY INS NA 41473873 TGCGCCGCCT disruptive_inframe_insertion MODERATE NYX NYX transcript NM_022567.3 protein_coding 2/2 c.396_404dupGCGCCGCCT p.Leu135_Asp136insArgArgLeu 967/2746 405/1431 135/476 NA NA
It tries to do some weird realignment. From snpEff log in bash:
Variant (original) : chrX:41473864-41473871[DUP]
Variant (realinged) : chrX:41472836-41472836[INTERVAL]
I am unsure why it is doing this; the two records should describe exactly the same variant.