support for additional VEP terms
jxchong opened this issue · 6 comments
Based on the findings of the DDD paper, we would like to be able to filter on the following variant annotations, which are created by the VEP SpliceRegion plugin:
splice_donor_5th_base_variant
splice_donor_region_variant
splice_polypyrimidine_tract_variant
extended_intronic_splice_region_variant_5prime
extended_intronic_splice_region_variant_3prime
Info here: http://www.ensembl.info/2018/10/26/cool-stuff-the-vep-can-do-splice-site-variant-annotation/
Plugin here: https://github.com/Ensembl/VEP_plugins/blob/release/94/SpliceRegion.pm
None of these annotations are currently listed in GEMINI's impacts column. How would we access them, given that they don't have their own custom vep_xxx column (my understanding is that VEP simply provides them as the annotation)? Right now we just use impact_severity<>'LOW' in GEMINI, so I imagine we would have to do something like impact_severity<>'LOW' or xxxxx='yyy' or ...
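As a minimal sketch of the kind of combined filter described above, the snippet below runs it against an in-memory SQLite table. The column name vep_spliceregionoutput is hypothetical: it assumes the plugin's output ends up in the database as one plain-text term per row, which is exactly what is not yet the case.

```python
import sqlite3

# The five SpliceRegion plugin terms listed in this issue.
SPLICE_REGION_TERMS = (
    "splice_donor_5th_base_variant",
    "splice_donor_region_variant",
    "splice_polypyrimidine_tract_variant",
    "extended_intronic_splice_region_variant_5prime",
    "extended_intronic_splice_region_variant_3prime",
)


def build_filter(column: str = "vep_spliceregionoutput") -> str:
    """Keep non-LOW impacts OR rows whose (HYPOTHETICAL) column holds a SpliceRegion term."""
    placeholders = ", ".join("?" for _ in SPLICE_REGION_TERMS)
    return f"impact_severity <> 'LOW' OR {column} IN ({placeholders})"


def demo() -> list:
    """Apply the filter to three toy rows and return the matching variant_ids."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE variants (variant_id INTEGER, impact_severity TEXT, "
        "vep_spliceregionoutput TEXT)"
    )
    conn.executemany(
        "INSERT INTO variants VALUES (?, ?, ?)",
        [
            (1, "LOW", "splice_polypyrimidine_tract_variant"),  # kept via the new term
            (2, "LOW", None),                                   # dropped
            (3, "MED", None),                                   # kept via impact_severity
        ],
    )
    query = f"SELECT variant_id FROM variants WHERE {build_filter()} ORDER BY variant_id"
    rows = [row[0] for row in conn.execute(query, SPLICE_REGION_TERMS)]
    conn.close()
    return rows
```

If a single row could hold several '&'-joined terms, the exact IN match would miss it and a per-term split would be needed instead.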
I honestly think this is the realm of the new gemini workflow based upon vcfanno and vcf2db. Our goal is to switch over to it entirely this year.
Thanks Aaron. If we switch to vcfanno/vcf2db right now, would these be accessible to us in queries?
If they are in the VCF via vcfanno or VEP, they make it into the database. @brentp - can you corroborate?
I think these would be impacts in the CSQ string, right? For example, instead of splice_variant it would now be splice_donor_5th_base_variant, so we'd have to update the geneimpacts module.
An example VCF with a few variants would be helpful.
OK, we finally got this working in VEP. These terms show up in the CSQ string, but not in the Consequence field; they are in the SpliceRegionOutput field instead.
Here's an example; more examples are in the VCF available here:
https://www.dropbox.com/s/mg7u3nkxil7p4h5/spliceregionexamples.vcf.gz?dl=0
1 38272660 rs2291297 G A 42583.1 PASS AC=1;AF=0.224;AN=2;BaseQRankSum=-1.622;ClippingRankSum=0.271;DB;DP=3988;ExcessHet=0.4621;FS=0.528;InbreedingCoeff=0.1309;MLEAC=43;MLEAF=0.224;MQ=9.49;MQ0=0;MQRankSum=0;QD=19.89;ReadPosRankSum=0.463;SOR=0.637;CSQ=A|downstream_gene_variant|MODIFIER|MTF1|ENSG00000188786|Transcript|ENST00000373036|protein_coding||||||||||rs2291297|2579|-1||HGNC|7428|YES|CCDS30676.1|1|C1orf122||||||||||||,A|upstream_gene_variant|MODIFIER|C1orf122|ENSG00000197982|Transcript|ENST00000373042|protein_coding||||||||||rs2291297|1158|1||HGNC|24789|YES|CCDS427.2||C1orf122||||||||||||,A|5_prime_UTR_variant|MODIFIER|C1orf122|ENSG00000197982|Transcript|ENST00000373043|protein_coding|1/2||ENST00000373043.1:c.-1697G>A||10/2229|||||rs2291297||1||HGNC|24789||CCDS44112.1||C1orf122||||||||||||,A|intron_variant|MODIFIER|YRDC|ENSG00000196449|Transcript|ENST00000373044|protein_coding||2/4|ENST00000373044.2:c.505-12C>T|||||||rs2291297||-1||HGNC|28905|YES|CCDS30675.1||C1orf122||||||||||||splice_polypyrimidine_tract_variant,A|upstream_gene_variant|MODIFIER|C1orf122|ENSG00000197982|Transcript|ENST00000419397|processed_transcript||||||||||rs2291297|672|1||HGNC|24789||||C1orf122||||||||||||,A|upstream_gene_variant|MODIFIER|C1orf122|ENSG00000197982|Transcript|ENST00000446260|protein_coding||||||||||rs2291297|1422|1||HGNC|24789||||C1orf122||||||||||||,A|upstream_gene_variant|MODIFIER|C1orf122|ENSG00000197982|Transcript|ENST00000468084|protein_coding||||||||||rs2291297|759|1||HGNC|24789||CCDS44112.1||C1orf122||||||||||||,A|regulatory_region_variant|MODIFIER|||RegulatoryFeature|ENSR00000004891|promoter||||||||||rs2291297|||||||||C1orf122|||||||||||| GT:AD:DP:GQ:PL 0/1:37,27:.:99:771,0,945
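To pull those terms out of a record like this, one approach is to read the CSQ field order from the ##INFO=<ID=CSQ,...> header (VEP lists it after "Format:" in the Description) and look up SpliceRegionOutput by name rather than by position. The sketch below is a hypothetical helper, not part of vcf2db or geneimpacts, and it assumes that multiple terms in one entry are '&'-joined, as VEP does for Consequence.

```python
import re


def csq_format_from_header(header_line: str) -> list:
    """Return the CSQ field names listed after 'Format:' in the ##INFO header."""
    match = re.search(r'Format: ([^"]+)"', header_line)
    if match is None:
        raise ValueError("no 'Format:' description in CSQ header")
    return match.group(1).split("|")


def splice_region_terms(info: str, fields: list, name: str = "SpliceRegionOutput") -> list:
    """Return (Feature, [terms]) for every CSQ entry whose `name` field is non-empty."""
    csq = None
    for item in info.split(";"):
        if item.startswith("CSQ="):
            csq = item[len("CSQ="):]
            break
    if csq is None:
        return []  # record has no CSQ annotation at all

    idx = fields.index(name)          # position of the plugin's output column
    feature = fields.index("Feature")  # transcript / regulatory feature ID
    hits = []
    for entry in csq.split(","):  # one comma-separated entry per feature
        values = entry.split("|")
        if idx < len(values) and values[idx]:
            # ASSUMPTION: several terms in one entry are '&'-joined.
            hits.append((values[feature], values[idx].split("&")))
    return hits
```

Applied to the record above, this would report the splice_polypyrimidine_tract_variant term only for the intron_variant entry on ENST00000373044, since every other entry leaves that field empty.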
Gotcha, looks like we would need to update the logic in geneimpacts and in vcf2db to support this.
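A rough sketch of what that update might amount to: fold the SpliceRegionOutput terms into the transcript's consequence list before picking the most severe impact. Both lookup tables below are illustrative only; they are not geneimpacts' real tables, and the MED severities given to the SpliceRegion terms are hypothetical placeholders that would need to be agreed on. Treating unknown terms as LOW is also an assumption.

```python
SEVERITY_RANK = {"LOW": 0, "MED": 1, "HIGH": 2}

# Illustrative subset only; NOT geneimpacts' actual consequence table.
BASE_TABLE = {
    "intron_variant": "LOW",
    "upstream_gene_variant": "LOW",
    "stop_gained": "HIGH",
}

# HYPOTHETICAL placeholder severities for the SpliceRegion plugin terms.
SPLICE_REGION_TABLE = {
    "splice_donor_5th_base_variant": "MED",
    "splice_donor_region_variant": "MED",
    "splice_polypyrimidine_tract_variant": "MED",
    "extended_intronic_splice_region_variant_5prime": "MED",
    "extended_intronic_splice_region_variant_3prime": "MED",
}


def merged_terms(consequence: str, splice_region_output: str) -> list:
    """Combine the '&'-joined Consequence and SpliceRegionOutput values, dropping blanks."""
    parts = consequence.split("&") + splice_region_output.split("&")
    return [p for p in parts if p]


def top_severity(terms: list, table: dict) -> str:
    """Most severe impact among `terms`; unknown terms count as LOW (assumption)."""
    severities = [table.get(t, "LOW") for t in terms] or ["LOW"]
    return max(severities, key=SEVERITY_RANK.__getitem__)
```

With only the base table, the intron_variant entry from the example record stays LOW; once the SpliceRegion terms are added, the same transcript comes out as MED and would pass an impact_severity<>'LOW' filter.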