dbNSFP plugin only producing empty columns
Hello,
I'm trying to annotate my data with the dbNSFP plugin. It does add whichever columns are requested, but they only ever contain missing values (i.e. a "-"). No errors or warnings are thrown.
The reason I think this is a problem is that I'm annotating a few thousand exome or exome-proximal SNPs, a significant number of which are bound to deserve an annotation. As a matter of fact, many variants are annotated as having a Consequence of "missense_variant", and do get an annotation from the --sift and --polyphen flags, as well as from the AlphaMissense and CADD plugins.
The command I'm using :
/apps/ensembl-vep/vep --cache \
--offline \
--fasta /apps/ensembl-vep/databases/Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz \
-i ~/scratch/vep_in.reformatted.tsv \
-o ~/scratch/vep_raw_output.tsv \
--force_overwrite \
--dir /apps/ensembl-vep/cache/ \
--dir_plugins /apps/ensembl-vep/cache/Plugins/ \
--fork 8 \
--synonyms /home/emac/general_operations/annotate_vep/chr_synonyms.tsv \
--merged \
--assembly GRCh38 \
--no_stats \
--tab \
--symbol \
--pick \
--nearest symbol \
--polyphen b --humdiv \
--sift b \
--check_existing \
--protein \
--distance 0 \
--plugin dbNSFP,${db}/dbNSFP4.6a/dbNSFP4.6a_grch38.gz,pep_match=0,consequence=ALL,genename,SIFT4G_converted_rankscore,GERP++_RS \
--plugin AlphaMissense,file=${db}/AlphaMissense_isoforms_hg38.tsv.gz \
--plugin CADD,file=${db}/cadd1.6_whole_genome_SNVs_inclAnno_hg38.tsv.gz
A few things I saw suggested in other issues and which I've tried without success :
- Running dbNSFP plugin on its own, without AlphaMissense or CADD at the same time, to avoid any potential conflicts
- Running with and without the --offline and --dir_plugins flags
- Running with and without the "pep_match=0" and "consequence=ALL" options in the --plugin dbNSFP line
- Running dbNSFP with the "ALL" option instead of specific cols
I'm working with VEP version 110 (and the corresponding cache). I've tried dbNSFP versions 4.4 and 4.6; both produce the same empty columns. I installed both using the instructions provided on the VEP website.
A couple of example output lines, both for missense variants.
Would you happen to have any idea what might be the problem?
Thanks !
Elby
Hello @Hepit,
Thanks for your query!
I tried with dbNSFP v4.4a and could not reproduce the issue for the example variant you provided. One thing to check is that you have processed the dbNSFP file correctly. The DESCRIPTION section of the plugin explains how to process the file.
Once that is done, can you make sure that tabix can query the dbNSFP file? For example, the following query should return a row:
$ tabix ${db}/dbNSFP4.4a/dbNSFP4.4a_grch38.gz 10:180064-180064 | awk '$4 == "G"'
10 180064 C G L V . 10 226004 10 216004 18;18;18;18;18;18;18;18;18;18;18;18 ZMYND11;ZMYND11;ZMYND11;ZMYND11;ZMYND11;ZMYND11;ZMYND11;ZMYND11;ZMYND11;ZMYND11;ZMYND11;ZMYND11 ENSG00000015171;ENSG00000015171;ENSG00000015171;ENSG00000015171;ENSG00000015171;ENSG00000015171;ENSG00000015171;ENSG00000015171;ENSG00000015171;ENSG00000015171;ENSG00000015171;ENSG00000015171 ENST00000439456;ENST00000397962;ENST00000397959;ENST00000309776;ENST00000509513;ENST00000381591;ENST00000403354;ENST00000381607;ENST00000402736;ENST00000602682;ENST00000397955;ENST00000558098 ENSP00000397072;ENSP00000381053;ENSP00000381050;ENSP00000309992;ENSP00000424205;ENSP00000371003;ENSP00000385484;ENSP00000371020;ENSP00000386010;ENSP00000473321;ENSP00000381046;ENSP00000452959 E9PE09;Q15326;Q15326-5;B7Z2J6;Q15326-6;Q15326;B0QZE2;Q15326-2;E7ENI9;Q15326-5;E7EV75;Q15326-3 E9PE09_HUMAN;ZMY11_HUMAN;ZMY11_HUMAN;B7Z2J6_HUMAN;ZMY11_HUMAN;ZMY11_HUMAN;B0QZE2_HUMAN;ZMY11_HUMAN;E7ENI9_HUMAN;ZMY11_HUMAN;E7EV75_HUMAN;ZMY11_HUMAN .;c.52C>G;c.52C>G;c.52C>G;c.52C>G;c.52C>G;c.52C>G;c.52C>G;c.52C>G;c.52C>G;.;c.52C>G .;p.L18V;p.L18V;p.L18V;p.L18V;p.L18V;p.L18V;p.L18V;p.L18V;p.L18V;.;p.L18V c.52C>G;c.52C>G;c.52C>G;c.52C>G;c.52C>G;c.52C>G;c.52C>G;c.52C>G;c.52C>G;c.52C>G;c.52C>G;c.52C>G p.Leu18Val;p.Leu18Val;p.Leu18Val;p.Leu18Val;p.Leu18Val;p.Leu18Val;p.Leu18Val;p.Leu18Val;p.Leu18Val;p.Leu18Val;p.Leu18Val;p.Leu18Val c.52C>G;c.52C>G;c.52C>G;c.52C>G;c.52C>G;c.52C>G;c.52C>G;c.52C>G;c.52C>G;c.52C>G;c.52C>G;c.52C>G p.Leu18Val;p.Leu18Val;p.Leu18Val;p.Leu18Val;p.Leu18Val;p.Leu18Val;p.Leu18Val;p.Leu18Val;p.Leu18Val;p.Leu18Val;p.Leu18Val;p.Leu18Val .....
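The same lookup can be wrapped in a small helper so that other positions are easy to test. This is only a sketch: the function names `filter_alt` and `lookup_variant` are made up for illustration, it assumes tabix is on the PATH, and it assumes the file is tab-separated with the alternate allele in column 4, as in the row above.

```shell
# Keep only rows whose 4th column (the alternate allele) equals the given allele.
# Assumes tab-separated input on stdin.
filter_alt() {
    awk -F '\t' -v alt="$1" '$4 == alt'
}

# Query a single position in an indexed dbNSFP file and keep one allele.
# Hypothetical helper; arguments are: file, chromosome, position, alt allele.
lookup_variant() {
    file=$1; chrom=$2; pos=$3; alt=$4
    tabix "$file" "${chrom}:${pos}-${pos}" | filter_alt "$alt"
}

# Example call, mirroring the query above:
#   lookup_variant "${db}/dbNSFP4.4a/dbNSFP4.4a_grch38.gz" 10 180064 G
```

If a call like this prints nothing for a variant that is known to be in dbNSFP, the merged file or its index is the likely place to look, rather than the VEP command.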
Best regards,
Nakib
@nakib103 thanks for your quick response !
I've processed it as was indicated, with the exception that no file was found to download at ftp://dbnsfp:dbnsfp@dbnsfp.softgenetics.com/dbNSFP4.4a.zip, so I got it from Amazon instead. The exact code I ran is :
version=4.4a
wget https://dbnsfp.s3.amazonaws.com/dbNSFP4.4a.zip
unzip dbNSFP${version}.zip
zcat dbNSFP${version}_variant.chr1.gz | head -n1 > h
zgrep -h -v ^#chr dbNSFP${version}_variant.chr* | sort -k1,1 -k2,2n - | cat h - | bgzip -c > dbNSFP${version}_grch38.gz
tabix -s 1 -b 2 -e 2 dbNSFP${version}_grch38.gz
When running your query, no output is produced. It also looks like "dbNSFP4.4a_grch38.gz" only contains a header - in fact, when unzipped it is identical to file "h". Is it possible there is a typo in the zgrep command?
Thanks !
Elby
Hi @Hepit,
The commands do not seem to have any problem. Can you make sure that the dbNSFP files downloaded properly and are not empty themselves? I would recommend downloading and processing the dbNSFP files from scratch and then trying again.
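One rough way to check the per-chromosome files before merging is sketched below. The helper name `check_chunks` is hypothetical; it relies only on gzip and awk, and treats every line that does not start with "#" as a data row.

```shell
# For each gzipped input, report whether it decompresses cleanly and how many
# non-header rows it holds. Returns non-zero if any file is corrupt or has no
# data rows.
check_chunks() {
    rc=0
    for f in "$@"; do
        # gzip -t tests archive integrity without writing any output.
        if ! gzip -t "$f" 2>/dev/null; then
            echo "CORRUPT $f"
            rc=1
            continue
        fi
        # Count lines that are not header/comment lines.
        rows=$(gzip -dc "$f" | awk '!/^#/ { n++ } END { print n + 0 }')
        if [ "$rows" -eq 0 ]; then
            echo "EMPTY $f"
            rc=1
        else
            echo "OK $f ($rows data rows)"
        fi
    done
    return "$rc"
}

# Example call on the unzipped download:
#   check_chunks dbNSFP${version}_variant.chr*
```

The same function also works on the merged file, since bgzip output can be read by gzip; a merged file that reports EMPTY contains nothing but the header.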
Best regards,
Nakib
I ended up running the various commands separately instead of with pipes, and that seems to have done the trick. Oddly enough, when running the whole series of pipes together (zgrep -h -v ^#chr dbNSFP${version}_variant.chr* | sort -k1,1 -k2,2n - | cat h - | bgzip -c > dbNSFP${version}_grch38.gz), the resulting file only contains a header. Not sure which one of the steps fails.
Thanks a lot for your help with this !
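For anyone landing here with the same symptom: the thread never established which stage of the pipeline failed, but running the stages one at a time through intermediate files, as described above, can be sketched roughly as follows. The helper names `require_output` and `merge_chunks` are made up for illustration, and the sketch compresses with gzip so that it runs without htslib; for a real dbNSFP build the last step would use `bgzip -c` and be followed by the tabix command from the processing instructions.

```shell
# Fail with a message naming the stage if its output file is missing or empty.
require_output() {
    if [ ! -s "$2" ]; then
        echo "stage '$1' produced no output ($2)" >&2
        return 1
    fi
}

# Merge gzipped per-chromosome tables into one sorted table with one header.
# Usage: merge_chunks OUTPUT.gz CHUNK1.gz [CHUNK2.gz ...]
merge_chunks() {
    out=$1; shift
    work=$(mktemp -d) || return 1

    # Stage 1: take the header line from the first chunk.
    gzip -dc "$1" | head -n 1 > "$work/header"
    require_output header "$work/header" || return 1

    # Stage 2: concatenate all chunks and drop their header lines.
    for f in "$@"; do gzip -dc "$f"; done | grep -v '^#chr' > "$work/body"
    require_output filter "$work/body" || return 1

    # Stage 3: sort by chromosome, then numerically by position.
    sort -k1,1 -k2,2n "$work/body" > "$work/sorted" || return 1
    require_output sort "$work/sorted" || return 1

    # Stage 4: put the header back on top and compress.
    cat "$work/header" "$work/sorted" | gzip -c > "$out"
    rm -rf "$work"
}

# Example call:
#   merge_chunks dbNSFP${version}_grch38.gz dbNSFP${version}_variant.chr*
```

Because each stage writes to its own file and is checked before the next one starts, an empty result points at a specific step instead of a header-only file at the end. The intermediate files are uncompressed, so this needs considerably more disk space than the piped version.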