pcingola/SnpEff

mm39 database download issue

Opened this issue · 6 comments

Describe the bug
Unable to download mm39

To Reproduce

  1. SnpEff version: SnpEff 5.2c (build 2024-04-09 12:24)
  2. Genome version: mm39
  3. SnpEff full command line: java -Xmx60g -jar ~/software/snpEff/snpEff.jar mm39 data/vcf/$ID.vcf.gz
  4. Output / Error message:
    FATAL ERROR: Failed to download database from [https://snpeff.blob.core.windows.net/databases/v5_2/snpEff_v5_2_mm39.zip, https://snpeff.blob.core.windows.net/databases/v5_0/snpEff_v5_0_mm39.zip, https://snpeff.blob.core.windows.net/databases/v5_1/snpEff_v5_1_mm39.zip]

Expected behavior
SnpEff should download the database.

Data
N/A (reproducible with java -Xmx60g -jar ~/software/snpEff/snpEff.jar download mm39):

FATAL ERROR: Failed to download database from [https://snpeff.blob.core.windows.net/databases/v5_2/snpEff_v5_2_mm39.zip, https://snpeff.blob.core.windows.net/databases/v5_0/snpEff_v5_0_mm39.zip, https://snpeff.blob.core.windows.net/databases/v5_1/snpEff_v5_1_mm39.zip]

Additional context
This issue is potentially related to #374, but I am not sure that issue offers an applicable solution.

I had the same issue. I ended up having to redo part of my pipeline and realign to mm10, which I was able to download with snpEff. It looks like the most recent databases available on SourceForge date from 2018, before mm39 was released: https://sourceforge.net/projects/snpeff/files/databases/. So it doesn't look like you can manually get mm39 from there and install it yourself.

Thanks for the comment. It would be great to have the database for mm39 given how much newer the genome is. Hopefully this is just a matter of reuploading existing files.

Hi, I still get the same error today.
Did anyone find a good solution or workaround for this, or manage to build the db manually?

I had the same issue with mm39, but was eventually able to build the db manually. I tried a few of the options in the snpEff documentation, but the one that ultimately worked was the .gtf approach.

Most of what you need to know is in the documentation, but it is not always clear. Here were the key points for me:

  1. Make sure you retrieve all of the required FASTA and GTF (and/or GFF) files from the same genome build, e.g. from Ensembl (https://useast.ensembl.org/Mus_musculus/Info/Index). Unzip them into a specific directory (snpEff/data/mm39/) so that SnpEff can find them, and rename the files (downloaded name: name SnpEff expects):

Mus_musculus.GRCm39.dna.primary_assembly.fa.gz: sequences.fa
Mus_musculus.GRCm39.cds.all.fa.gz: cds.fa
Mus_musculus.GRCm39.pep.all.fa.gz: protein.fa
Mus_musculus.GRCm39.112.gtf.gz: genes.gtf

  2. Modify the snpEff.config file to include the line:
    mm39.genome : Mouse

  3. Build the database:
    java -Xmx4g -jar snpEff.jar build -gtf22 -v mm39

This should create and save the .bin files required for annotation in snpEff/data/mm39/.
Should be good to go!
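
In case it helps, steps 1 and 2 above can be scripted roughly as follows. This is only a sketch: stage_snpeff_files and add_genome_to_config are hypothetical helper names (not part of SnpEff), and it assumes the four Ensembl files listed above have already been downloaded as .gz files into one directory.

```shell
#!/usr/bin/env bash
# Sketch only: both functions are hypothetical helpers, not SnpEff commands.

# Decompress the downloaded Ensembl files into the genome directory,
# giving each one the file name listed in step 1.
stage_snpeff_files() {
    local src_dir="$1"    # directory holding the downloaded .gz files
    local dest_dir="$2"   # e.g. snpEff/data/mm39
    local pair src dest

    mkdir -p "$dest_dir" || return 1

    # "downloaded name:target name" pairs, copied from the list in step 1
    for pair in \
        "Mus_musculus.GRCm39.dna.primary_assembly.fa.gz:sequences.fa" \
        "Mus_musculus.GRCm39.cds.all.fa.gz:cds.fa" \
        "Mus_musculus.GRCm39.pep.all.fa.gz:protein.fa" \
        "Mus_musculus.GRCm39.112.gtf.gz:genes.gtf"
    do
        src="${pair%%:*}"
        dest="${pair##*:}"
        if [ ! -f "$src_dir/$src" ]; then
            echo "missing input: $src_dir/$src" >&2
            return 1
        fi
        # Write the decompressed copy under the new name; the .gz is kept
        gunzip -c "$src_dir/$src" > "$dest_dir/$dest" || return 1
    done
}

# Append "<genome>.genome : <label>" to the config file unless a line
# starting with "<genome>.genome" is already present.
add_genome_to_config() {
    local config="$1" genome="$2" label="$3"
    if ! grep -q "^${genome}\.genome" "$config" 2>/dev/null; then
        printf '%s.genome : %s\n' "$genome" "$label" >> "$config"
    fi
}

# Example usage (paths are assumptions, adjust to your install):
#   stage_snpeff_files ~/Downloads snpEff/data/mm39
#   add_genome_to_config snpEff/snpEff.config mm39 Mouse
#   java -Xmx4g -jar snpEff.jar build -gtf22 -v mm39
```

The config helper checks before appending so that running it twice does not add a duplicate mm39.genome line.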

Hi,

This worked for me:

mkdir tmp; snpEff -Djava.io.tmpdir=tmp download -v EquCab3.0.105
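
For anyone calling the jar directly (as in the original report) rather than through a snpEff wrapper script, the equivalent would presumably look like the sketch below. download_with_local_tmp is a hypothetical helper, and the JAVA and SNPEFF_JAR variables are assumptions for illustration. The only thing I'm sure of is that JVM options such as -D have to come before -jar; I have not checked whether this fixes the mm39 download.

```shell
#!/usr/bin/env bash
# Sketch only: download_with_local_tmp is a hypothetical helper.
# JAVA and SNPEFF_JAR are assumed environment variables with defaults.

download_with_local_tmp() {
    local genome="$1"                      # e.g. EquCab3.0.105
    local tmp_dir="${2:-tmp}"              # temp directory for the JVM
    local jar="${SNPEFF_JAR:-snpEff.jar}"  # path to snpEff.jar

    mkdir -p "$tmp_dir" || return 1

    # -Djava.io.tmpdir is a JVM option, so it goes before -jar;
    # everything after the jar path is passed to SnpEff itself.
    "${JAVA:-java}" -Djava.io.tmpdir="$tmp_dir" -jar "$jar" download -v "$genome"
}

# Example usage:
#   SNPEFF_JAR=~/software/snpEff/snpEff.jar download_with_local_tmp EquCab3.0.105
```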

Originally I had the following issue:
while connecting to https://snpeff.blob.core.windows.net/databases/v5_1/snpEff_v5_1_EquCab3.0.105.zip FATAL ERROR: Failed to download database from [https://snpeff.blob.core.windows.net/databases/v5_2/snpEff_v5_2_EquCab3.0.105.zip, https://snpeff.blob.core.windows.net/databases/v5_0/snpEff_v5_0_EquCab3.0.105.zip, https://snpeff.blob.core.windows.net/databases/v5_1/snpEff_v5_1_EquCab3.0.105.zip]