Error downloading databases
I followed the directions to generate databases here and ran into an error. After decompressing the database tar file, I ran the following command:
`snakemake --configfile snakemake/config/sample_config.yaml --snakefile snakemake/workflow/download_databases.smk --cores 8`
and got the output below. My snakemake version is 5.26.1.
Using shell: /usr/bin/bash
Provided cores: 8
Rules claiming more threads will be scaled down.
Conda environments: ignored
Job counts:
count jobs
1 all
1 cluster_uniprot
1 download_id_taxonomy_mapping
1 download_ncbi_taxonomy
1 download_uniprot_viruses
1 download_uniref50
1 extract_ncbi_taxonomy
1 line_sine_download
1 make_bac_databases
1 make_host_databases
1 mmseqs_uniprot_clusters
1 mmseqs_uniprot_taxdb
1 mmseqs_urv
1 mmseqs_urv_taxonomy
1 uniprot_to_ncbi_mapping
1 uniref_plus_viruses
16
[Thu Oct 8 18:24:54 2020]
rule download_uniref50:
output: databases/proteins/uniref50.fasta.gz
jobid: 16
[Thu Oct 8 18:24:54 2020]
rule download_id_taxonomy_mapping:
output: databases/taxonomy/idmapping.dat.gz
jobid: 9
[Thu Oct 8 18:24:54 2020]
rule download_ncbi_taxonomy:
output: databases/taxonomy/taxdump.tar.gz
jobid: 14
[Thu Oct 8 18:24:54 2020]
rule make_bac_databases:
input: databases/bac_giant_unique_species/bac_uniquespecies_giant.masked_Ns_removed.fasta
output: databases/bac_giant_unique_species/ref
jobid: 1
resources: time_min=240, mem_mb=100000, cpus=16
[Thu Oct 8 18:24:54 2020]
rule download_uniprot_viruses:
output: databases/proteins/uniprot_virus.faa
jobid: 4
[Thu Oct 8 18:24:54 2020]
rule make_host_databases:
input: databases/human_masked/human_virus_masked.fasta
output: databases/human_masked/ref
jobid: 2
resources: time_min=240, mem_mb=100000, cpus=16
[Thu Oct 8 18:24:54 2020]
rule line_sine_download:
output: databases/contaminants/line_sine.fasta
jobid: 3
[Thu Oct 8 18:24:54 2020]
[Thu Oct 8 18:24:54 2020]
Error in rule download_id_taxonomy_mapping:
Error in rule download_uniprot_viruses:
jobid: 9
jobid: 4
output: databases/taxonomy/idmapping.dat.gz
output: databases/proteins/uniprot_virus.faa
shell:
cd databases/taxonomy;
curl -LO "https://ftp.expasy.org/databases/uniprot/current_release/knowledgebase/idmapping/idmapping.dat.gz"
(one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
shell:
mkdir -p databases/proteins && curl -Lgo databases/proteins/uniprot_virus.faa "https://www.uniprot.org/uniprot/?query=taxonomy:%22Viruses%20[10239]%22&format=fasta&&sort=score&fil=reviewed:no"
(one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
[Thu Oct 8 18:24:55 2020]
Error in rule line_sine_download:
jobid: 3
output: databases/contaminants/line_sine.fasta
shell:
(curl -L http://sines.eimb.ru/banks/SINEs.bnk && curl -L http://sines.eimb.ru/banks/LINEs.bnk) | sed -e '/^>/ s/ /_/g' | seqtk rename > databases/contaminants/line_sine.fasta
(one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
Removing output files of failed job line_sine_download since they might be corrupted:
databases/contaminants/line_sine.fasta
[Thu Oct 8 18:24:57 2020]
Finished job 14.
1 of 16 steps (6%) done
[Thu Oct 8 18:29:42 2020]
Finished job 2.
2 of 16 steps (12%) done
[Thu Oct 8 18:30:36 2020]
Finished job 1.
3 of 16 steps (19%) done
[Thu Oct 8 18:40:49 2020]
Finished job 16.
4 of 16 steps (25%) done
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /mnt/pathogen1/stahan/hecatomb/.snakemake/log/2020-10-08T182453.577588.snakemake.log
I think they are all the same issue (perhaps curl is missing?).
Can you take a look in /mnt/pathogen1/stahan/hecatomb/.snakemake/log/2020-10-08T182453.577588.snakemake.log for line_sine_download and see if it gives you more information about the error?
Additionally, what does `curl --version` return?
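One way to pull a single rule's error block out of a long Snakemake log is sketched below. The helper name `show_rule_error` and the default of 10 context lines are assumptions for illustration. It relies only on the `Error in rule <name>:` line format visible in the output above and on the standard `-n` and `-A` options of grep.

```shell
# Print each "Error in rule <name>:" line from a log, with line numbers,
# followed by the next few lines of context.
# Usage: show_rule_error <rule_name> <log_file> [context_lines]
# Example (path taken from the log above):
#   show_rule_error line_sine_download .snakemake/log/2020-10-08T182453.577588.snakemake.log
show_rule_error() {
    local rule="$1" log="$2" context="${3:-10}"
    grep -n -A "$context" "Error in rule ${rule}:" "$log"
}
```

Because parallel jobs can interleave their messages, as they do for jobs 9 and 4 above, the context lines may mix output from more than one rule.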
`curl --version` returns:
curl 7.29.0 (x86_64-redhat-linux-gnu) libcurl/7.29.0 NSS/3.44 zlib/1.2.7 libidn/1.28 libssh2/1.8.0
Protocols: dict file ftp ftps gopher http https imap imaps ldap ldaps pop3 pop3s rtsp scp sftp smtp smtps telnet tftp
Features: AsynchDNS GSS-Negotiate IDN IPv6 Largefile NTLM NTLM_WB SSL libz unix-sockets
I was able to fix the error in rule line_sine_download by installing seqtk to my environment.
Fixed the issue by installing curl, cd-hit, mmseqs and seqtk to my environment.
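Since the failures came down to tools missing from the environment, a quick pre-flight check before running the workflow can save a round trip. The following is a minimal sketch, not part of the workflow itself: the function name `missing_tools` is hypothetical, and the executable names (`curl`, `cd-hit`, `mmseqs`, `seqtk`) are assumed to match the tools listed in this thread.

```shell
#!/usr/bin/env bash
# Print the name of every given tool that is not found on PATH, one per line.
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Check the tools named in this thread (executable names are an assumption).
missing="$(missing_tools curl cd-hit mmseqs seqtk)"
if [ -n "$missing" ]; then
    # $missing is left unquoted on purpose so the names print on one line.
    echo "Not found on PATH:" $missing
fi
```

If the script prints nothing, all four executables were found. Anything it lists would need to be installed into the active environment before rerunning the snakemake command.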
Reopening ... Rob to add those to conda environment.
Mostly irrelevant with the newest version.