Building custom database issue
jongin333 opened this issue · 5 comments
Hi,
I'd like to build a custom database for some of NCBI refseq sequences.
Before that, I attempted to build a very small database, but it failed.
Here are the directory structure used for building the database, the config file, and the log file.
======= directory structure =======
./db
./db/clark
./db/clark/genomes.fna
./db/dudes
./db/dudes/genomes.fna
./db/kaiju
./db/kaiju/genome.gbff
./db/kraken
./db/kraken/genome.fna
======= config file =======
workdir: "/mss2/projects/META2/taxonomy_classification/metameta"
databases:
- custom_db
custom_db:
  clark: "/mss2/projects/META2/taxonomy_classification/metameta/db/clark"
  dudes: "/mss2/projects/META2/taxonomy_classification/metameta/db/dudes"
  kaiju: "/mss2/projects/META2/taxonomy_classification/metameta/db/kaiju"
  kraken: "/mss2/projects/META2/taxonomy_classification/metameta/db/kraken"
dbdir: "/mss2/projects/META2/taxonomy_classification/metameta/db"
samples:
  "TEST":
    fq1: "test1_1.fq.gz"
    fq2: "test1_2.fq.gz"
gzipped: 1
threads: 50
======= Log file =======
Building DAG of jobs...
Provided cores: 5
Rules claiming more threads will be scaled down.
Job counts:
count jobs
1 all
1 clark_db_custom_1
1 clark_db_custom_2
1 clark_db_custom_3
1 clark_db_custom_4
1 clark_db_custom_check
1 clark_db_custom_profile
1 clark_rpt
1 clark_run_1
4 clean_files
1 clean_reads
4 database_profile
1 dudes_db_custom_1
1 dudes_db_custom_2
1 dudes_db_custom_3
1 dudes_db_custom_check
1 dudes_db_custom_profile
1 dudes_rpt
1 dudes_run_1
1 dudes_run_2
1 errorcorr_reads
1 get_accession2taxid
1 get_gi_taxid_nucl
1 get_taxdump
1 kaiju_db_custom_1
1 kaiju_db_custom_2
1 kaiju_db_custom_3
1 kaiju_db_custom_4
1 kaiju_db_custom_check
1 kaiju_db_custom_profile
1 kaiju_rpt
1 kaiju_run_1
1 kraken_db_custom_1
1 kraken_db_custom_2
1 kraken_db_custom_3
1 kraken_db_custom_check
1 kraken_db_custom_profile
1 kraken_rpt
1 kraken_run_1
1 krona
1 metametamerge
1 subsample_reads
1 trim_reads
49
MetaMeta Pipeline v1.2.0 by Vitor C. Piro (vitorpiro@gmail.com, http://github.com/pirovc)
Parameters:
- bins: 4
- custom_db: OrderedDict([('clark', 'db/clark'),
('dudes', 'db/dudes'),
('kaiju', 'db/kaiju'),
('kraken', 'db/kraken')])
- cutoff: 0.0001
- databases: ['custom_db']
- dbdir: 'db'
- desiredminlen: 70
- detailed: 0
- errorcorr: 0
- gzipped: 1
- keepfiles: 0
- mode: 'linear'
- ranks: 'species'
- replacement: 0
- samples: OrderedDict([('TEST',
OrderedDict([('fq1', 'test1_1.fq.gz'),
('fq2', 'test1_2.fq.gz')]))])
- samplesize: 1
- strictness: 0.8
- subsample: 0
- threads: 50
- tool_alt_path: {'bowtie2': '',
'clark': '',
'dudes': '',
'gottcha': '',
'kaiju': '',
'kraken': '',
'krona': '',
'metametamerge': '',
'motus': '',
'spades': '',
'trimmomatic': ''}
- tools: {'clark': 'b',
'dudes': 'p',
'gottcha': 'p',
'kaiju': 'b',
'kraken': 'b',
'motus': 'p'}
- trimming: 0
- verbose: 0
- workdir: '.'
rule kaiju_db_custom_1:
output: dbcustom_db/kaiju_db/kaiju_db.faa
log: dbcustom_db/log/kaiju_db_custom_1.log
jobid: 48
benchmark: dbcustom_db/log/kaiju_db_custom_1.time
wildcards: database=custom_db
rule get_gi_taxid_nucl:
output: dbtaxonomy/gi_taxid_nucl.dmp.gz
log: dbtaxonomy/log/get_gi_taxid_nucl.log
jobid: 44
benchmark: dbtaxonomy/log/get_gi_taxid_nucl.time
rule get_taxdump:
output: dbtaxonomy/taxdump.tar.gz, dbtaxonomy/names.dmp, dbtaxonomy/nodes.dmp, dbtaxonomy/merged.dmp
log: dbtaxonomy/log/get_taxdump.log
jobid: 4
benchmark: dbtaxonomy/log/get_taxdump.time
rule get_accession2taxid:
output: dbtaxonomy/nucl_gb.accession2taxid.gz, dbtaxonomy/nucl_wgs.accession2taxid.gz
log: dbtaxonomy/log/get_accession2taxid.log
jobid: 47
benchmark: dbtaxonomy/log/get_accession2taxid.time
Activating conda environment /mss2/projects/META2/taxonomy_classification/metameta/.snakemake/conda/0e3e8e78.
rule clark_db_custom_1:
output: dbcustom_db/clark_db/Custom/
log: dbcustom_db/log/clark_db_custom_1.log
jobid: 40
benchmark: dbcustom_db/log/clark_db_custom_1.time
wildcards: database=custom_db
Finished job 40.
1 of 49 steps (2%) done
rule kaiju_db_custom_profile:
output: dbcustom_db/kaiju.dbaccession.out
log: dbcustom_db/log/kaiju_db_custom_profile.log
jobid: 36
benchmark: dbcustom_db/log/kaiju_db_custom_profile.time
wildcards: database=custom_db
Finished job 36.
2 of 49 steps (4%) done
Finished job 48.
3 of 49 steps (6%) done
Exiting because a job execution failed. Look above for error message
Will exit after finishing currently running jobs.
Finished job 44.
4 of 49 steps (8%) done
Will exit after finishing currently running jobs.
Exiting because a job execution failed. Look above for error message
An error has occured.
Please check the main log file for more information:
/mss2/projects/META2/taxonomy_classification/metameta/metameta_2019-08-15_23-59-47.log
Detailed output and execution time for each rule can be found at:
/mss2/projects/META2/taxonomy_classification/metameta/db/log/
/mss2/projects/META2/taxonomy_classification/metameta/SAMPLE_NAME/log/
=======
How can I build a custom database?
What did I miss?
Thank you,
Jongin
Hi Jongin,
Apparently jobs 4 (get_taxdump) and 47 (get_accession2taxid) could not finish. Both of them just download data from the NCBI servers. Do you have any internet restrictions? You can check the error for those rules in:
/mss2/projects/META2/taxonomy_classification/metameta/db/log/taxonomy/log/get_taxdump.log
/mss2/projects/META2/taxonomy_classification/metameta/db/log/taxonomy/log/get_accession2taxid.log
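If those logs point to a download failure, a quick way to check whether NCBI is reachable from your machine is something like the sketch below (just a quick check of my own, not part of MetaMeta; the URL is the standard NCBI location for taxdump.tar.gz, adjust it if your setup uses a mirror or FTP instead):
import urllib.request

# Minimal connectivity check: try to reach the NCBI taxonomy dump over HTTPS
# and print the response status and advertised file size.
url = "https://ftp.ncbi.nlm.nih.gov/pub/taxonomy/taxdump.tar.gz"
try:
    with urllib.request.urlopen(url, timeout=30) as resp:
        print("reachable:", resp.status, "size:", resp.headers.get("Content-Length"))
except Exception as e:
    print("download blocked or failed:", e)
If that fails or hangs, the pipeline's download rules will fail the same way and you would need to sort out proxy/firewall access first.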
Best
Vitor
Hi Vitor,
I found the problem: the 'dbdir' path in the config file.
So I changed it and reran (wiped everything and reran in a newly created directory):
dbdir: "/mss2/projects/META2/taxonomy_classification/metameta/db"
->
dbdir: "/mss2/projects/META2/taxonomy_classification/metameta/db/"
(The earlier log paths like dbcustom_db/... suggest the path was concatenated without a trailing separator.)
That got past the previous errors, but another problem occurred, this time in dudes.
Here is the bottom of the log file.
Error in rule dudes_db_custom_profile:
jobid: 47
output: /mss2/projects/META2/taxonomy_classification/metameta/db/custom_db/dudes.dbaccession.out
log: /mss2/projects/META2/taxonomy_classification/metameta/db/custom_db/log/dudes_db_custom_profile.log
RuleException:
CalledProcessError in line 56 of /mss1/programs/europa/miniconda2/envs/metametaenv/opt/metameta/tools/dudes_db_custom.sm:
Command ' set -euo pipefail; python3 -c 'import numpy as np; npzfile = np.load("/mss2/projects/META2/taxonomy_classification/metameta/db/custom_db/dudes_db/dudes_db.npz"); print("\n".join(npzfile["refids_lookup"].item().keys()))' | sed 's/>//g' > /mss2/projects/META2/taxonomy_classification/metameta/db/custom_db/dudes.dbaccession.out 2> /mss2/projects/META2/taxonomy_classification/metameta/db/custom_db/log/dudes_db_custom_profile.log ' returned non-zero exit status 1.
File "/mss1/programs/europa/miniconda2/envs/metametaenv/opt/metameta/tools/dudes_db_custom.sm", line 56, in __rule_dudes_db_custom_profile
File "/mss1/programs/europa/miniconda2/envs/metametaenv/lib/python3.6/concurrent/futures/thread.py", line 56, in run
Removing output files of failed job dudes_db_custom_profile since they might be corrupted:
/mss2/projects/META2/taxonomy_classification/metameta/db/custom_db/dudes.dbaccession.out
Will exit after finishing currently running jobs.
Touching output file /mss2/projects/META2/taxonomy_classification/metameta/db/custom_db/clark_db/.taxondata.
Finished job 34.
13 of 49 steps (27%) done
Will exit after finishing currently running jobs.
Exiting because a job execution failed. Look above for error message
How can I fix it?
Thank you,
Jongin
Dear Jongin,
Can you please send me the contents of the file /mss2/projects/META2/taxonomy_classification/metameta/db/custom_db/log/dudes_db_custom_profile.log?
Vitor
Hi, Vitor.
Unfortunately, the log file (dudes_db_custom_profile.log) is empty.
I've attached the final log file and a listing of the log directory. I hope those are helpful.
-rw-r--r-- 1 jongin bioinfo 0 Aug 17 11:05 clark_db_custom_1.log
-rw-r--r-- 1 jongin bioinfo 114 Aug 17 11:05 clark_db_custom_1.time
-rw-r--r-- 1 jongin bioinfo 101 Aug 17 12:26 clark_db_custom_2.log
-rw-r--r-- 1 jongin bioinfo 120 Aug 17 12:26 clark_db_custom_2.time
-rw-r--r-- 1 jongin bioinfo 20 Aug 17 11:07 dudes_db_custom_2.log
-rw-r--r-- 1 jongin bioinfo 117 Aug 17 11:07 dudes_db_custom_2.time
-rw-r--r-- 1 jongin bioinfo 1392 Aug 17 12:19 dudes_db_custom_3.log
-rw-r--r-- 1 jongin bioinfo 134 Aug 17 12:19 dudes_db_custom_3.time
-rw-r--r-- 1 jongin bioinfo 0 Aug 17 12:19 dudes_db_custom_profile.log
-rw-r--r-- 1 jongin bioinfo 0 Aug 17 11:05 kaiju_db_custom_1.log
-rw-r--r-- 1 jongin bioinfo 118 Aug 17 11:05 kaiju_db_custom_1.time
-rw-r--r-- 1 jongin bioinfo 476 Aug 17 12:19 kaiju_db_custom_2.log
-rw-r--r-- 1 jongin bioinfo 117 Aug 17 12:19 kaiju_db_custom_2.time
-rw-r--r-- 1 jongin bioinfo 976 Aug 17 12:19 kaiju_db_custom_3.log
-rw-r--r-- 1 jongin bioinfo 114 Aug 17 12:19 kaiju_db_custom_3.time
-rw-r--r-- 1 jongin bioinfo 10 Aug 17 11:07 kaiju_db_custom_4.log
-rw-r--r-- 1 jongin bioinfo 116 Aug 17 11:07 kaiju_db_custom_4.time
-rw-r--r-- 1 jongin bioinfo 0 Aug 17 11:05 kaiju_db_custom_profile.log
-rw-r--r-- 1 jongin bioinfo 114 Aug 17 11:05 kaiju_db_custom_profile.time
-rw-r--r-- 1 jongin bioinfo 20 Aug 17 11:32 kraken_db_custom_2.log
-rw-r--r-- 1 jongin bioinfo 126 Aug 17 11:32 kraken_db_custom_2.time
metameta_2019-08-17_12-26-40.log
Thank you,
Jongin
Hi Jongin,
Can you check which Python version is installed in your environment? If you have Python > 3.5, I guess the error is related to this: pirovc/dudes#2
In this case, you can try to change the following line in your MetaMeta installation (based on your log file, it should be at /mss1/programs/europa/miniconda2/envs/metametaenv/opt/metameta/tools/dudes_db_custom.sm):
metameta/tools/dudes_db_custom.sm, line 56 (commit 3a3c203)
for this one:
shell: """python3 -c 'import numpy as np; npzfile = np.load("{input.npz}", allow_pickle=True); print("\\n".join(npzfile["refids_lookup"].item().keys()))' | sed 's/>//g' > {output} 2> {log}"""
but I guess the error will also occur when running dudes itself, so you would need to change the dudes code as well.
Alternatively, you can install Python 3.5 in the metametaenv and everything should be fine.
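For context, the failing command loads the DUDes .npz database, and (if I remember correctly) NumPy 1.16.3+ refuses to load pickled object arrays unless allow_pickle=True is passed, which is exactly what the one-liner above adds. A rough illustration of the difference (the .npz path below is just a placeholder):
import numpy as np

# On recent NumPy (>= 1.16.3, if I recall correctly), loading an .npz that contains
# pickled object arrays fails by default with:
#   ValueError: Object arrays cannot be loaded when allow_pickle=False
# i.e. np.load("dudes_db.npz") raises.
#
# Passing allow_pickle=True restores the old behaviour:
npz = np.load("dudes_db.npz", allow_pickle=True)   # placeholder path to the DUDes database
refids = npz["refids_lookup"].item()               # 0-d object array wrapping the lookup dict
print("\n".join(refids.keys()))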
Cheers
Vitor