pirovc/metameta

Building custom database issue

jongin333 opened this issue · 5 comments

Hi,

I'd like to build a custom database from some NCBI RefSeq sequences.
Before that, I attempted to build a very small database, but it failed.

Here are the directory structure used for building the database, the config file, and the log file.

======= directory structure =======
./db
./db/clark
./db/clark/genomes.fna
./db/dudes
./db/dudes/genomes.fna
./db/kaiju
./db/kaiju/genome.gbff
./db/kraken
./db/kraken/genome.fna

======= config file =======
workdir: "/mss2/projects/META2/taxonomy_classification/metameta"

databases:
  - custom_db

custom_db:
  clark: "/mss2/projects/META2/taxonomy_classification/metameta/db/clark"
  dudes: "/mss2/projects/META2/taxonomy_classification/metameta/db/dudes"
  kaiju: "/mss2/projects/META2/taxonomy_classification/metameta/db/kaiju"
  kraken: "/mss2/projects/META2/taxonomy_classification/metameta/db/kraken"

dbdir: "/mss2/projects/META2/taxonomy_classification/metameta/db"

samples:
  "TEST":
    fq1: "test1_1.fq.gz"
    fq2: "test1_2.fq.gz"

gzipped: 1
threads: 50

======= Log file =======
Building DAG of jobs...
Provided cores: 5
Rules claiming more threads will be scaled down.
Job counts:
count jobs
1 all
1 clark_db_custom_1
1 clark_db_custom_2
1 clark_db_custom_3
1 clark_db_custom_4
1 clark_db_custom_check
1 clark_db_custom_profile
1 clark_rpt
1 clark_run_1
4 clean_files
1 clean_reads
4 database_profile
1 dudes_db_custom_1
1 dudes_db_custom_2
1 dudes_db_custom_3
1 dudes_db_custom_check
1 dudes_db_custom_profile
1 dudes_rpt
1 dudes_run_1
1 dudes_run_2
1 errorcorr_reads
1 get_accession2taxid
1 get_gi_taxid_nucl
1 get_taxdump
1 kaiju_db_custom_1
1 kaiju_db_custom_2
1 kaiju_db_custom_3
1 kaiju_db_custom_4
1 kaiju_db_custom_check
1 kaiju_db_custom_profile
1 kaiju_rpt
1 kaiju_run_1
1 kraken_db_custom_1
1 kraken_db_custom_2
1 kraken_db_custom_3
1 kraken_db_custom_check
1 kraken_db_custom_profile
1 kraken_rpt
1 kraken_run_1
1 krona
1 metametamerge
1 subsample_reads
1 trim_reads
49


MetaMeta Pipeline v1.2.0 by Vitor C. Piro (vitorpiro@gmail.com, http://github.com/pirovc)

Parameters:

  • bins: 4
  • custom_db: OrderedDict([('clark', 'db/clark'),
    ('dudes', 'db/dudes'),
    ('kaiju', 'db/kaiju'),
    ('kraken', 'db/kraken')])
  • cutoff: 0.0001
  • databases: ['custom_db']
  • dbdir: 'db'
  • desiredminlen: 70
  • detailed: 0
  • errorcorr: 0
  • gzipped: 1
  • keepfiles: 0
  • mode: 'linear'
  • ranks: 'species'
  • replacement: 0
  • samples: OrderedDict([('TEST',
    OrderedDict([('fq1', 'test1_1.fq.gz'),
    ('fq2', 'test1_2.fq.gz')]))])
  • samplesize: 1
  • strictness: 0.8
  • subsample: 0
  • threads: 50
  • tool_alt_path: {'bowtie2': '',
    'clark': '',
    'dudes': '',
    'gottcha': '',
    'kaiju': '',
    'kraken': '',
    'krona': '',
    'metametamerge': '',
    'motus': '',
    'spades': '',
    'trimmomatic': ''}
  • tools: {'clark': 'b',
    'dudes': 'p',
    'gottcha': 'p',
    'kaiju': 'b',
    'kraken': 'b',
    'motus': 'p'}
  • trimming: 0
  • verbose: 0
  • workdir: '.'

rule kaiju_db_custom_1:
output: dbcustom_db/kaiju_db/kaiju_db.faa
log: dbcustom_db/log/kaiju_db_custom_1.log
jobid: 48
benchmark: dbcustom_db/log/kaiju_db_custom_1.time
wildcards: database=custom_db

rule get_gi_taxid_nucl:
output: dbtaxonomy/gi_taxid_nucl.dmp.gz
log: dbtaxonomy/log/get_gi_taxid_nucl.log
jobid: 44
benchmark: dbtaxonomy/log/get_gi_taxid_nucl.time

rule get_taxdump:
output: dbtaxonomy/taxdump.tar.gz, dbtaxonomy/names.dmp, dbtaxonomy/nodes.dmp, dbtaxonomy/merged.dmp
log: dbtaxonomy/log/get_taxdump.log
jobid: 4
benchmark: dbtaxonomy/log/get_taxdump.time

rule get_accession2taxid:
output: dbtaxonomy/nucl_gb.accession2taxid.gz, dbtaxonomy/nucl_wgs.accession2taxid.gz
log: dbtaxonomy/log/get_accession2taxid.log
jobid: 47
benchmark: dbtaxonomy/log/get_accession2taxid.time

Activating conda environment /mss2/projects/META2/taxonomy_classification/metameta/.snakemake/conda/0e3e8e78.

rule clark_db_custom_1:
output: dbcustom_db/clark_db/Custom/
log: dbcustom_db/log/clark_db_custom_1.log
jobid: 40
benchmark: dbcustom_db/log/clark_db_custom_1.time
wildcards: database=custom_db

Finished job 40.
1 of 49 steps (2%) done

rule kaiju_db_custom_profile:
output: dbcustom_db/kaiju.dbaccession.out
log: dbcustom_db/log/kaiju_db_custom_profile.log
jobid: 36
benchmark: dbcustom_db/log/kaiju_db_custom_profile.time
wildcards: database=custom_db

Finished job 36.
2 of 49 steps (4%) done
Finished job 48.
3 of 49 steps (6%) done
Exiting because a job execution failed. Look above for error message
Will exit after finishing currently running jobs.
Finished job 44.
4 of 49 steps (8%) done
Will exit after finishing currently running jobs.
Exiting because a job execution failed. Look above for error message


An error has occured.
Please check the main log file for more information:
/mss2/projects/META2/taxonomy_classification/metameta/metameta_2019-08-15_23-59-47.log
Detailed output and execution time for each rule can be found at:
/mss2/projects/META2/taxonomy_classification/metameta/db/log/
/mss2/projects/META2/taxonomy_classification/metameta/SAMPLE_NAME/log/

=======

How can I build a custom database?
What did I miss?

Thank you,
Jongin

Hi Jongin,

Apparently jobs 4 (get_taxdump) and 47 (get_accession2taxid) could not finish. Both of them just download data from NCBI servers. Do you have any internet restrictions? You can check the error for those rules in:

/mss2/projects/META2/taxonomy_classification/metameta/db/log/taxonomy/log/get_taxdump.log

/mss2/projects/META2/taxonomy_classification/metameta/db/log/taxonomy/log/get_accession2taxid.log

Best
Vitor

Hi Vitor,

I found the problem: the 'dbdir' path in the config file.
So I changed it and reran (wiped everything and reran in a newly made directory).

dbdir: "/mss2/projects/META2/taxonomy_classification/metameta/db"
->
dbdir: "/mss2/projects/META2/taxonomy_classification/metameta/db/"
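I suspect the pipeline builds output paths by plain string concatenation (my assumption; the fused paths like `dbcustom_db/kaiju_db/kaiju_db.faa` in the first log are consistent with it). A minimal sketch of the pitfall:

```python
# With the relative dbdir 'db' shown in the first run's Parameters,
# plain string concatenation fuses the components when the trailing
# slash is missing -- matching the "dbcustom_db/..." outputs in the log.
dbdir_no_slash = "db"
dbdir_with_slash = "db/"

broken = dbdir_no_slash + "custom_db/kaiju_db/kaiju_db.faa"
fixed = dbdir_with_slash + "custom_db/kaiju_db/kaiju_db.faa"

print(broken)  # dbcustom_db/kaiju_db/kaiju_db.faa
print(fixed)   # db/custom_db/kaiju_db/kaiju_db.faa
```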

Then the previous errors were gone, but another problem occurred with DUDes.
Here is the bottom of the log file.

Error in rule dudes_db_custom_profile:
    jobid: 47
    output: /mss2/projects/META2/taxonomy_classification/metameta/db/custom_db/dudes.dbaccession.out
    log: /mss2/projects/META2/taxonomy_classification/metameta/db/custom_db/log/dudes_db_custom_profile.log

RuleException:
CalledProcessError in line 56 of /mss1/programs/europa/miniconda2/envs/metametaenv/opt/metameta/tools/dudes_db_custom.sm:
Command ' set -euo pipefail;  python3 -c 'import numpy as np; npzfile = np.load("/mss2/projects/META2/taxonomy_classification/metameta/db/custom_db/dudes_db/dudes_db.npz"); print("\n".join(npzfile["refids_lookup"].item().keys()))' | sed 's/>//g' > /mss2/projects/META2/taxonomy_classification/metameta/db/custom_db/dudes.dbaccession.out 2> /mss2/projects/META2/taxonomy_classification/metameta/db/custom_db/log/dudes_db_custom_profile.log ' returned non-zero exit status 1.
  File "/mss1/programs/europa/miniconda2/envs/metametaenv/opt/metameta/tools/dudes_db_custom.sm", line 56, in __rule_dudes_db_custom_profile
  File "/mss1/programs/europa/miniconda2/envs/metametaenv/lib/python3.6/concurrent/futures/thread.py", line 56, in run
Removing output files of failed job dudes_db_custom_profile since they might be corrupted:
/mss2/projects/META2/taxonomy_classification/metameta/db/custom_db/dudes.dbaccession.out
Will exit after finishing currently running jobs.
Touching output file /mss2/projects/META2/taxonomy_classification/metameta/db/custom_db/clark_db/.taxondata.
Finished job 34.
13 of 49 steps (27%) done
Will exit after finishing currently running jobs.
Exiting because a job execution failed. Look above for error message

How can I fix it?

Thank you,
Jongin

Dear Jongin,

Can you please send me the contents of the file: /mss2/projects/META2/taxonomy_classification/metameta/db/custom_db/log/dudes_db_custom_profile.log

Vitor

Hi Vitor,

Unfortunately, the log file (dudes_db_custom_profile.log) is empty.
I am attaching the final log file and a listing of the log directory. I hope they are helpful.

-rw-r--r-- 1 jongin bioinfo    0 Aug 17 11:05 clark_db_custom_1.log
-rw-r--r-- 1 jongin bioinfo  114 Aug 17 11:05 clark_db_custom_1.time
-rw-r--r-- 1 jongin bioinfo  101 Aug 17 12:26 clark_db_custom_2.log
-rw-r--r-- 1 jongin bioinfo  120 Aug 17 12:26 clark_db_custom_2.time
-rw-r--r-- 1 jongin bioinfo   20 Aug 17 11:07 dudes_db_custom_2.log
-rw-r--r-- 1 jongin bioinfo  117 Aug 17 11:07 dudes_db_custom_2.time
-rw-r--r-- 1 jongin bioinfo 1392 Aug 17 12:19 dudes_db_custom_3.log
-rw-r--r-- 1 jongin bioinfo  134 Aug 17 12:19 dudes_db_custom_3.time
-rw-r--r-- 1 jongin bioinfo    0 Aug 17 12:19 dudes_db_custom_profile.log
-rw-r--r-- 1 jongin bioinfo    0 Aug 17 11:05 kaiju_db_custom_1.log
-rw-r--r-- 1 jongin bioinfo  118 Aug 17 11:05 kaiju_db_custom_1.time
-rw-r--r-- 1 jongin bioinfo  476 Aug 17 12:19 kaiju_db_custom_2.log
-rw-r--r-- 1 jongin bioinfo  117 Aug 17 12:19 kaiju_db_custom_2.time
-rw-r--r-- 1 jongin bioinfo  976 Aug 17 12:19 kaiju_db_custom_3.log
-rw-r--r-- 1 jongin bioinfo  114 Aug 17 12:19 kaiju_db_custom_3.time
-rw-r--r-- 1 jongin bioinfo   10 Aug 17 11:07 kaiju_db_custom_4.log
-rw-r--r-- 1 jongin bioinfo  116 Aug 17 11:07 kaiju_db_custom_4.time
-rw-r--r-- 1 jongin bioinfo    0 Aug 17 11:05 kaiju_db_custom_profile.log
-rw-r--r-- 1 jongin bioinfo  114 Aug 17 11:05 kaiju_db_custom_profile.time
-rw-r--r-- 1 jongin bioinfo   20 Aug 17 11:32 kraken_db_custom_2.log
-rw-r--r-- 1 jongin bioinfo  126 Aug 17 11:32 kraken_db_custom_2.time

metameta_2019-08-17_12-26-40.log

Thank you,
Jongin

Hi Jongin,

Can you check which Python version is in your environment? I guess the error is related to this: pirovc/dudes#2, if you have Python > 3.5.

In this case, you can try to change the following line in your MetaMeta installation (based on your log file, the file should be at /mss1/programs/europa/miniconda2/envs/metametaenv/opt/metameta/tools/dudes_db_custom.sm):

shell: """python3 -c 'import numpy as np; npzfile = np.load("{input.npz}"); print("\\n".join(npzfile["refids_lookup"].item().keys()))' | sed 's/>//g' > {output} 2> {log}"""

to this one:

shell: """python3 -c 'import numpy as np; npzfile = np.load("{input.npz}", allow_pickle=True); print("\\n".join(npzfile["refids_lookup"].item().keys()))' | sed 's/>//g' > {output} 2> {log}"""

However, I guess the error will also occur when running DUDes itself, so you should change the DUDes code as well.

Alternatively, you can install Python 3.5 in the metametaenv environment, and everything should work.
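For reference, a minimal sketch of the NumPy behavior behind this (the `demo_db.npz` file and the accession keys here are made-up placeholders, not your actual database):

```python
import numpy as np

# A dict stored inside an .npz archive ends up as a 0-d object array,
# like the "refids_lookup" entry in dudes_db.npz. The keys below are
# made-up placeholders.
np.savez("demo_db.npz", refids_lookup={">ACC1": 0, ">ACC2": 1})

# NumPy >= 1.16.3 defaults allow_pickle=False and raises a ValueError
# when loading object arrays -- which is why the profile rule fails.
npz = np.load("demo_db.npz", allow_pickle=True)
lookup = npz["refids_lookup"].item()  # unwrap the 0-d object array back to a dict
print("\n".join(lookup.keys()))
```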

Cheers
Vitor