DerrickWood/kraken

build_db: error opening taxonomy//nodes.dmp: No such file or directory 2020

Hi all, I know this issue was previously posted and closed. However, that solution is already in place in my setup, so I believe this is a different issue.

$kraken2-build --build --threads 24 --db bacteria_db

Creating sequence ID to taxonomy ID map (step 1)...
Sequence ID to taxonomy ID map already present, skipping map creation.
Estimating required capacity (step 2)...
Estimated hash table requirement: 535611976 bytes
Capacity estimation complete. [10.243s]
Building database files (step 3)...
build_db: error opening taxonomy//nodes.dmp: No such file or directory
xargs: cat: terminated by signal 13

However, when I look inside the taxonomy folder, I find the files that are supposed to be required:

$cd bacteria_db
$ls -l
drwxr-xr-x 3 root root 4096 Feb 8 13:01 library
-rw-r--r-- 1 root root 7701 Feb 8 13:07 seqid2taxid.map
drwxr-xr-x 2 root root 4096 Feb 8 14:06 taxonomy

$cd taxonomy
$ls -l
total 3710044
-rw-r--r-- 1 root root 0 Feb 8 14:03 accmap.dlflag
-rw-r--r-- 1 root root 11500420 Feb 8 12:50 nucl_gb.accession2taxid.gz
-rw-r--r-- 1 root root 3733127973 Feb 8 12:59 nucl_wgs.accession2taxid.gz
-rw-r--r-- 1 root root 8883 Feb 8 13:07 prelim_map.txt
-rw-r--r-- 1 root root 0 Feb 8 12:59 taxdump.dlflag
-rw-r--r-- 1 root root 54435754 Feb 8 12:59 taxdump.tar.gz

Is it something to do with there being two taxdump files? According to this [https://github.com//issues/55], the solution suggested by Jen Lu is to

"wget ftp.ncbi.nlm.nih.gov/pub/taxonomy/accession2taxid/nucl_gb.accession2taxid.gz
wget ftp.ncbi.nlm.nih.gov/pub/taxonomy/accession2taxid/nucl_wgs.accession2taxid.gz
touch accmap.dlflag

and put all of those files into a taxonomy/ folder."

However, as you can see, I have those files in the correct location. Any help would be greatly appreciated.

In case anyone comes across this and needs help: the problem here was that nucl_gb.accession2taxid.gz and nucl_wgs.accession2taxid.gz did not expand properly. The taxonomy/ folder should look like this:

-rw-r--r-- 1 root root 0 Feb 8 14:59 accmap.dlflag
-rw-r--r-- 1 9019 583 18744141 Feb 8 14:28 citations.dmp
-rw-r--r-- 1 9019 583 4184929 Feb 8 14:26 delnodes.dmp
-rw-r--r-- 1 9019 583 452 Feb 8 14:20 division.dmp
-rw-r--r-- 1 9019 583 16444 Feb 8 14:28 gc.prt
-rw-r--r-- 1 9019 583 4921 Feb 8 14:20 gencode.dmp
-rw-r--r-- 1 9019 583 1137217 Feb 8 14:26 merged.dmp
-rw-r--r-- 1 9019 583 204014115 Feb 8 14:28 names.dmp
-rw-r--r-- 1 9019 583 158489092 Feb 8 14:27 nodes.dmp
-rw-r--r-- 1 root root 10328418345 Feb 8 14:53 nucl_gb.accession2taxid
-rw-r--r-- 1 root root 23100102487 Feb 8 14:59 nucl_wgs.accession2taxid
-rw-rw---- 1 4544 583 2666 Sep 11 2019 readme.txt
-rw-r--r-- 1 root root 0 Feb 8 14:59 taxdump.dlflag
-rw-r--r-- 1 root root 54437067 Feb 8 14:59 taxdump.tar.gz
-rw-r--r-- 1 root root 0 Feb 8 15:03 taxdump.untarflag
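A quick way to compare a taxonomy folder against the listing above is a small shell check. This is only a sketch: check_taxonomy_dir is a hypothetical helper (not part of kraken2), and the file list is an assumption taken from the listing above, so adjust it to the accession maps your library actually needs.

```shell
# Hypothetical helper (not part of kraken2): report which of the expected,
# already-expanded taxonomy files are missing or empty in a given directory.
# The file list is an assumption based on the listing above.
check_taxonomy_dir() {
    dir="$1"
    missing=0
    for f in nodes.dmp names.dmp nucl_gb.accession2taxid nucl_wgs.accession2taxid; do
        # -s is true only if the file exists and has a size greater than zero
        if [ ! -s "$dir/$f" ]; then
            echo "missing or empty: $dir/$f"
            missing=1
        fi
    done
    return "$missing"
}
```

For example, running check_taxonomy_dir bacteria_db/taxonomy before the --build step would print one line per missing file and return a non-zero status if any are absent.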

My solution was to delete everything and start again. This time, when I ran
$kraken2-build --download-taxonomy --db $DBNAME
the output did not contain the unzip error that I had only glanced over the first time.
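To avoid glancing over such an error again, the download output can be saved to a file and searched afterwards. The sketch below assumes the output was captured to a log; scan_log and download.log are hypothetical names, and the search patterns are guesses at what a failed decompression might print, not messages confirmed from kraken2.

```shell
# Hypothetical helper: print error-like lines from a saved log file.
# The patterns are assumptions about typical gzip/tar failure messages.
scan_log() {
    grep -i -E 'error|unexpected end of file|not in gzip format' "$1"
}

# Usage sketch (commented out; requires kraken2-build on PATH):
# kraken2-build --download-taxonomy --db "$DBNAME" 2>&1 | tee download.log
# scan_log download.log && echo "check the lines above before running --build"
```

scan_log returns success only when at least one suspicious line is found, so it can gate the subsequent --build step.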

Hello,
I am trying to create a plant database; however, when I run obiconvert, I get this error:

obiconvert --embl -t ./TAXO --ecopcrdb-output=embl_last ./EMBL/*.dat
[Errno 2] No such file or directory: './TAXO/nodes.dmp'
Exception ignored in: <function EcoPCRDBSequenceWriter.__del__ at 0x7f2e5b5b4700>
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/obitools/ecopcr/sequence.py", line 180, in __del__
    self.close()
  File "/usr/lib/python3/dist-packages/obitools/ecopcr/sequence.py", line 149, in close
    self._file.seek(0,0)
AttributeError: 'EcoPCRDBSequenceWriter' object has no attribute '_file'

I checked, and nodes.dmp is present in the TAXO directory:

drwxr-xr-x 1 root root 4096 Mar 17 10:23 ./
drwxr-xr-x 1 501 staff 4096 Mar 17 09:45 ../
-rw-r--r-- 1 9019 583 19071824 Dec 16 14:28 citations.dmp
-rw-r--r-- 1 9019 583 4289335 Dec 16 14:25 delnodes.dmp
-rw-r--r-- 1 9019 583 452 Dec 16 14:20 division.dmp
-rw-r--r-- 1 9019 583 16444 Dec 16 14:28 gc.prt
-rw-r--r-- 1 9019 583 4921 Dec 16 14:20 gencode.dmp
-rw-r--r-- 1 9019 583 1217034 Dec 16 14:25 merged.dmp
-rw-r--r-- 1 9019 583 213788465 Dec 16 14:28 names.dmp
-rw-r--r-- 1 9019 583 164065373 Dec 16 14:27 nodes.dmp
-rw-rw---- 1 4544 583 2666 Sep 11 2019 readme.txt
-rw-r--r-- 1 root root 56908554 Dec 16 15:29 taxdump.tar.gz
-rw-r--r-- 1 root root 88 Dec 16 15:29 taxdump.tar.gz:Zone.Identifier

I would greatly appreciate your help,

Regards,

Andrea