Not able to build database, "taxonomy id doesn't exist" warnings, latest version from git
koppk opened this issue · 1 comments
Hi,
I saw other people also run into this problem. I cannot build a database and the problem seems to be the domain bacteria.
I get thousands of warnings like these:
Warning: taxonomy id doesn't exists for NZ_OY986432.1!
Warning: taxonomy id doesn't exists for NZ_OY986431.1!
Warning: taxonomy id doesn't exists for NZ_OY986433.1!
Warning: Taxonomy ID 1516075 is not in the provided taxonomy tree (taxonomy/nodes.dmp)!
Warning: Taxonomy ID 3111325 is not in the provided taxonomy tree (taxonomy/nodes.dmp)!
Warning: Taxonomy ID 3111776 is not in the provided taxonomy tree (taxonomy/nodes.dmp)!
This is with the latest version, built from source:
git clone https://github.com/DaehwanKimLab/centrifuge
cd centrifuge
make
sudo make install prefix=/usr/local
For the taxonomy, I tried not only
centrifuge-download -o taxonomy taxonomy
but also downloading and extracting the latest (Jan 2024) taxdump.tar.gz directly from NCBI.
It made no difference. File 2cf is always empty (size 0), as others have observed before!
This is what I run on a 32-core, 256 GB RAM Scaleway instance:
nohup centrifuge-build -p 16 --conversion-table seqid2taxid.map --taxonomy-tree taxonomy/nodes.dmp --name-table taxonomy/names.dmp input-sequences.fna abfpv &
Beforehand, I had built the archaea, bacteria, viral, fungi, and protozoa seqid2taxid.map files and concatenated them into one.
This is very frustrating! I had used centrifuge successfully before but wanted to apply the latest version for some urgent research ...
The warnings should be fine. I think the issue is still memory. The current bacteria database is quite large, so 256 GB of memory is likely not enough (though fairly close). You can try the "-a" option together with a smaller value for "--bmax" and a larger value for "--dcv".
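As a rough sketch of what that could look like with the command from the original post: the BMAX and DCV values below are illustrative assumptions, not tested recommendations for this dataset, and will likely need tuning. The script only prints the command so it can be checked before launching.

```shell
# Illustrative values only (assumptions, not tuned for this dataset).
BMAX=1342177280   # smaller suffix-array bucket size, which lowers peak memory
DCV=4096          # larger difference-cover period, which lowers memory but may slow sorting

# -a turns off the automatic memory fitting so the explicit --bmax/--dcv apply.
CMD="centrifuge-build -p 16 -a --bmax $BMAX --dcv $DCV --conversion-table seqid2taxid.map --taxonomy-tree taxonomy/nodes.dmp --name-table taxonomy/names.dmp input-sequences.fna abfpv"

# Print the command for inspection first.
echo "$CMD"

# Uncomment to launch the build in the background:
# nohup $CMD &
```

If the build still runs out of memory, lowering BMAX further is the next thing to try, at the cost of a longer build.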