kraken2-build error when creating sequence ID to taxonomy ID map
Hi,
I'm near the end of the Struo2 pipeline trying to create a custom kraken2 database using gtdb r207.
I've hit a wall though at the kraken2-build command, specifically one spot within the build_kraken2_db.sh script that the command calls. It seems that this section:
echo "Creating sequence ID to taxonomy ID map (step 1)..."
if [ -d "library/added" ]; then
  find library/added/ -name 'prelim_map_*.txt' | xargs cat > library/added/prelim_map.txt
fi
seqid2taxid_map_file=seqid2taxid.map
if [ -e "$seqid2taxid_map_file" ]; then
  echo "Sequence ID to taxonomy ID map already present, skipping map creation."
else
  step_time=$(get_current_time)
  find library/ -maxdepth 2 -name prelim_map.txt | xargs cat > taxonomy/prelim_map.txt
  if [ ! -s "taxonomy/prelim_map.txt" ]; then
    echo "No preliminary seqid/taxid mapping files found, aborting."
    exit 1
  fi
  grep "^TAXID" taxonomy/prelim_map.txt | cut -f 2- > $seqid2taxid_map_file.tmp || true
  if grep "^ACCNUM" taxonomy/prelim_map.txt | cut -f 2- > accmap_file.tmp; then
    if compgen -G "taxonomy/*.accession2taxid" > /dev/null; then
      lookup_accession_numbers accmap_file.tmp taxonomy/*.accession2taxid > seqid2taxid_acc.tmp
      cat seqid2taxid_acc.tmp >> $seqid2taxid_map_file.tmp
      rm seqid2taxid_acc.tmp
    else
      echo "Accession to taxid map files are required to build this DB."
      echo "Run 'kraken2-build --db $KRAKEN2_DB_NAME --download-taxonomy' again?"
      exit 1
    fi
  fi
  rm -f accmap_file.tmp
  finalize_file $seqid2taxid_map_file
  echo "Sequence ID to taxonomy ID map complete. [$(report_time_elapsed $step_time)]"
fi
Produces the error messages:
Accession to taxid map files are required to build this DB.
Run 'kraken2-build --db $KRAKEN2_DB_NAME --download-taxonomy again?
When I run through this section line by line myself, everything is fine until lookup_accession_numbers accmap_file.tmp taxonomy/*.accession2taxid > seqid2taxid_acc.tmp
at which point I get the error: Found 0/1363031 targets...lookup_accession_numbers: unable to open taxonomy/*.accession2taxid: No such file or directory
My ./taxonomy/ directory only contains the following:
-rw-r--r--+ 1 names.dmp
-rw-r--r--+ 1 nodes.dmp
drwxr-sr-x+ 2 .
-rw-r--r--+ 1 prelim_map.txt
drwxr-sr-x+ 5 ..
Should there be accession2taxid files in here? If so, when should they have been generated?
Happy to post on the kraken2 GitHub if that is more appropriate, but I figured this may be something that should have been generated elsewhere in the Struo2 pipeline.
Any help much appreciated, thanks!
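Going by the script excerpt above, only lines tagged ACCNUM in taxonomy/prelim_map.txt are sent to lookup_accession_numbers, so counting the tags shows whether the accession2taxid branch is entered at all. A minimal sketch, assuming the tag is the first tab-separated column (as the script's cut -f 2- suggests); count_prelim_tags is a hypothetical helper name, not part of kraken2:

```shell
# Hypothetical helper (not part of kraken2): count the lines in a
# preliminary map file by the tag in the first tab-separated column.
count_prelim_tags() {
    map_file="$1"
    # Prints one "TAG=count" line per distinct tag, e.g. TAXID or ACCNUM.
    cut -f 1 "$map_file" | sort | uniq -c | awk '{print $2 "=" $1}'
}

# Run from the database directory; skipped if the file is missing or empty.
if [ -s taxonomy/prelim_map.txt ]; then
    count_prelim_tags taxonomy/prelim_map.txt
fi
```

If every line is tagged TAXID, the lookup step should have nothing to resolve; any ACCNUM lines are the ones the script tries to match against taxonomy/*.accession2taxid.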
Hmm... an accession2taxid file shouldn't be needed, unless that recently changed. Can you please try just creating an empty accession2taxid file in the appropriate directory?
Thanks for the quick suggestion, no luck unfortunately.
Creating a blank file named .accession2taxid or accession2taxid gives the same error, and giving the file a name like 1.accession2taxid, test.accession2taxid, blank.accession2taxid, etc. just produces lookup_accession_numbers: unable to mmap taxonomy/1.accession2taxid: Invalid argument
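One possible reading of the "unable to mmap ... Invalid argument" message is that the file is empty: on Linux, mapping a zero-length file fails with EINVAL. That this is what lookup_accession_numbers hits here is an assumption and is not confirmed in this thread. A small sketch for spotting zero-byte map files; list_empty_maps is a hypothetical helper name:

```shell
# Hypothetical helper: list zero-byte *.accession2taxid files in a directory.
list_empty_maps() {
    tax_dir="$1"
    # -size 0 matches only files that contain no data.
    find "$tax_dir" -maxdepth 1 -type f -name '*.accession2taxid' -size 0
}

# Example usage from the database directory:
#   list_empty_maps taxonomy
```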
I’m on vacation this week, but I’ll have a look at the problem ASAP
No worries, thanks! Enjoy your vacation.
@joshsimcock I haven't been able to reproduce this issue. Can you provide more info, such as:
- the version of snakemake that you are using
- the versions of kraken2 & bracken in the conda env that is used by snakemake (in the .snakemake/conda/ directory)
FYI: I'm working on creating Kraken2 & Bracken databases for Release 207 (followed later by the humann3 databases). They should be complete by the end of the week.
@nick-youngblut sorry for the long delay in replying.
snakemake = 7.6.2
kraken2 = 2.1.2
bracken = 2.5
Thanks for uploading the 207 release! Saves me a lot of time. If you can figure out what happened here great, but there is no rush as I can use your r207 builds for now thanks!
I encountered the same problem and, after much time and effort, resolved it by running "chmod +x n*.dmp". I suspect the problem is that names.dmp and nodes.dmp cannot be read, since the files in your ./taxonomy/ directory also have the permissions "-rw-r--r--".
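Whether permissions are really the cause is not confirmed in this thread. One way to test the idea before changing anything is to check whether the current user can read the two files. A minimal sketch; check_readable is a hypothetical helper name:

```shell
# Hypothetical helper: report whether each given file is readable by the
# current user. Returns non-zero if any file is missing or unreadable.
check_readable() {
    rc=0
    for f in "$@"; do
        if [ -r "$f" ]; then
            echo "readable: $f"
        else
            echo "NOT readable: $f"
            rc=1
        fi
    done
    return "$rc"
}

# Example usage from the database directory:
#   check_readable taxonomy/names.dmp taxonomy/nodes.dmp
```

Note that "-rw-r--r--" already grants read access to everyone, and the trailing "+" in the listing means an ACL is also set on the file, so the effective access may differ from what the mode bits show.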