leylabmpi/Struo2

kraken2 database update function


Hi there! I'm attempting to update the existing custom GTDB kraken2 database, downloaded as directed in the README, but I'm running into an odd issue. The pipeline runs "successfully", but rather than updating the downloaded database, Struo2 generates a fresh database containing only the genomes that I'm trying to add. I'm using a config file adapted from the original config-update.yaml file, with minimal changes (just the samples file, output directory, database files, etc.).

It's possible that the cause is missing download files. I was running into storage issues last week and deleted some large files. I didn't think at the time that I had deleted anything important, but I could have been mistaken. The database update still works properly for the toy dataset, and I'm able to update the toy database with my actual, larger samples data, so I think the issue is with the downloaded custom GTDB database.

Any help would be appreciated! I've attached the config file I'm using in .txt format.
config_test.txt

To be clear, the new database produced by the Struo2 db-update pipeline contains only the new genomes, and not the original genomes as well?

If I had to guess, the original genomes are no longer in your custom_dbs/GTDB_release202/library.
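
One quick way to test that guess is to check whether the library directory is still present and non-empty. The snippet below is only a minimal sketch: `summarize_library` is a hypothetical helper (not part of Struo2), and it assumes the database lives at `custom_dbs/GTDB_release202` relative to the current working directory, so adjust the path to match your config.

```python
from pathlib import Path


def summarize_library(db_dir):
    """Return (exists, n_files, total_bytes) for the 'library' directory under db_dir.

    Hypothetical helper for illustration; it only inspects the filesystem
    and makes no assumptions about the file names inside the library.
    """
    library = Path(db_dir) / "library"
    if not library.is_dir():
        return (False, 0, 0)
    # Walk the directory recursively and keep regular files only.
    files = [p for p in library.rglob("*") if p.is_file()]
    total_bytes = sum(p.stat().st_size for p in files)
    return (True, len(files), total_bytes)


if __name__ == "__main__":
    # Assumed location; change this to the database directory in your config.
    exists, n_files, total_bytes = summarize_library("custom_dbs/GTDB_release202")
    if not exists:
        print("library directory is missing")
    else:
        print(f"library directory found: {n_files} files, {total_bytes} bytes")
```

If the directory is missing or contains no files, the update pipeline would have no original genomes to merge with, which would be consistent with the output you describe.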

Another possibility is that Struo2 ran the db-create pipeline instead of the db-update pipeline. What message was printed when you started the snakemake job? There should be one stating whether Struo2 was going to run the db-create or the db-update pipeline.

That's correct, only the new genomes. I guess I must have deleted custom_dbs/GTDB_release202/library by accident. It was definitely the db-update pipeline; that much I know. I'll see if I can recover the library directory, but if not, I can always delete the pre-made database and re-download it.

EDIT_1: I'm trying to access http://ftp.tue.mpg.de/ebio/projects/struo2/GTDB_release202/kraken2/library, but I think the webpage might be down. I should be able to get things working once it's back up, though.

EDIT_2: The site is now up, and I think I've downloaded everything from the GTDB_release202 directory. Is the library directory under a different name? Thanks in advance!