leylabmpi/Struo2

Link issues and md5 checksum failures

Closed this issue · 9 comments

The md5sum checks should now work in both cases. Note: I renamed database_kraken.md5 to database.kraken.md5

Thanks so much for updating and helping me troubleshoot. I can confirm the taxdump download works and its checksum check passed. I deleted the kraken2 database files I had downloaded yesterday (cry) and am starting over again; hopefully by tomorrow I can report whether the md5sum check works there too (it did not when I simply replaced the md5 file a few minutes ago).

And what about the missing k2d.md5 file that is listed here (but not there): http://ftp.tue.mpg.de/ebio/projects/struo2/GTDB_release202/kraken2/k2d.md5

Edit: that link is referenced on this page under "Kraken2 database" https://github.com/leylabmpi/Struo2/wiki/GTDB-database-download-and-usage

I deleted the kraken2 database files I had downloaded yesterday (cry)

Why did you delete them? I just had to upload the md5 file.

And what about the missing k2d.md5 file that is listed here (but not there)

I'll try to clear up the docs

I deleted the kraken2 database files I had downloaded yesterday (cry)

Why did you delete them? I just had to upload the md5 file.

Like I said, I tried with the new md5 file and it did not work.

Hmm, that's odd. Maybe the database didn't download fully.
I just checked, and md5sum --check database.kraken.md5 worked fine locally. I can't run it on the ftp server to check directly.
Let me know if you still have a failed md5sum check. I might need to re-upload the database.kraken file.
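For anyone hitting the same failure, here is a minimal, offline sketch of why an incomplete download makes `md5sum --check` fail. It does not touch the real database: the file contents are placeholders created in a temporary directory, and the variable names are made up for illustration.

```shell
#!/usr/bin/env bash
# Self-contained demo: no network access, all file contents are placeholders.
set -u

workdir="$(mktemp -d)"
cd "$workdir" || exit 1

# Stand-in for a fully downloaded database file.
printf 'example database payload\n' > database.kraken

# Stand-in for the published checksum file.
md5sum database.kraken > database.kraken.md5

# A complete file passes the check (--status suppresses output and
# reports the result through the exit code only).
if md5sum --check --status database.kraken.md5; then
    complete_status="OK"
else
    complete_status="FAILED"
fi

# Simulate an interrupted download by keeping only the first 10 bytes.
head -c 10 database.kraken > database.kraken.part
mv database.kraken.part database.kraken

# The truncated file no longer matches the recorded checksum.
if md5sum --check --status database.kraken.md5; then
    truncated_status="OK"
else
    truncated_status="FAILED"
fi

echo "complete file:  $complete_status"
echo "truncated file: $truncated_status"
```

If a real download turns out to be truncated, re-running wget with `--continue` resumes the partial file rather than starting over, provided the server supports resuming.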

I will update tomorrow for sure. Thanks again.

md5sum check worked now with the database.kraken newly downloaded yesterday, and the new md5 file downloaded this morning.

And what about the missing k2d.md5 file that is listed here (but not there)

I'll try to clear up the docs

Any chance to clarify the Kraken2 database download links and files? The instructions still list this set of links, but the k2d.md5 file has never existed:
wget --directory-prefix $DBDIR http://ftp.tue.mpg.de/ebio/projects/struo2/GTDB_release202/kraken2/hash.k2d
wget --directory-prefix $DBDIR http://ftp.tue.mpg.de/ebio/projects/struo2/GTDB_release202/kraken2/opts.k2d
wget --directory-prefix $DBDIR http://ftp.tue.mpg.de/ebio/projects/struo2/GTDB_release202/kraken2/taxo.k2d
wget --directory-prefix $DBDIR http://ftp.tue.mpg.de/ebio/projects/struo2/GTDB_release202/kraken2/k2d.md5 # MISSING
md5sum --check $DBDIR/k2d.md5

I'm also wondering, what is the difference between downloading those files listed above that are in the kraken2 directory on the ftp site (the pre-built kraken2 from the GTDB release202), and downloading genomes using the instructions provided on the main README page about downloading a custom db from the GTDB using the following:

Filtering GTDB metadata to certain genomes

./GTDB_metadata_filter.R -o gtdb-r95_bac-arc.tsv https://data.gtdb.ecogenomic.org/releases/release95/95.0/ar122_metadata_r95.tar.gz https://data.gtdb.ecogenomic.org/releases/release95/95.0/bac120_metadata_r95.tar.gz

Downloading all genomes (& creating tab-delim table of genome info)

./genome_download.R -o genomes -p 8 gtdb-r95_bac-arc.tsv > genomes.txt

Note: the output of ./genome_download.R can be directly used for running the Struo2 pipeline (see below)

Note: genome fasta files can be compressed (gzip or bzip2) or uncompressed for input to Struo2

It seems to me that this would give the same result? Could the documentation clarify whether this is just an example of how to use the R scripts to filter and download genomes from GTDB? It's confusing because there is little description of how to use the R filter script (which parts to modify, or which parts set the filter criteria), and no information on what exactly this set of example commands and scripts downloads from GTDB.
Thanks a lot.

md5sum check worked now with the database.kraken newly downloaded yesterday, and the new md5 file downloaded this morning.

That's great to hear!

wget --directory-prefix $DBDIR http://ftp.tue.mpg.de/ebio/projects/struo2/GTDB_release202/kraken2/k2d.md5 # MISSING
md5sum --check $DBDIR/k2d.md5

It's now been added. Sorry about that.
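With the checksum file in place, the download and check could be sketched roughly as follows. The URL and file names are the ones quoted above; the `DBDIR` default and the two function names (`fetch_k2d`, `verify_k2d`) are hypothetical. The sketch also assumes that k2d.md5 lists bare file names (its contents are not shown in this thread), in which case the check has to run from inside `$DBDIR`, because md5sum looks up the listed names relative to the current working directory.

```shell
#!/usr/bin/env bash
# Sketch only: BASE_URL and the file list come from the thread; the DBDIR
# default and both function names are assumptions made for illustration.
BASE_URL="http://ftp.tue.mpg.de/ebio/projects/struo2/GTDB_release202/kraken2"
DBDIR="${DBDIR:-./kraken2_db}"

# Download the three database files and the checksum file into DBDIR.
fetch_k2d() {
    mkdir -p "$DBDIR"
    for f in hash.k2d opts.k2d taxo.k2d k2d.md5; do
        # --continue resumes a partially downloaded file instead of restarting.
        wget --continue --directory-prefix "$DBDIR" "$BASE_URL/$f" || return 1
    done
}

# Verify from inside the given directory, because md5sum resolves the file
# names listed in a checksum file relative to the current working directory.
verify_k2d() {
    ( cd "$1" && md5sum --check k2d.md5 )
}

# Typical use (requires network access and enough disk space):
# fetch_k2d && verify_k2d "$DBDIR"
```

Running the check in a subshell keeps the caller's working directory unchanged. If the checksum file turns out to contain full paths instead, this detail does not matter.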

I'm also wondering, what is the difference between downloading those files listed above that are in the kraken2 directory on the ftp site (the pre-built kraken2 from the GTDB release202), and downloading genomes using the instructions provided on the main README page about downloading a custom db from the GTDB using the following:

You only need to download genomes from the GTDB if you want to create your own custom database(s).