leylabmpi/Struo2

amino acid database for HUMAnN3 run

kk1120 opened this issue · 4 comments

Hi.
Thank you for developing this wonderful tool.

I have finished the kraken/bracken analysis using the prebuilt GTDB207-based database that you created, and
I am now working on the HUMAnN3 analysis.

When I ran HUMAnN3, I used the uniref90_201901.dmnd file that I downloaded from your FTP server as the protein database for DIAMOND. However, the latest version of the protein database in HUMAnN3 is 201901b, not 201901.
(I then received an error saying that the latest version is 201901b, not 201901. This is not a problem, as it can be worked around by renaming the file.)
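The renaming workaround mentioned above could be sketched roughly as follows. This is a minimal illustration, not part of Struo2 or HUMAnN3: the helper name `rename_with_version_tag` is hypothetical, and the target file name (the original name with `201901` swapped for `201901b`) is an assumption based only on the error described here. Renaming changes the file name only; the contents are still built from the 201901 release.

```python
from pathlib import Path


def rename_with_version_tag(src: Path, old_tag: str = "201901",
                            new_tag: str = "201901b") -> Path:
    """Rename `src` so its file name carries `new_tag` instead of `old_tag`.

    Hypothetical helper: only the name changes, not the database contents.
    """
    if new_tag in src.name:
        # Already carries the expected tag; nothing to do.
        return src
    if old_tag not in src.name:
        raise ValueError(f"{src.name!r} does not contain {old_tag!r}")
    # Replace only the first occurrence of the version tag in the file name.
    dst = src.with_name(src.name.replace(old_tag, new_tag, 1))
    if dst.exists():
        raise FileExistsError(f"{dst} already exists")
    return src.rename(dst)
```

For example, `rename_with_version_tag(Path("uniref90_201901.dmnd"))` would rename the file in place to `uniref90_201901b.dmnd`.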

So, I would like to confirm the following.
Am I correct in understanding that the uniref90_201901.dmnd file for HUMAnN3 that can be downloaded from the FTP server is based on UniRef90 (version 201901)?
If I want to use UniRef90 (version 201901b), should I use the Struo2 pipeline to generate a new database?

(I do not intend to add my samples to an existing database; I just want to use the GTDB database.)

Thanks!

> When I ran HUMAnN3, I used the uniref90_201901.dmnd file that I downloaded from your FTP server as the protein database for DIAMOND. However, the latest version of the protein database in HUMAnN3 is 201901b, not 201901.

Yes, I used the 201901 database to create the Struo2 custom databases.

> Am I correct in understanding that the uniref90_201901.dmnd file for HUMAnN3 that can be downloaded from the FTP server is based on UniRef90 (version 201901)?

Yes, I'm using the UniRef version used for the HUMAnN3 201901 database.

> If I want to use UniRef90 (version 201901b), should I use the Struo2 pipeline to generate a new database?

Yes, you would have to create a new set of custom databases with Struo2.

Thank you for your quick response.

Is it possible to generate only the databases needed for the HUMAnN3 analysis to save time? In that case, should I set kraken2 and bracken to Skip in the databases section of the config.yaml file and then run the pipeline?

Also, what input files are required for this?

Thank you for your understanding.

> Is it possible to generate only the databases needed for the HUMAnN3 analysis to save time?

You can skip the kraken2/bracken steps, but the humann3 part of the pipeline requires a lot of computation. Check out the Struo2 paper for an idea of how much is needed.
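As a rough sketch of the config change discussed above, the snippet below sets chosen entries under a top-level `databases:` section to `Skip` by editing the config text line by line. The function name `skip_databases` is hypothetical, and the example layout (keys such as `samples_file`, `humann3_diamond`, and `params`) is an assumption for illustration; check the actual config.yaml shipped with Struo2 for the real key names. Only `databases`, `kraken2`, `bracken`, and `Skip` come from this thread.

```python
import re


def skip_databases(config_text: str, keys=("kraken2", "bracken")) -> str:
    """Set the given entries under a top-level `databases:` section to Skip.

    Plain text editing, not a YAML parser: any trailing comment on an
    edited line is dropped, and nested sections are not handled.
    """
    out = []
    in_databases = False
    for line in config_text.splitlines():
        stripped = line.strip()
        is_top_level = (
            bool(stripped)
            and not stripped.startswith("#")
            and not line.startswith((" ", "\t"))
        )
        if is_top_level:
            # Track whether we are inside the `databases:` section.
            in_databases = stripped.startswith("databases:")
        elif in_databases:
            match = re.match(r"^(\s+)([\w-]+):", line)
            if match and match.group(2) in keys:
                line = f"{match.group(1)}{match.group(2)}: Skip"
        out.append(line)
    return "\n".join(out) + "\n"


# Assumed example layout; the real config.yaml may differ.
example = """\
samples_file: genomes.txt
databases:
  kraken2: Create
  bracken: Create
  humann3_diamond: Create
params:
  kraken2: ""
"""

print(skip_databases(example))
```

Entries with the same name outside the `databases:` section (such as `kraken2` under the assumed `params` section) are left untouched, so only the build switches change.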

Thank you for your kind support. I'll try it.