Need gene_db example for updating HUMAnN3 with a new gene set input
Closed this issue · 3 comments
Hi, thanks for your work on Struo2!
I have a set of MAGs, and I want to use Struo2 to build a new Kraken database from the genome sequences and to update the current HUMAnN3 database with their gene set (which I obtained by running prodigal).
I successfully built the new Kraken database (thanks to your great work!). However, when I used config-update.yaml as a template to run the HUMAnN3 update pipeline, I got this error: X For user-provided gene sequences you must update the genes database!
This means I need to update the genes database at the same time.
But updating the genes database needs:
    genes_db:
      genes:
        mmseqs_db: tests/output_GTDBr95_n10/genes/genes_db.tar.gz
        amino_acid: tests/output_GTDBr95_n10/genes/genome_reps_filtered.faa.gz
        nucleotide: tests/output_GTDBr95_n10/genes/genome_reps_filtered.fna.gz
        metadata: tests/output_GTDBr95_n10/genes/genome_reps_filtered.txt.gz
      cluster:
        mmseqs_db: tests/output_GTDBr95_n10/genes/cluster/clusters_db.tar.gz
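
Before launching the update, it can help to confirm that every file named in a genes_db block like the one above actually exists. The following is a minimal sketch and not part of Struo2. The dictionary simply mirrors the keys and test paths quoted in this issue, and the helper name `missing_paths` is hypothetical.

```python
from pathlib import Path

# Mirror of the genes_db block quoted in this issue. The paths are the
# template's test-data examples, not real user data.
GENES_DB = {
    "genes": {
        "mmseqs_db": "tests/output_GTDBr95_n10/genes/genes_db.tar.gz",
        "amino_acid": "tests/output_GTDBr95_n10/genes/genome_reps_filtered.faa.gz",
        "nucleotide": "tests/output_GTDBr95_n10/genes/genome_reps_filtered.fna.gz",
        "metadata": "tests/output_GTDBr95_n10/genes/genome_reps_filtered.txt.gz",
    },
    "cluster": {
        "mmseqs_db": "tests/output_GTDBr95_n10/genes/cluster/clusters_db.tar.gz",
    },
}


def missing_paths(genes_db, root="."):
    """Return 'section.key' labels whose file is unset or absent under root.

    Hypothetical helper for illustration; Struo2 performs its own checks.
    """
    missing = []
    for section, entries in genes_db.items():
        for key, rel_path in entries.items():
            # An empty value counts as missing, as does a path that is not a file.
            if not rel_path or not (Path(root) / rel_path).is_file():
                missing.append(f"{section}.{key}")
    return missing


if __name__ == "__main__":
    for label in missing_paths(GENES_DB):
        print(f"missing: genes_db.{label}")
```

Run from the Struo2 repository root, the script prints one line per entry that is unset or does not point to an existing file, and prints nothing when all five files are present.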
I have the following questions and hope you can help:
- I use the HUMAnN3 uniref90 database. Is there any way to get the amino_acid, nucleotide, and metadata files for this database? If not, could you provide examples of those data?
- Since mmseqs takes a long time to re-index the database, I would prefer to use diamond for the HUMAnN3 update. Can I skip mmseqs_db in the gene updating step?
- Overall, my goal is to update the HUMAnN3 database with my own data. I would be glad to hear any recommended approach or other advice.
Regards,
Yiqi
I’m on vacation this week, but I’ll have a look at the problem ASAP
- You'd have to ask the HUMAnN3 developers
- Only mmseqs can iteratively update a gene cluster database, which is why it is used instead of diamond
- See https://github.com/leylabmpi/Struo2/wiki/Database-updating-tutorial:-adding-genes
Thanks for your reply!
I think merging my gene set into your pre-processed GTDB database may be a better choice.
However, I noticed that a considerable proportion of the genes in my dataset have no taxonomic annotation, so the dataset does not seem suitable for HUMAnN3. I will try other methods to handle my gene sets. Thanks for your help.