leylabmpi/Struo2

Need genes_db example for updating HUMAnN3 with a new gene set as input


Hi, thanks for your work on Struo2!
I have a set of MAGs, and I want to use Struo2 to build a new Kraken database from the genome sequences and to update the current HUMAnN3 database with a gene set (which I obtained by running prodigal).

I successfully built the new Kraken database (thanks to your great work!). However, when I used config-update.yaml as a template to run the HUMAnN3 update pipeline, I got the error "X For user-provided gene sequences you must update the genes database!", which means I need to update the genes database at the same time.

But updating the genes database needs:

genes_db:
  genes:
    mmseqs_db:  tests/output_GTDBr95_n10/genes/genes_db.tar.gz
    amino_acid: tests/output_GTDBr95_n10/genes/genome_reps_filtered.faa.gz
    nucleotide: tests/output_GTDBr95_n10/genes/genome_reps_filtered.fna.gz
    metadata:   tests/output_GTDBr95_n10/genes/genome_reps_filtered.txt.gz
  cluster:
    mmseqs_db:  tests/output_GTDBr95_n10/genes/cluster/clusters_db.tar.gz    
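
Before launching an update run, it can help to confirm that every file referenced in a block like the one above actually exists. The following is a minimal sketch and not part of Struo2: it assumes the YAML has already been parsed into a nested dict (for example with PyYAML's `yaml.safe_load`), and the helper name `missing_paths` is hypothetical.

```python
import os


def missing_paths(node, prefix=""):
    """Return (key, path) pairs whose path does not exist on disk.

    `node` is a nested dict such as the `genes_db` section of the config,
    already parsed from YAML. Empty or non-string values are skipped.
    """
    missing = []
    for key, value in node.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into sub-sections such as `genes` and `cluster`.
            missing.extend(missing_paths(value, name))
        elif isinstance(value, str) and value.strip():
            path = value.strip()
            if not os.path.exists(path):
                missing.append((name, path))
    return missing


if __name__ == "__main__":
    # Same entries as the config excerpt above, written as a dict.
    genes_db = {
        "genes": {
            "mmseqs_db": "tests/output_GTDBr95_n10/genes/genes_db.tar.gz",
            "amino_acid": "tests/output_GTDBr95_n10/genes/genome_reps_filtered.faa.gz",
            "nucleotide": "tests/output_GTDBr95_n10/genes/genome_reps_filtered.fna.gz",
            "metadata": "tests/output_GTDBr95_n10/genes/genome_reps_filtered.txt.gz",
        },
        "cluster": {
            "mmseqs_db": "tests/output_GTDBr95_n10/genes/cluster/clusters_db.tar.gz",
        },
    }
    for name, path in missing_paths(genes_db):
        print(f"missing: {name} -> {path}")
```

An empty result only means the files are present; it says nothing about whether their contents are in the format the pipeline expects.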

I have the following questions and hope you can help:

  1. I use the HUMAnN3 UniRef90 database. Is there any way to get the amino_acid, nucleotide, and metadata files for this database? If not, could you provide examples of those files?
  2. Since mmseqs takes a long time to re-index the database, I would prefer to use diamond for the HUMAnN3 update. Can I skip mmseqs_db in the gene updating step?
  3. Overall, my goal is to update the HUMAnN3 database with my own data. I would be glad to hear of a recommended way to do this, or any other advice.

Regards,
Yiqi

I’m on vacation this week, but I’ll have a look at the problem ASAP

  1. You'd have to ask the HUMAnN3 developers
  2. Only mmseqs can iteratively update a gene cluster database, which is why it is used instead of diamond
  3. See https://github.com/leylabmpi/Struo2/wiki/Database-updating-tutorial:-adding-genes
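
Before adding user-provided genes, it may be worth checking that the amino acid and nucleotide files describe the same set of genes. The following is a minimal sketch under stated assumptions: both files are FASTA (optionally gzipped), and the sequence ID is the first whitespace-delimited token of each header, as in prodigal's `-a` and `-d` outputs. The function names are hypothetical, and the exact input format Struo2 expects should be taken from the tutorial rather than from this example.

```python
import gzip


def fasta_ids(path):
    """Collect sequence IDs (first token of each '>' header) from a FASTA file."""
    opener = gzip.open if str(path).endswith(".gz") else open
    ids = set()
    with opener(path, "rt") as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                if fields:
                    ids.add(fields[0])
    return ids


def compare_gene_sets(faa_path, fna_path):
    """Return (IDs only in the protein file, IDs only in the nucleotide file)."""
    aa_ids = fasta_ids(faa_path)
    nt_ids = fasta_ids(fna_path)
    return sorted(aa_ids - nt_ids), sorted(nt_ids - aa_ids)
```

Two empty lists indicate that every gene has both a protein and a nucleotide record; any ID returned points to a record missing from one of the two files.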

Thanks for your reply!
I think merging my gene set into your pre-processed GTDB database may be a better choice.
However, I noticed that a considerable proportion of the genes in my dataset have no taxonomic annotation, so HUMAnN3 does not seem suitable for this data. I will therefore try other methods to handle my gene sets. Thanks for your help.
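
To put a number on how many genes lack a taxonomic annotation, a rough sketch along these lines may help. It assumes a tab-separated metadata table with a header row. The column name `taxonomy` and the placeholder values treated as unannotated are assumptions for illustration, not the actual schema used by Struo2 or HUMAnN3.

```python
import csv

# Values treated as "no annotation" (assumption; adjust to match the real table).
UNANNOTATED = {"", "na", "nan", "none", "unclassified", "unknown"}


def unannotated_fraction(handle, column="taxonomy"):
    """Return (n_unannotated, n_total, fraction) for a tab-separated table.

    `handle` is an open text file or any iterable of lines with a header row.
    Rows where `column` is absent or holds a placeholder count as unannotated.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    total = 0
    missing = 0
    for row in reader:
        total += 1
        value = (row.get(column) or "").strip().lower()
        if value in UNANNOTATED:
            missing += 1
    fraction = missing / total if total else 0.0
    return missing, total, fraction
```

For a file on disk, the function could be called as `unannotated_fraction(open("genes_metadata.tsv", newline=""))`, where the file name is again hypothetical.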