leylabmpi/Struo2

Pangenomes with Struo2

sarpiens opened this issue · 2 comments

Hello! Out of curiosity: in the description of the pipeline you say that "Struo2 is designed to process one genome representative of each species, which is all that is required for generating the Kraken2/Bracken and HUMAnN databases." I was wondering, if pangenomes such as those generated by Panaroo (genes from all the genomes of the species) were provided instead of reference genomes:

  1. Would this be a problem for the taxonomic databases (kraken+bracken), or for the way Struo2 works?

  2. I imagine that for the functional database it would be ideal to provide the pangenome, because in the end it is the genes that are of interest. But I was wondering in general, how much of an improvement would it be to provide the pangenomes, or would the reference genomes be enough in most cases?

Thanks in advance.

Including many closely related genomes results in a lot of reads mapping to multiple genomes, which lowers the classification rate. So I don't recommend using a pangenome.
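One way to picture the multiple-mapping effect is with a toy k-mer count. The sketch below is only an illustration under made-up assumptions: the sequences, the value of `K`, and the helper names `kmers` and `unique_kmer_fraction` are all hypothetical, and this is not how Kraken2, Bracken, HUMAnN, or Struo2 actually build or query their databases. It just shows that once a near-identical genome is added, most k-mers no longer point to a single genome.

```python
from collections import defaultdict


def kmers(seq, k):
    """Return the set of all k-length substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def unique_kmer_fraction(genomes, k):
    """Fraction of distinct k-mers that occur in exactly one genome."""
    owners = defaultdict(set)
    for name, seq in genomes.items():
        for km in kmers(seq, k):
            owners[km].add(name)
    unique = sum(1 for names in owners.values() if len(names) == 1)
    return unique / len(owners)


# Hypothetical toy sequences, not real genomes.
genome_a = "ACGTTGCATGCCGATAGCTAGGCTA"
genome_b = "TTGACCGGTAACGTCAGTTCAGGAT"   # unrelated to genome_a
genome_a2 = "ACGTTGCATGCCGATAGCTAGGCTT"  # genome_a with the last base changed

one_per_species = {"A": genome_a, "B": genome_b}
with_close_relative = {"A": genome_a, "A2": genome_a2, "B": genome_b}

K = 5  # arbitrary toy value; real tools use much longer k-mers
frac_one = unique_kmer_fraction(one_per_species, K)
frac_close = unique_kmer_fraction(with_close_relative, K)
print(f"one representative per species: {frac_one:.2f}")
print(f"with a near-identical genome:   {frac_close:.2f}")
```

In this toy setup the fraction of genome-specific k-mers drops as soon as the near-identical sequence is added, which is the kind of ambiguity that multiple mappings introduce.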

Okay, thanks for your quick response.