gtonkinhill/panaroo

Update the gene name in results of panaroo by bakta

Closed this issue · 1 comments

Hello, thank you for your work. Panaroo has been very helpful to me! I've encountered some issues while using Panaroo and I'm hoping to get your help.

I always use the GFF files generated by Prokka as input for Panaroo, because Prokka annotates genomes faster than Bakta, possibly because it uses a smaller database. At that stage, I don't necessarily need more genes to be annotated with names. However, when I use other functionality such as Panaroo-gene-neighbourhood or the 'final_graph.gml' file, having too few named genes is a problem. I end up annotating gene names manually, one by one, using Bakta_proteins or EggNOG-mapper, which is very time-consuming.

I'd like to ask: would it be feasible to update the gene names in Panaroo results using Bakta? (I think this depends on whether both tools predict CDSs consistently.) If so, could the pan-genome become a way to speed up genome annotation? For example, I have hundreds or thousands of genomes of the same species that need to be annotated, and annotating them one by one with Bakta would be very slow. For the CDS part of the annotation, the following approach might be faster: use the pan-genome to collect all genes found in these genomes, annotate those genes once with Bakta_proteins, and then update the original GFF files with the annotation results.
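To make the last step more concrete, here is a rough sketch of the kind of renaming I have in mind. It is only an illustration under several assumptions: the table is a CSV whose first column is "Gene", unnamed clusters carry a "group_" prefix, and the cluster-to-name mapping has already been built from a separate annotation run. The function name `rename_clusters` and the example data are hypothetical and not part of Panaroo or Bakta.

```python
import csv
import io


def rename_clusters(table_text, name_map):
    """Return CSV text with placeholder cluster names replaced.

    table_text: CSV text with a "Gene" column (assumed layout).
    name_map:   dict of cluster ID -> gene name, built elsewhere
                from a separate annotation run (hypothetical input).
    """
    reader = csv.DictReader(io.StringIO(table_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        new_name = name_map.get(row["Gene"])
        # Only overwrite unnamed clusters (assumed "group_" prefix);
        # clusters that already have a gene name are left untouched.
        if new_name and row["Gene"].startswith("group_"):
            row["Gene"] = new_name
        writer.writerow(row)
    return out.getvalue()


# Made-up example data, for illustration only.
example = (
    "Gene,Annotation,sample1\n"
    "group_12,hypothetical protein,s1_00001\n"
    "dnaA,chromosomal replication initiator,s1_00002\n"
)
updated = rename_clusters(example, {"group_12": "yaaA", "dnaA": "ignored"})
print(updated)
```

The same mapping could in principle be applied to node names in the graph or to the per-genome GFF files, but that would depend on how the CDS identifiers line up between the tools.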

Hi,

We don't currently have a tool for this, but you might be interested in ggCaller, which annotates genes directly on a pangenome graph. It uses Panaroo internally to do the gene clustering.