Update gene names in Panaroo results using Bakta
Hello, thank you for your work. Panaroo has been very helpful to me! I've encountered some issues while using Panaroo and I'm hoping to get your help.
For me, I always use the GFF files generated by Prokka as input for Panaroo, because Prokka annotates genomes faster than Bakta, possibly because it uses a smaller database. At this stage, I don't necessarily need more gene names annotated. However, when I need to use other functionality such as Panaroo-gene-neighbourhood or the 'final_graph.gml' file, having too few genes annotated with names is problematic. I end up manually annotating gene names with Bakta_proteins or EggNOG-mapper one by one, which is a very time-consuming process for me.
I'd like to ask: would it be feasible to update gene names in Panaroo results using Bakta? (I think this depends on whether both tools predict CDSs consistently.) If so, could the pan-genome become a tool to speed up genome annotation? For example, I have hundreds or thousands of genomes of the same species that need to be annotated, and annotating them one by one with Bakta would be very slow. For the CDS part of the annotation, this approach could therefore speed things up: use the pan-genome to obtain all genes in these genomes, annotate those genes with Bakta_proteins, and finally update the original GFF files with the annotation results.
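A minimal sketch of the final step of this idea (writing cluster-level gene names back into the per-genome GFF files), under several assumptions that are not confirmed anywhere in this thread: that Panaroo's gene_presence_absence.csv has the three metadata columns named below followed by one column per genome, that multiple gene IDs in one cell are separated by ';', that those IDs match the `ID=` attribute of the CDS features in the Prokka GFF files, and that `cluster_names` is a hypothetical dictionary (cluster name to gene name) you have built yourself from the annotation results. The function names are illustrative and not part of Panaroo or Bakta.

```python
import csv
import io

# Assumed metadata columns in Panaroo's gene_presence_absence.csv; every
# other column is treated as one genome. Adjust if your file differs.
META_COLUMNS = ("Gene", "Non-unique Gene name", "Annotation")


def locus_to_cluster(csv_text):
    """Map each gene ID in the table to the name of its cluster."""
    mapping = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        cluster = row["Gene"]
        for column, cell in row.items():
            if column is None or column in META_COLUMNS or not cell:
                continue
            # Assumption: several genes from one genome are ';'-separated.
            for gene_id in cell.split(";"):
                mapping[gene_id.strip()] = cluster
    return mapping


def update_gff_gene_names(gff_lines, locus_map, cluster_names):
    """Return GFF3 lines with gene= set from the cluster-level names.

    cluster_names is a hypothetical dict {cluster name: gene name}.
    Lines that are not CDS features (comments, FASTA, other feature
    types) and CDSs without a new name are passed through unchanged.
    """
    updated = []
    for line in gff_lines:
        stripped = line.rstrip("\n")
        fields = stripped.split("\t")
        if stripped.startswith("#") or len(fields) != 9 or fields[2] != "CDS":
            updated.append(stripped)
            continue
        attrs = [a for a in fields[8].split(";") if a]
        keys = dict(a.split("=", 1) for a in attrs if "=" in a)
        new_name = cluster_names.get(locus_map.get(keys.get("ID")))
        if new_name:
            # Replace any existing gene= attribute with the new name.
            attrs = [a for a in attrs if not a.startswith("gene=")]
            attrs.append("gene=" + new_name)
            fields[8] = ";".join(attrs)
        updated.append("\t".join(fields))
    return updated
```

Genes that Panaroo re-finds or merges may carry IDs that do not appear in the original GFF files; in this sketch they are simply left untouched.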
Hi,
We don't currently have a tool for this, but you might be interested in ggCaller, which annotates genes directly on a pangenome graph. It uses Panaroo internally to do the gene clustering.