gtonkinhill/panaroo

Example of missing AMR gene allele

Danderson123 opened this issue · 5 comments

Hi @gtonkinhill & @nzmacalasdair,

I am using Panaroo v1.4.0 as part of a larger pipeline to make truth GFFs for 24 hybrid E. coli assemblies, which I use to evaluate an AMR genotyping tool. I annotated my assemblies with Bakta, then ran Panaroo to unify the gene identifiers. While running my evaluation, I noticed that the sample GCA_027944735.1_ASM2794473v1_genomic is missing from the genomeIDs of sul2 in the Panaroo graph. However, according to both the Bakta GFF and the Panaroo gene_data.csv, the gene is present in that sample with clustering ID 8_3_34 and annotation ID JJAHDI_27045. My understanding is that --remove-invalid-genes only removes annotations whose length is not divisible by 3 or that contain premature stop codons, so this sequence looks OK. Any idea why this might be happening? Below are Dropbox links to the gzipped Bakta GFFs for my test samples and the Panaroo output, followed by the command I ran. Let me know if you need my package versions.

GFFs: https://www.dropbox.com/scl/fi/g17vde4m60qxmuxb6b88l/bakta_gffs.tar.gz?rlkey=g1kxgsdjrppjte6yv9h00jjcm&dl=0
Panaroo output: https://www.dropbox.com/scl/fi/312s45vvxu8br15b21bql/panaroo_output.tar.gz?rlkey=6zeppa2974ei4xoumy9041h1f&dl=0

Command:
panaroo --refind_strict --clean-mode sensitive --remove-invalid-genes -c 0.9 --len_dif_percent 0.9 --length_outlier_support_proportion 0.1 --merge_paralogs -i panaroo_input.TXT -o panaroo_output --threads 24
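
The check described in this post (look up the annotation in gene_data.csv, then confirm that its length is a multiple of 3 and that it has no premature stop codon) can be sketched in Python. This is only an illustration, not Panaroo's own validity check: the column names `annotation_id`, `clustering_id` and `dna_sequence` are assumed to match the gene_data.csv header, the stop codon set assumes the standard bacterial genetic code, and the sequence in the demo is a made-up toy, not the real sul2 allele.

```python
import csv
import io

# Stop codons of the standard bacterial genetic code (assumption).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def is_valid_cds(dna):
    """Return True if the sequence length is a non-zero multiple of 3
    and no stop codon occurs before the final codon."""
    dna = dna.upper()
    if len(dna) == 0 or len(dna) % 3 != 0:
        return False
    codons = [dna[i:i + 3] for i in range(0, len(dna), 3)]
    # Every codon except the last one must not be a stop codon.
    return not any(codon in STOP_CODONS for codon in codons[:-1])


def find_gene(handle, annotation_id):
    """Return the first CSV row whose annotation_id matches, else None.

    The column name 'annotation_id' is an assumption about gene_data.csv.
    """
    for row in csv.DictReader(handle):
        if row["annotation_id"] == annotation_id:
            return row
    return None


# Toy stand-in for gene_data.csv; the sequence is invented for illustration.
toy_csv = io.StringIO(
    "clustering_id,annotation_id,dna_sequence\n"
    "8_3_34,JJAHDI_27045,ATGAAATAA\n"
)
row = find_gene(toy_csv, "JJAHDI_27045")
if row is not None:
    print(row["clustering_id"], is_valid_cds(row["dna_sequence"]))
```

To run this against real output, replace the `io.StringIO` object with an open file handle, for example `open("panaroo_output/gene_data.csv", newline="")`.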

Hi @Danderson123,

Thanks very much for raising this and providing the example data. I'll try and take a look this week.
You're correct: --remove-invalid-genes shouldn't have filtered this gene, and from a quick initial look it seems that something else might be going on. I'll try and get back to you quickly.

Hi @Danderson123

It looks like this was a bug in the recently introduced --refind_strict method. If a possible (but invalid) refound gene overlapped with an existing annotation, both annotations were incorrectly removed when the flag was enabled.

This should hopefully be fixed in the devel branch. I will add a couple of other changes and create a new release. In the meantime, you should be able to test the fix by reinstalling Panaroo using:

pip install git+https://github.com/gtonkinhill/panaroo@devel

Thanks very much for pointing out this issue and providing the test example. I really appreciate it!
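
The bug described in this comment can be illustrated with a toy model. This is not Panaroo's actual code: the `Gene` class, both filter functions, the `refound_0` identifier and the coordinates are all hypothetical, chosen only to show how removing an invalid refound candidate together with everything it overlaps would also drop a perfectly good existing annotation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    """Hypothetical gene record; coordinates are 0-based, end-exclusive."""
    gene_id: str
    start: int
    end: int
    valid: bool


def overlaps(a, b):
    """Return True if the two half-open intervals share at least one base."""
    return a.start < b.end and b.start < a.end


def buggy_filter(existing, candidates):
    """Toy version of the bug: an invalid refound candidate is removed
    together with any existing annotation it overlaps."""
    removed = set()
    for cand in candidates:
        if not cand.valid:
            removed.add(cand.gene_id)
            for gene in existing:
                if overlaps(cand, gene):
                    removed.add(gene.gene_id)
    return [g for g in existing + candidates if g.gene_id not in removed]


def fixed_filter(existing, candidates):
    """Toy version of the intended behaviour: only invalid refound
    candidates are dropped; existing annotations are left untouched."""
    return existing + [c for c in candidates if c.valid]


# Made-up coordinates: a valid existing annotation and an invalid,
# overlapping refound candidate.
sul2 = Gene("JJAHDI_27045", 100, 916, valid=True)
refound = Gene("refound_0", 50, 300, valid=False)

print([g.gene_id for g in buggy_filter([sul2], [refound])])
print([g.gene_id for g in fixed_filter([sul2], [refound])])
```

In this sketch the buggy filter loses the existing annotation along with the invalid candidate, while the fixed filter keeps it, which matches the symptom reported above of a sample missing from a gene's genomeIDs.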

Hey @gtonkinhill, Dan is on holiday this week, so he won't respond until next week. Thanks for the bug fix!

Hi @gtonkinhill,

Can confirm this resolved the issue, thank you for fixing it so quickly!

Thanks for checking that it worked. The latest release should include the fix!