Annotation error: gene coordinates exceeding contig length
MilanAdd opened this issue · 2 comments
Hello PPanGGOLiN team,
I am trying your tool for the first time, on Python 3.9, to build a cyanobacteria pangenome at several identity thresholds, including 28% here. I tried running this command:
ppanggolin all --fasta GENOMES_FASTA_LIST.tsv --output pangenome_28 --identity 0.28 --cpu 12 -f
It first ignores the negative coordinates generated by Aragorn, which makes sense, but then it fails with this error about gene coordinates exceeding the contig length:
2024-07-24 10:58:21 utils.py:l169 INFO Command: /home/milu/anaconda3/envs/ppanggolin/bin/ppanggolin all --fasta GENOMES_FASTA_LIST.tsv --output pangenome_28 --identity 0.28 --cpu 12 -f
2024-07-24 10:58:21 utils.py:l170 INFO PPanGGOLiN version: 2.1.0
2024-07-24 10:58:21 utils.py:l767 INFO 12 parameters have a non-default value.
2024-07-24 10:58:21 annotate.py:l1178 INFO Reading GENOMES_FASTA_LIST.tsv the list of genome files
2024-07-24 10:58:21 annotate.py:l1195 INFO Annotating 2121 genomes using 12 cpus...
0%| | 0/2121 [00:00<?, ?file/s]2024-07-24 10:58:22 synta.py:l77 WARNING Aragorn gives non valide coordiates for a RNA gene: ['1', 'tRNA-Thr', 'c[-4,68]', '33', '(ggt)'] This RNA is ignored.
4%|█▌ | 86/2121 [00:54<21:34, 1.57file/s]
2024-07-24 10:59:42 synta.py:l77 WARNING Aragorn gives non valide coordiates for a RNA gene: ['1', 'tRNA-Asn', '[-2,70]', '33', '(gtt)'] This RNA is ignored.
2024-07-24 11:00:48 synta.py:l77 WARNING Aragorn gives non valide coordiates for a RNA gene: ['1', 'tRNA-Lys', 'c[-1,72]', '34', '(ttt)'] This RNA is ignored.
2024-07-24 11:00:48 synta.py:l77 WARNING Aragorn gives non valide coordiates for a RNA gene: ['1', 'tRNA-Thr', 'c[-1,71]', '33', '(tgt)'] This RNA is ignored.
2024-07-24 11:00:48 synta.py:l77 WARNING Aragorn gives non valide coordiates for a RNA gene: ['1', 'tRNA-Phe', 'c[-2,71]', '34', '(gaa)'] This RNA is ignored.
2024-07-24 11:01:34 synta.py:l77 WARNING Aragorn gives non valide coordiates for a RNA gene: ['1', 'tRNA-Met', 'c[-1,73]', '34', '(cat)'] This RNA is ignored.
2024-07-24 11:01:54 synta.py:l77 WARNING Aragorn gives non valide coordiates for a RNA gene: ['1', 'tRNA-Lys', 'c[-1,72]', '34', '(ttt)'] This RNA is ignored.
2024-07-24 11:02:36 synta.py:l77 WARNING Aragorn gives non valide coordiates for a RNA gene: ['1', 'tRNA-Leu', '[-2,80]', '34', '(tag)'] This RNA is ignored.
2024-07-24 11:02:56 synta.py:l77 WARNING Aragorn gives non valide coordiates for a RNA gene: ['1', 'tRNA-Lys', 'c[-1,72]', '34', '(ttt)'] This RNA is ignored.
2024-07-24 11:02:56 synta.py:l77 WARNING Aragorn gives non valide coordiates for a RNA gene: ['1', 'tRNA-Thr', 'c[-1,71]', '33', '(tgt)'] This RNA is ignored.
2024-07-24 11:03:19 synta.py:l77 WARNING Aragorn gives non valide coordiates for a RNA gene: ['1', 'tRNA-Met', 'c[-2,72]', '35', '(cat)'] This RNA is ignored.
2024-07-24 11:04:42 synta.py:l77 WARNING Aragorn gives non valide coordiates for a RNA gene: ['1', 'tRNA-Gly', '[-2,70]', '33', '(ccc)'] This RNA is ignored.
2024-07-24 11:05:45 synta.py:l77 WARNING Aragorn gives non valide coordiates for a RNA gene: ['1', 'tRNA-Asn', 'c[-2,70]', '33', '(gtt)'] This RNA is ignored.
2024-07-24 11:06:04 synta.py:l77 WARNING Aragorn gives non valide coordiates for a RNA gene: ['1', 'tRNA-Pro', 'c[-3,71]', '35', '(tgg)'] This RNA is ignored.
2024-07-24 11:07:02 synta.py:l77 WARNING Aragorn gives non valide coordiates for a RNA gene: ['1', 'tRNA-Ser', 'c[-1,88]', '36', '(gga)'] This RNA is ignored.
2024-07-24 11:07:28 synta.py:l77 WARNING Aragorn gives non valide coordiates for a RNA gene: ['1', 'tRNA-Ile', 'c[-1,75]', '36', '(gat)'] This RNA is ignored.
2024-07-24 11:08:15 synta.py:l77 WARNING Aragorn gives non valide coordiates for a RNA gene: ['1', 'tRNA-Cys', 'c[-2,70]', '33', '(gca)'] This RNA is ignored.
2024-07-24 11:08:29 synta.py:l77 WARNING Aragorn gives non valide coordiates for a RNA gene: ['1', 'tRNA-Met', 'c[-2,72]', '35', '(cat)'] This RNA is ignored.
2024-07-24 11:08:29 synta.py:l77 WARNING Aragorn gives non valide coordiates for a RNA gene: ['1', 'tRNA-Ala', 'c[-4,69]', '34', '(tgc)'] This RNA is ignored.
2024-07-24 11:08:39 synta.py:l77 WARNING Aragorn gives non valide coordiates for a RNA gene: ['1', 'tRNA-Ser', 'c[-1,86]', '36', '(tga)'] This RNA is ignored.
2024-07-24 11:08:59 synta.py:l77 WARNING Aragorn gives non valide coordiates for a RNA gene: ['1', 'tRNA-Trp', '[-2,71]', '34', '(cca)'] This RNA is ignored.
2024-07-24 11:09:01 synta.py:l77 WARNING Aragorn gives non valide coordiates for a RNA gene: ['1', 'tRNA-Met', 'c[-1,73]', '34', '(cat)'] This RNA is ignored.
2024-07-24 11:09:05 synta.py:l77 WARNING Aragorn gives non valide coordiates for a RNA gene: ['1', 'tRNA-Pro', 'c[-4,70]', '35', '(tgg)'] This RNA is ignored.
2024-07-24 11:10:25 synta.py:l77 WARNING Aragorn gives non valide coordiates for a RNA gene: ['1', 'tRNA-Met', 'c[-1,75]', '36', '(cat)'] This RNA is ignored.
2024-07-24 11:10:40 synta.py:l77 WARNING Aragorn gives non valide coordiates for a RNA gene: ['1', 'tRNA-Ser', 'c[-2,83]', '35', '(tga)'] This RNA is ignored.
2024-07-24 11:11:30 synta.py:l77 WARNING Aragorn gives non valide coordiates for a RNA gene: ['1', 'tRNA-Arg', '[-2,69]', '32', '(ccg)'] This RNA is ignored.
2024-07-24 11:11:36 synta.py:l77 WARNING Aragorn gives non valide coordiates for a RNA gene: ['1', 'tRNA-Phe', 'c[-1,72]', '34', '(gaa)'] This RNA is ignored.
2024-07-24 11:13:16 synta.py:l77 WARNING Aragorn gives non valide coordiates for a RNA gene: ['1', 'tRNA-Asp', 'c[-2,72]', '35', '(gtc)'] This RNA is ignored.
2024-07-24 11:13:41 synta.py:l77 WARNING Aragorn gives non valide coordiates for a RNA gene: ['1', 'tRNA-Lys', 'c[-1,74]', '35', '(ctt)'] This RNA is ignored.
2024-07-24 11:14:33 synta.py:l77 WARNING Aragorn gives non valide coordiates for a RNA gene: ['1', 'tRNA-Lys', 'c[-1,74]', '35', '(ttt)'] This RNA is ignored.
2024-07-24 11:15:00 synta.py:l77 WARNING Aragorn gives non valide coordiates for a RNA gene: ['1', 'tRNA-Leu', '[-2,77]', '34', '(tag)'] This RNA is ignored.
2024-07-24 11:15:56 synta.py:l77 WARNING Aragorn gives non valide coordiates for a RNA gene: ['1', 'tRNA-Met', '[-3,74]', '35', '(cat)'] This RNA is ignored.
2024-07-24 11:16:43 synta.py:l77 WARNING Aragorn gives non valide coordiates for a RNA gene: ['1', 'tRNA-Met', 'c[-2,70]', '33', '(cat)'] This RNA is ignored.
2024-07-24 11:16:43 synta.py:l77 WARNING Aragorn gives non valide coordiates for a RNA gene: ['1', 'tRNA-Met', '[-2,70]', '33', '(cat)'] This RNA is ignored.
2024-07-24 11:16:43 synta.py:l77 WARNING Aragorn gives non valide coordiates for a RNA gene: ['1', 'tRNA-Pro', 'c[-2,72]', '35', '(ggg)'] This RNA is ignored.
2024-07-24 11:17:50 synta.py:l77 WARNING Aragorn gives non valide coordiates for a RNA gene: ['1', 'tRNA-Ser', 'c[-1,88]', '36', '(gga)'] This RNA is ignored.
concurrent.futures.process._RemoteTraceback:
"""
Traceback (most recent call last):
File "/home/milu/anaconda3/envs/ppanggolin/lib/python3.9/concurrent/futures/process.py", line 246, in _process_worker
r = call_item.fn(*call_item.args, **call_item.kwargs)
File "/home/milu/anaconda3/envs/ppanggolin/lib/python3.9/site-packages/ppanggolin/annotate/synta.py", line 377, in annotate_organism
gene.add_sequence(get_dna_sequence(contig_sequences[contig.name], gene))
File "/home/milu/anaconda3/envs/ppanggolin/lib/python3.9/site-packages/ppanggolin/annotate/synta.py", line 316, in get_dna_sequence
assert highest_position <= len(
AssertionError: Gene coordinates exceed contig length. gene coordinates [(65755, 65827)] vs contig length 65826
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/milu/anaconda3/envs/ppanggolin/bin/ppanggolin", line 10, in <module>
sys.exit(main())
File "/home/milu/anaconda3/envs/ppanggolin/lib/python3.9/site-packages/ppanggolin/main.py", line 222, in main
ppanggolin.workflow.all.launch(args)
File "/home/milu/anaconda3/envs/ppanggolin/lib/python3.9/site-packages/ppanggolin/workflow/all.py", line 295, in launch
launch_workflow(args, panrgp=True, panmodule=True)
File "/home/milu/anaconda3/envs/ppanggolin/lib/python3.9/site-packages/ppanggolin/workflow/all.py", line 96, in launch_workflow
annotate_pangenome(pangenome, args.fasta, tmpdir=args.tmpdir, cpu=args.annotate.cpu,
File "/home/milu/anaconda3/envs/ppanggolin/lib/python3.9/site-packages/ppanggolin/annotate/annotate.py", line 1207, in annotate_pangenome
pangenome.add_organism(future.result())
File "/home/milu/anaconda3/envs/ppanggolin/lib/python3.9/concurrent/futures/_base.py", line 439, in result
return self.__get_result()
File "/home/milu/anaconda3/envs/ppanggolin/lib/python3.9/concurrent/futures/_base.py", line 391, in __get_result
raise self._exception
AssertionError: Gene coordinates exceed contig length. gene coordinates [(65755, 65827)] vs contig length 65826
This is the genome set I'm using, obtained from GTDB (I filtered around 200 of these genomes out for my actual set): https://gtdb.ecogenomic.org/advanced?exp=KDEmMiYzKQ~~&1=MX4yfmN5YW5vYmFjdGVyaW90YQ~~&2=NTJ.MTJ.OTk~&3=NTN.MTB.Mw~~
I'm aware that this assertion is built into the annotation script to catch certain edge cases, and mine is probably one of them for some reason. Any ideas or suggestions on how this can be dealt with?
Thank you!
Hi,
I was able to reproduce the issue on my end. Thanks for the clear report.
It looks like the problem comes from Aragorn giving gene coordinates that go beyond the contig length. We knew it sometimes gives negative coordinates, and we handle those cases by throwing a warning and ignoring the gene, as you noticed in your log.
However, we didn't anticipate it could also give coordinates that exceed the contig length. We'll fix this by identifying these cases and throwing a similar warning. I'll work on patching that very soon.
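To illustrate the kind of check described above, here is a minimal sketch. It is not the actual patch: the function name `parse_aragorn_location`, its return shape, and the warning wording are hypothetical, and it assumes Aragorn reports 1-based inclusive coordinates with a leading `c` marking the complementary strand.

```python
import logging
import re
from typing import Optional, Tuple

# Hypothetical helper for illustration only; not PPanGGOLiN's real code.
# Matches Aragorn-style locations such as "[-2,70]" or "c[65755,65827]".
_LOCATION = re.compile(r"^(c?)\[(-?\d+),(-?\d+)\]$")


def parse_aragorn_location(
    location: str, contig_length: int
) -> Optional[Tuple[int, int, str]]:
    """Return (start, stop, strand), or None if the coordinates are invalid.

    Assumes 1-based inclusive coordinates, so a stop equal to the contig
    length is still valid.
    """
    match = _LOCATION.match(location)
    if match is None:
        raise ValueError(f"Unrecognised location string: {location!r}")

    strand = "-" if match.group(1) else "+"
    start, stop = int(match.group(2)), int(match.group(3))

    # Reject coordinates before the first base or past the last one,
    # so both failure modes are handled the same way: warn and skip.
    if min(start, stop) < 1 or max(start, stop) > contig_length:
        logging.warning(
            "Invalid coordinates %s for contig of length %d. This RNA is ignored.",
            location,
            contig_length,
        )
        return None
    return start, stop, strand


# The case from this issue: stop 65827 on a contig of length 65826.
print(parse_aragorn_location("[65755,65827]", 65826))  # None
print(parse_aragorn_location("c[-4,68]", 65826))       # None
print(parse_aragorn_location("c[10,82]", 65826))       # (10, 82, '-')
```

Returning `None` instead of raising keeps one bad tRNA prediction from aborting the annotation of the whole genome set.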
Thanks for reporting this issue!
Best,
Hi,
This bug has been fixed, and the fix is included in version 2.1.1.
Best