[BUG] Wrong coordinates in results
Describe the bug
Hi,
First, thanks for the nice work on this tool. I have been using it in one of my pipelines and it has been working great.
Recently I tried using it on some Vibrio genomes, and it has problems annotating integrons located at the very start of the sequences.
In many of them it fails: sometimes it reports a feature at the very first base but writes the position 0-based, for example, and in other cases it reports invalid negative start positions, as shown below:
13 Integron_Finder integron 69515 74987 . + 1 ID=integron_01;integron_type=complete
24 Integron_Finder integron 25 12675 . + 1 ID=integron_01;integron_type=CALIN
25 Integron_Finder integron 19 9958 . + 1 ID=integron_01;integron_type=CALIN
27 Integron_Finder integron 6936 9536 . + 1 ID=integron_01;integron_type=complete
31 Integron_Finder integron 478 4564 . + 1 ID=integron_01;integron_type=CALIN
32 Integron_Finder integron 66 4604 . + 1 ID=integron_01;integron_type=CALIN
33 Integron_Finder integron 117 4047 . + 1 ID=integron_01;integron_type=CALIN
37 Integron_Finder integron -2 3108 . + 1 ID=integron_01;integron_type=CALIN
38 Integron_Finder integron 2 2804 . + 1 ID=integron_01;integron_type=CALIN
44 Integron_Finder integron 70 1709 . + 1 ID=integron_01;integron_type=CALIN
46 Integron_Finder integron -17 1603 . + 1 ID=integron_01;integron_type=CALIN
I am therefore sharing the GenBank (.gbk) files generated by integron_finder itself during the analysis, so that you can see the results and also have the contig sequences to reproduce the issue.
To Reproduce
integron_finder --local-max --func-annot --pdf --gbk --cpu 4 vibrio31.fna
Expected behavior
The minimum allowed start position should be 1, neither 0 nor negative.
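To spot affected features in the output, a quick check like the one below could be run over the GFF lines. This is a minimal sketch, not part of IntegronFinder: the function name `invalid_gff_rows` is hypothetical, and it assumes the standard nine GFF columns (seqid, source, type, start, end, score, strand, phase, attributes). It splits on any whitespace so it also works on copy-pasted rows like the ones above, where tabs became spaces.

```python
def invalid_gff_rows(lines):
    """Return (seqid, start, end) for GFF feature rows whose coordinates
    are not valid 1-based, inclusive positions (hypothetical helper)."""
    bad = []
    for line in lines:
        line = line.strip()
        # Skip blank lines and GFF comment/directive lines.
        if not line or line.startswith("#"):
            continue
        # Split into at most 9 fields so the attributes column stays whole.
        fields = line.split(None, 8)
        if len(fields) < 9:
            continue  # not a complete feature line
        seqid, start, end = fields[0], int(fields[3]), int(fields[4])
        # 1-based coordinates start at 1, and end must not precede start.
        if start < 1 or end < start:
            bad.append((seqid, start, end))
    return bad
```

Fed the rows for contigs 37 and 46 above, it would flag `("37", -2, 3108)` and `("46", -17, 1603)` while leaving the other integrons alone.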
OS:
- Linux
- Windows
- Mac
Integron_Finder Version:
version 2.0.1
Hello,
Could you share vibrio31.fna?
Thanks
Hello hello,
Aren't the two problematic contigs, included in the two GenBank files (IntegronFinder output) in the zip file, sufficient?
I am not sure I can share the whole genome (I can ask if they are not sufficient).
Cheers.
Here is the .fna file of the genome, containing the two contigs (37 and 46).
vibrio31_subset.fna.gz
Ah OK, I found the bug. It happens because there is a hit at the very first position, but the attC model match is truncated. When a model match is truncated, we correct the position so that the real start of the attC site is placed a bit earlier; at the very start of a contig, this can push the start to 0 or below.
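The mechanism described above can be illustrated with a small sketch. This is not IntegronFinder's actual code in infernal.py: the function `extend_truncated_hit` and its parameters are hypothetical, it handles only plus-strand hits, and it assumes the correction extends the hit by the number of model positions left unmatched at each end (model coordinates as reported by Infernal, for example `mdl from`/`mdl to` in `cmsearch --tblout`). The final clamp to the contig is what keeps the start from going below 1.

```python
def extend_truncated_hit(seq_from, seq_to, mdl_from, mdl_to, model_len, seq_len):
    """Hypothetical sketch: extend a plus-strand hit to cover the model
    positions it did not match, then clamp it to the contig [1, seq_len]."""
    # Model positions missing on the 5' side (the match starts partway into the model).
    missing_5p = mdl_from - 1
    # Model positions missing on the 3' side (the match stops before the model's end).
    missing_3p = model_len - mdl_to
    start = seq_from - missing_5p
    end = seq_to + missing_3p
    # Without this clamp, a hit at base 1 with a 5'-truncated match gets start <= 0.
    start = max(start, 1)
    # Same problem at the other end of the contig for a 3'-truncated match.
    end = min(end, seq_len)
    return start, end
```

Under these assumptions, a hit starting at base 1 whose match begins at model position 19 would get an unclamped start of 1 - 18 = -17; with the clamp it becomes 1. Clamping rather than discarding the hit keeps the partial attC site in the results while guaranteeing valid 1-based coordinates.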
I think the bug is around L95 in infernal.py.
I don't have much time to fix it now; feel free to propose a PR if you can. Otherwise, @bneron or I will try to fix it when we can.
Best
I'm going to work on it.
If I understand the problem correctly, the position should be 0 in this case, shouldn't it?
Actually, I believe it should be 1: GenBank and GFF coordinates are 1-based.
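The difference between 0-based positions (as in Python string slicing) and the 1-based, inclusive coordinates used by GFF and GenBank can be shown in a few lines. This is a minimal sketch; the helper names `zero_based_to_gff` and `gff_slice` are hypothetical and not part of IntegronFinder.

```python
def zero_based_to_gff(start0, end0):
    """Convert a 0-based, half-open interval [start0, end0) to
    1-based, inclusive GFF/GenBank coordinates (hypothetical helper)."""
    if start0 < 0 or end0 <= start0:
        raise ValueError(f"invalid 0-based interval [{start0}, {end0})")
    # Only the start shifts: the exclusive 0-based end equals the inclusive 1-based end.
    return start0 + 1, end0


def gff_slice(seq, start1, end1):
    """Return the bases covered by 1-based, inclusive coordinates."""
    return seq[start1 - 1:end1]
```

For example, the first four bases of `"ACGTACGT"` are `seq[0:4]` in Python but `1..4` in GFF, so a feature at the very first base must be written with start 1, never 0.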
Yes, and we should also handle the same case when the attC model is truncated at the end of a contig (not only at the start, as in this issue).
Hello, I am getting the same error with IntegronFinder v2.0.2. I can see the bug was fixed, but the fix is not yet in the current release. Could you please add this fix to main?
Fixed in integron_finder version 2.0.5.