
[BUG] Wrong coordinates in results

Closed this issue · 13 comments

Describe the bug
First, thanks for the nice work on this tool. I have been using it in one of my pipelines and it has been working great.

Recently I tried it on some Vibrio genomes, and it has problems annotating integrons located at the very start of the sequences.

It fails in several ways. Sometimes a result at the very first base is written with a 0-based start (start 0). In other cases, it reports invalid negative start positions, as below:

13      Integron_Finder integron        69515   74987   .       +       1       ID=integron_01;integron_type=complete
24      Integron_Finder integron        25      12675   .       +       1       ID=integron_01;integron_type=CALIN
25      Integron_Finder integron        19      9958    .       +       1       ID=integron_01;integron_type=CALIN
27      Integron_Finder integron        6936    9536    .       +       1       ID=integron_01;integron_type=complete
31      Integron_Finder integron        478     4564    .       +       1       ID=integron_01;integron_type=CALIN
32      Integron_Finder integron        66      4604    .       +       1       ID=integron_01;integron_type=CALIN
33      Integron_Finder integron        117     4047    .       +       1       ID=integron_01;integron_type=CALIN
37      Integron_Finder integron        -2      3108    .       +       1       ID=integron_01;integron_type=CALIN
38      Integron_Finder integron        2       2804    .       +       1       ID=integron_01;integron_type=CALIN
44      Integron_Finder integron        70      1709    .       +       1       ID=integron_01;integron_type=CALIN
46      Integron_Finder integron        -17     1603    .       +       1       ID=integron_01;integron_type=CALIN

I am therefore sharing the GenBank (.gbk) files that integron_finder itself generated during the analysis, so that you can see the results and also have the contig sequences needed to reproduce the problem.

To Reproduce

integron_finder --local-max --func-annot --pdf --gbk --cpu 4 vibrio31.fna

Expected behavior

The minimum allowed start position should be 1, neither 0 nor negative.
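Records that violate this rule can be flagged with a short check. This is a minimal sketch, assuming GFF3 input where column 1 is the seqid and column 4 is the 1-based start. It splits on any whitespace so that it also accepts copies where tabs were turned into spaces. The function name `find_invalid_starts` is hypothetical and not part of Integron_Finder.

```python
def find_invalid_starts(gff_lines):
    """Return (seqid, start) pairs whose start column is below 1.

    Hypothetical helper. Assumes GFF3-like records: column 1 is the seqid,
    column 4 the 1-based start. Blank lines and '#' comment lines are skipped.
    """
    invalid = []
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue
        # Split on any whitespace so space-separated copies also parse.
        fields = line.split()
        seqid, start = fields[0], int(fields[3])
        if start < 1:
            invalid.append((seqid, start))
    return invalid


records = [
    "37\tIntegron_Finder\tintegron\t-2\t3108\t.\t+\t1\tID=integron_01;integron_type=CALIN",
    "38\tIntegron_Finder\tintegron\t2\t2804\t.\t+\t1\tID=integron_01;integron_type=CALIN",
    "46\tIntegron_Finder\tintegron\t-17\t1603\t.\t+\t1\tID=integron_01;integron_type=CALIN",
]
print(find_invalid_starts(records))  # [('37', -2), ('46', -17)]
```

A start of 0 is flagged as well, since GFF3 coordinates begin at 1.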


  • Linux
  • Windows
  • Mac

Integron_Finder Version:

version 2.0.1


Could you share vibrio31.fna?


Hello hello,
Aren't the two problematic contigs, shared as the two GenBank files (output of integron_finder) in the zip file, sufficient?
I am not sure I can share the whole genome (I can ask if they are not sufficient).

Here is the fna file of the genome, containing the two contigs (37 and 46).

Ah OK, I found the bug. There is a hit at the very first position, but the attC model is truncated. When a model is truncated, we correct the position so that the real start of the attC site is placed a bit earlier, which in this case pushes it before the first base.

The bug is around L95 in I think.

I don't have much time to fix it now, so feel free to propose a PR if you can. Otherwise, @bneron or I might fix it when we can.
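The truncation correction described in this comment can be illustrated with a small sketch. It assumes a plus-strand hit where each model position maps to one contig base (no insertions or deletions), and that `model_from` is the first model position covered by the alignment. The helper name and formula are hypothetical and are not Integron_Finder's actual code; they only show why extending a hit that starts at base 1 yields a start below 1.

```python
def extend_truncated_start(hit_start, model_from):
    """Move a hit's start back to where the full attC model would begin.

    Hypothetical helper (not Integron_Finder's code). Assumes a plus-strand
    hit, 1-based hit_start, and a 1:1 mapping between model positions and
    contig bases. model_from == 1 means the model is not truncated at its start.
    """
    # The model positions missing before the alignment are placed on the
    # contig just before the first aligned base.
    return hit_start - (model_from - 1)


print(extend_truncated_start(100, 4))  # 97: the corrected start is still valid
print(extend_truncated_start(1, 4))    # -2: falls before the first base
```

Without a lower bound, any truncated hit close enough to the start of a contig produces a start of 0 or less.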


bneron commented

I'm going to work on it.

bneron commented

If I understand the problem correctly, the position should be 0 in this case, shouldn't it?

Actually, I believe it should be 1.

I believe GenBank and GFF files use 1-based coordinates.
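GFF3 and GenBank coordinates are 1-based and inclusive on both ends, whereas Python slicing is 0-based and half-open. A minimal sketch of the conversion, with a hypothetical function name, shows why the first base of a contig must be written as 1:

```python
def zero_based_to_gff(start0, end0):
    """Convert a 0-based, half-open interval [start0, end0) to the
    1-based, closed interval [start, end] used by GFF3 and GenBank.

    Hypothetical helper for illustration only.
    """
    if start0 < 0 or end0 <= start0:
        raise ValueError(f"invalid 0-based interval: [{start0}, {end0})")
    # Only the start shifts: the exclusive 0-based end equals the
    # inclusive 1-based end.
    return start0 + 1, end0


# seq[0:3] (the first three bases) corresponds to GFF start=1, end=3.
print(zero_based_to_gff(0, 3))  # (1, 3)
```

Writing a 0-based start directly into a GFF file produces exactly the "start 0" off-by-one reported in this issue.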

Yes, and we should also check the same case where the attC model is truncated at the end of a contig (not only at the start, as in this issue).
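Covering both ends amounts to clamping the corrected interval to the contig bounds. This is a minimal sketch, assuming 1-based, closed coordinates and a known contig length. The helper name and the contig lengths in the usage lines are hypothetical, not taken from the shared files.

```python
def clamp_to_contig(start, end, contig_length):
    """Clamp a 1-based, closed interval to [1, contig_length].

    Hypothetical helper. A corrected attC position can fall before base 1
    (model truncated at the contig start) or after the last base (model
    truncated at the contig end).
    """
    if contig_length < 1:
        raise ValueError("contig_length must be at least 1")
    return max(start, 1), min(end, contig_length)


# Hypothetical contig lengths, for illustration only.
print(clamp_to_contig(-17, 1603, 2000))  # (1, 1603)
print(clamp_to_contig(4900, 5012, 5000))  # (4900, 5000)
```

Clamping keeps the feature inside the sequence instead of discarding the hit, which matches the expectation that the minimum start is 1.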

bneron commented

@jeanrjc, could you check the fix I just made?

df.loc[idx, "pos_beg"] = df.loc[idx].apply(lambda x: max(x["pos_end_tmp"] - (len_model_attc - x["cm_fin"]),

It works for me! Thanks.

Is this merged, @bneron?

Hello, I am getting the same error with IntegronFinder v2.0.2. I can see the bug was fixed, but the fix is not yet in the current release. Could you please add this fix to main?

Fixed in integron_finder version 2.0.5.