ogotoh/spaln

Segmentation fault in command line parsing

Opened this issue · 7 comments

Hi,
I realised that spaln was moved to github and thus I wanted to release an updated Debian package. We have created a CI test for version 2.4.1, which basically calls:

perl makeidx.pl -v -inp dictdisc_g.gf.gz
spaln -Q7 -d dictdisc_g -T dictdisc dictdisc.faa.gz
spaln -Q7 -d dictdisc_g -yS -T dictdisc -O12 -g dictdisc.cf.gz
sortgrcd -O15 -F2 dictdisc.grd.gz

I realised that I need to apply this patch since it can not assumed that the current directory is inside PATH. (BTW, it would probably be better to not ship the same code twice.)

That way the test is progressing, however the line

spaln -Q7 -d dictdisc_g -yS -T dictdisc -O12 -g dictdisc.cf.gz

segfaults and the actual code line is marked in this debug patch. I guess there is somewhere an issue in the command line parsing code which just crashes here. You can see a full build log to see how spaln was build and here you find the call that leads to segfault.

Kind regards, Andreas.

Dear Andreas,

Thank you very much for your reports. I have just uploaded a bug-fix version (ver.2.4.7) to the GitHub site.

Compared with ver.2.4.1, the latest version shows not dramatically but significantly better performance in speed and mappability.

Use of “makeidx.pl -inp input” is somewhat obsolete. Use of “spaln -W -K[A|D|P] input” would be more convenient in most cases.

The meaning of -yS option has slightly changed from the original one; species-specific parameter set will be used whenever -T option is specified. So, you usually do not need to set this option (that is why I could not notice the bug).

Osamu,

Dear Osamu,

Thank you very much for your reports. I have just uploaded a bug-fix version (ver.2.4.7) to the GitHub site.

Ahhh, I just realise now that version 2.4.11 is older than 2.4.2 etc. Our automatic download tool has fetched this one as last release. Now I have tweaked it to consider this 2.4.1.1 (I recommend separating even micro and nano version by a '.' for clarification.)

Compared with ver.2.4.1, the latest version shows not dramatically but significantly better performance in speed and mappability.

Sounds good!

Use of “makeidx.pl -inp input” is somewhat obsolete. Use of “spaln -W -K[A|D|P] input” would be more convenient in most cases.

I do not insist in using makeidx.pl - I simply continued to use the test that was once invented for a certain version. In other words: I'd be super happy if you suggest a better way to test than what we are using in our CI test. Feel free to suggest a total rewrite of what you can read starting at line 18 in this file.

The meaning of -yS option has slightly changed from the original one; species-specific parameter set will be used whenever -T option is specified. So, you usually do not need to set this option (that is why I could not notice the bug).

Thanks a lot for the quick fix.

Unfortunately the test we used to work keeps on failing Any hint how to rewrite that test sensibly would be welcome.
Kind regards, Andreas.

Dear Andreas,

Change the last line of CItest script from
sortgrcd -O15 -F2 dictdisc.grd.gz
to
sortgrcd -O15 -F2 dictdisc.cf.grd.gz

and then test again.

Osamu,

Dear Andreas,

Thank you again for your continued efforts.

I admit that the error message is not much informative. I will change it in the next release.

As mentioned earlier, I prefer to use “spaln -W” rather than to use “makeidx.pl”, because the latter no longer depends on “make” so that it runs in any directory without any care about the location of “Makefile”. One minor disadvantage is that we must run spaln twice if we want to format the genomic sequence for both DNA and protein queries. Thus, the recommended alternative to Line 18 would be:

spaln -W -KD [-t4] dictdisc_g.gf.gz
spaln -W -KP [-t4] dictdisc_g.gf.gz

The optional -t4 option would approximately halve the calculation time, if your system supports up to four-fold multi-thread operation. (On my PC with 8 cores, 4 was the best; more than 5 did not further accelerate calculation).

Likewise, -tN option would be useful in the map-and-search mode. In this mode, the acceleration rate is nearly linier with respect to N.

Osamu,

Dear Andreas,

Thanks. I have an additional comment. The -t${NCPUS} option would be more beneficial in the map/align mode rather than in the format mode.

spaln -Q7 -d dictdisc_g -T dictdisc -t${NCPUS} dictdisc.faa.gz
spaln -Q7 -d dictdisc_g -T dictdisc -t${NCPUS} -O12 -g dictdisc.cf.gz

Osamu,