mozack/abra2

Crash when target region BED file contains header lines


Hi,

I found a small bug when using a BED file with browser/track header lines for k-mer size calculation, e.g. this file:

browser position chr1:12081-12251
track name="Covered" description="Agilent SureSelect DNA - SureSelect Clinical Research Exome V2 -Genomic regions covered by probes" color=0,0,128
chr1 12080 12251
chr1 12595 12802
chr1 13163 13658

I then tried to calculate k-mer sizes:

java -Xmx16G -cp /mnt/share/opt/abra2_2.05/abra2-2.05.jar abra.KmerSizeEvaluator ... [bed file]

It then crashes when trying to access the second tab-separated field of the first line, which does not exist:

INFO Thu Nov 09 14:47:19 CET 2017 Loading reference map: /tmp/local_ngs_data//GRCh37.fa
INFO Thu Nov 09 14:49:58 CET 2017 Done loading ref map. Elapsed secs: 158
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 1
at abra.RegionLoader.load(RegionLoader.java:49)
at abra.ReAligner.getRegions(ReAligner.java:839)
at abra.KmerSizeEvaluator.run(KmerSizeEvaluator.java:50)
at abra.KmerSizeEvaluator.main(KmerSizeEvaluator.java:240)
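The trace is consistent with each line being split on tabs and the second field being read unconditionally. The "browser position ..." line contains no tab, so the split yields a one-element array and index 1 is out of bounds. A minimal illustration of that failure mode (the class and method names are hypothetical; this is not the actual abra.RegionLoader code):

```java
public class BedSplitDemo {

    // Splits a BED line on tabs and returns the second column without
    // checking how many fields there are (hypothetical helper).
    static String secondColumn(String line) {
        String[] fields = line.split("\t");
        return fields[1]; // throws ArrayIndexOutOfBoundsException when the line has no tab
    }

    public static void main(String[] args) {
        // A normal data line has at least three tab-separated fields.
        System.out.println(secondColumn("chr1\t12080\t12251"));

        // The browser header line has only one field, so index 1 does not exist.
        try {
            secondColumn("browser position chr1:12081-12251");
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("Header line has only one tab-separated field");
        }
    }
}
```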

This is not a big problem since one can remove the lines, but I think ABRA should handle this more gracefully, e.g. by ignoring all lines with fewer than three tab-separated fields.
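A sketch of the suggested handling, assuming a loader that only needs chrom/start/end: skip blank lines, "#" comments and UCSC "browser"/"track" lines, then skip any remaining line with fewer than three tab-separated fields. The class, the `Region` holder and the "#" comment rule are hypothetical illustrations; how ABRA2 itself ignores these lines is not shown in this thread.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class TolerantBedReader {

    // Minimal region holder; BED coordinates are 0-based, half-open [start, end).
    static final class Region {
        final String chrom;
        final int start;
        final int end;

        Region(String chrom, int start, int end) {
            this.chrom = chrom;
            this.start = start;
            this.end = end;
        }
    }

    // True for lines that describe no region: blank lines, "#" comments and
    // UCSC "browser"/"track" header lines.
    static boolean isHeader(String line) {
        String t = line.trim();
        if (t.isEmpty() || t.startsWith("#")) {
            return true;
        }
        String firstWord = t.split("\\s+", 2)[0];
        return firstWord.equals("browser") || firstWord.equals("track");
    }

    // Reads chrom/start/end from each data line, skipping header lines and any
    // line with fewer than three tab-separated fields instead of crashing.
    static List<Region> load(Reader in) {
        List<Region> regions = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(in)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (isHeader(line)) {
                    continue;
                }
                String[] fields = line.split("\t");
                if (fields.length < 3) {
                    continue;
                }
                regions.add(new Region(fields[0].trim(),
                        Integer.parseInt(fields[1].trim()),
                        Integer.parseInt(fields[2].trim())));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return regions;
    }

    public static void main(String[] args) {
        String bed = "browser position chr1:12081-12251\n"
                + "track name=\"Covered\" color=0,0,128\n"
                + "chr1\t12080\t12251\n"
                + "chr1\t12595\t12802\n"
                + "chr1\t13163\t13658\n";
        for (Region r : load(new StringReader(bed))) {
            System.out.println(r.chrom + "\t" + r.start + "\t" + r.end);
        }
    }
}
```

Checking the header keywords first, and then the field count, means a malformed data line is skipped rather than aborting the whole run; a stricter loader might prefer to log or reject such lines instead.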

Best,
Marc

"browser" and "track" lines are ignored in v2.12. Please retry with the latest release.

Also, depending on your dataset, you may not see much of a speed improvement from using pre-generated kmer sizes in ABRA2. You may be able to just pass in the original bed file without much of a performance hit.

Yes, I can confirm that it works with version 2.12.

We do mainly exomes. Is your suggestion for skipping the kmer calculation valid there too?

Best,
Marc

Yes. To be clear, I am suggesting using --targets with your original bed file instead of --target-kmers. You may wish to test both with and without it to confirm there is little difference in speed. Using --target-kmers certainly should not hurt performance, though.