Crash when target region BED file contains header lines
Hi,
I found a small bug when using a BED file with a browser/track header for k-mer size calculation, e.g. this file:
browser position chr1:12081-12251
track name="Covered" description="Agilent SureSelect DNA - SureSelect Clinical Research Exome V2 -Genomic regions covered by probes" color=0,0,128
chr1 12080 12251
chr1 12595 12802
chr1 13163 13658
I then tried to calculate k-mer sizes:
java -Xmx16G -cp /mnt/share/opt/abra2_2.05/abra2-2.05.jar abra.KmerSizeEvaluator ... [bed file]
It then crashes when trying to access the second tab-separated entry of the first line, which does not exist:
INFO Thu Nov 09 14:47:19 CET 2017 Loading reference map: /tmp/local_ngs_data//GRCh37.fa
INFO Thu Nov 09 14:49:58 CET 2017 Done loading ref map. Elapsed secs: 158
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 1
at abra.RegionLoader.load(RegionLoader.java:49)
at abra.ReAligner.getRegions(ReAligner.java:839)
at abra.KmerSizeEvaluator.run(KmerSizeEvaluator.java:50)
at abra.KmerSizeEvaluator.main(KmerSizeEvaluator.java:240)
This is not a big problem, since one can remove the header lines by hand, but I think ABRA should handle this more gracefully, e.g. by ignoring all lines with fewer than three tab-separated parts.
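To illustrate the suggestion, here is a minimal sketch of such a filter. It is not ABRA's actual code: the class `BedLineFilter` and the methods `isRegionLine` and `parseRegions` are hypothetical names made up for this example, and it assumes regions are tab-separated with at least chromosome, start and end columns.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of ABRA: skips BED header lines before parsing.
class BedLineFilter {

    // Returns true only for lines that look like BED data lines.
    static boolean isRegionLine(String line) {
        String trimmed = line.trim();
        // Skip blank lines, comments, and "browser"/"track" header lines.
        if (trimmed.isEmpty() || trimmed.startsWith("#")
                || trimmed.startsWith("browser") || trimmed.startsWith("track")) {
            return false;
        }
        // Require at least chrom, start, end (three tab-separated fields).
        return trimmed.split("\t").length >= 3;
    }

    // Collects the tab-separated fields of every data line, ignoring the rest.
    static List<String[]> parseRegions(List<String> lines) {
        List<String[]> regions = new ArrayList<>();
        for (String line : lines) {
            if (isRegionLine(line)) {
                regions.add(line.trim().split("\t"));
            }
        }
        return regions;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "browser position chr1:12081-12251",
                "track name=\"Covered\" color=0,0,128",
                "chr1\t12080\t12251",
                "chr1\t12595\t12802",
                "chr1\t13163\t13658");
        // Only the three region lines survive the filter.
        System.out.println(parseRegions(lines).size());
    }
}
```

With a check like this, the header lines would be skipped instead of causing the `ArrayIndexOutOfBoundsException` above when the second field is accessed.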
Best,
Marc
"browser" and "track" lines are ignored in v2.12. Please retry with the latest release.
Also, depending on your dataset, you may not see much of a speed improvement from using pre-generated kmer sizes in ABRA2. You may be able to just pass in the original bed file without much of a performance hit.
Yes, I can confirm that it works with version 2.12.
We do mainly exomes. Is your suggestion for skipping the kmer calculation valid there too?
Best,
Marc
Yes. To be clear, I am suggesting using --targets with your original bed file instead of --target-kmers. You may wish to test with and without to confirm there is little difference in speed. Using --target-kmers certainly should not hurt performance, though.