mozack/abra

Bug in BED file parsing


Hi Lisle,

I found a small bug in your BED file parsing:
Header lines that contain no tab character (such as the "browser" and "track" lines) are treated as data lines.
This causes a crash, because the parser accesses the second field (index 1), which does not exist on such lines.
I suggest treating only lines that have three or more fields after splitting as data lines.
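
To illustrate the idea, here is a minimal sketch of such a check. It is not taken from ABRA's RegionLoader; the class name `BedLineFilter` and the method `isDataLine` are hypothetical, and it assumes data lines are tab-separated. Besides the field count, it also skips lines starting with "browser", "track", or "#", since a header line could in principle contain tabs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of ABRA: decides which BED lines are records.
class BedLineFilter {

    // Returns true only for lines that look like BED records:
    // not a header line, and at least three tab-separated fields
    // (chromosome, start, end).
    static boolean isDataLine(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty()
                || trimmed.startsWith("#")
                || trimmed.startsWith("browser")
                || trimmed.startsWith("track")) {
            return false;
        }
        String[] fields = line.split("\t");
        return fields.length >= 3;
    }

    // Keeps only the lines accepted by isDataLine, preserving their order.
    static List<String> dataLines(List<String> lines) {
        List<String> kept = new ArrayList<>();
        for (String line : lines) {
            if (isDataLine(line)) {
                kept.add(line);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "browser hide all",
                "track name=\"ItemRGBDemo\" visibility=2",
                "chr7\t127471196\t127472363\tPos1");
        // Only the chr7 record should survive the filter.
        for (String line : dataLines(lines)) {
            System.out.println(line);
        }
    }
}
```

With a guard like this in front of the field access, the header lines in the file below would be skipped instead of triggering the ArrayIndexOutOfBoundsException.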

Here are the command and its output:

java -cp abra.jar abra.KmerSizeEvaluator 100 hg19.fa /tmp/test 1 test.bed
Loading reference map: /tmp/local_ngs_data/hg19.fa
Chromosome: chrM length: 16571
Chromosome: chr1 length: 249250621
...
Done loading ref map. Elapsed secs: 179
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 1
at abra.RegionLoader.load(RegionLoader.java:45)
at abra.ReAligner.getRegions(ReAligner.java:702)
at abra.KmerSizeEvaluator.run(KmerSizeEvaluator.java:50)
at abra.KmerSizeEvaluator.main(KmerSizeEvaluator.java:240)

And this is the BED file that triggers the crash:

cat test.bed
browser position chr7:127471196-127495720
browser hide all
track name="ItemRGBDemo" description="Item RGB demonstration" visibility=2
chr7 127471196 127472363 Pos1 0 + 127471196 127472363 255,0,0
chr7 127472363 127473530 Pos2 0 + 127472363 127473530 255,0,0
chr7 127473530 127474697 Pos3 0 + 127473530 127474697 255,0,0
chr7 127474697 127475864 Pos4 0 + 127474697 127475864 255,0,0
chr7 127475864 127477031 Neg1 0 - 127475864 127477031 0,0,255
chr7 127477031 127478198 Neg2 0 - 127477031 127478198 0,0,255
chr7 127478198 127479365 Neg3 0 - 127478198 127479365 0,0,255
chr7 127479365 127480532 Pos5 0 + 127479365 127480532 255,0,0
chr7 127480532 127481699 Neg4 0 - 127480532 127481699 0,0,255

Best regards,
Marc