crash with java IndexOutOfBoundsException

Question

wongs2 opened this issue 7 years ago · 11 comments

Tried several input configurations but keep getting stuck at this error:

java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
at java.util.ArrayList.rangeCheck(ArrayList.java:653)
at java.util.ArrayList.get(ArrayList.java:429)
at abra.AltContigGenerator.getAltContigs(AltContigGenerator.java:273)
at abra.ReAligner.processRegion(ReAligner.java:1222)
at abra.ReAligner.processChromosomeChunk(ReAligner.java:339)
at abra.ReAlignerRunnable.go(ReAlignerRunnable.java:21)
at abra.AbraRunnable.run(AbraRunnable.java:20)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)

Answer 1 · 2017-09-14T20:13:50.000Z

Please provide a bit more info about your dataset and email the full log to lmose at unc dot edu.

If you're able to share a small bam file that reproduces the issue, that would be helpful.

Answer 2 · 2017-09-14T20:51:41.000Z

This is run as follows, on human genome build 38 (with alt contigs etc.): whole-genome DNA, 150 bp paired reads at 30x coverage. The input is the reads mapped to chr1, and the targets are exon regions on chr1. I will fill in more info when I rerun with full logging. Mapping was done with BWA mem.

java -Xmx10G -jar abra2.jar --in chr1.bam --out chr1-abra.bam --ref hs38DH.fasta --targets exon.bed --tmpdir tmp --log error --threads 6 > abra.log

Answer 3 · 2017-09-15T04:10:30.000Z

Did some investigation with --log debug. After several tests, it seems the fault lies with this segment of chr1, which is included in the target file. With this segment removed from the target file, the run completes without the error.

chr1:30519165-30519566
TGATGATGATGGAGAGGATGCTGATGGGAAAGATGATGATGATGGAAAAGATGAGGAGGA
TGGTGATGATGAACAGGATAATGATGACGATAATGATGGAGAGGATGATGATGATGATGG
TGGTGATGGAGAGAATGATGACAAGGATGGGGATAATGGTGATGATGATGGTGGAGAAGA
TGATGATAAAGAGGATGATGATGGAGAGAATGATGATGAAGGAGAGAATGATGATGAACA
TGATGATGGAGATAATGATGATGGAAAGGATGATGATGGAGGTGATGATGACAGAGAGGA
TGATGACGATGATGATGGAAAAAATGATGATGATGAAGAAGATACTGATTATGGGGAAGA
TTATGATGATGGAGAGGATTATGATGGAGAAAATGATGTGAT
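For anyone wanting to repeat the workaround described above (dropping the offending segment from the target file), here is a minimal Python sketch. It assumes the region string is 1-based and inclusive, that the target file is standard tab-separated BED (0-based, half-open), and the function names are made up for illustration. It is not part of ABRA2.

```python
def parse_region(region):
    """Parse 'chr1:30519165-30519566' into (chrom, start, end).

    Assumption: coordinates in the region string are 1-based, inclusive.
    rsplit is used so contig names that contain ':' still parse.
    """
    chrom, span = region.rsplit(":", 1)
    start, end = span.replace(",", "").split("-")
    return chrom, int(start), int(end)


def drop_overlapping(bed_lines, region):
    """Return the BED lines that do NOT overlap the given region."""
    chrom, start, end = parse_region(region)
    kept = []
    for line in bed_lines:
        fields = line.rstrip("\n").split("\t")
        # Pass header/comment lines and malformed lines through untouched.
        if len(fields) < 3 or line.startswith(("#", "track", "browser")):
            kept.append(line)
            continue
        b_chrom, b_start, b_end = fields[0], int(fields[1]), int(fields[2])
        # BED [b_start, b_end) covers 1-based positions b_start+1 .. b_end.
        overlaps = b_chrom == chrom and b_start < end and b_end >= start
        if not overlaps:
            kept.append(line)
    return kept
```

Usage would be along the lines of reading exon.bed, calling `drop_overlapping(lines, "chr1:30519165-30519566")`, and writing the result to a new target file.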

Answer 4 · 2017-09-15T16:10:16.000Z

Thanks for investigating. Are you able to share a BAM file containing the reads that overlap that region?

Answer 5 · 2017-09-15T17:51:10.000Z

Attached is the BAM file for the region. Note that running this directly with abra2 will throw the error "Inappropriate call if not paired read", as the mate reads are not included. Thanks! Also, do you have a speed test comparing ABRA and ABRA2? On WGS DNA I don't think there is a speedup; in fact it seems slower at first glance. I might be mistaken. Thanks once again for the great work! Matt

Answer 6 · 2017-09-15T23:24:20.000Z

Thanks for sending. Unfortunately, I could not reproduce the issue here.

Regarding speed, we definitely see a speedup with ABRA2. In our most recent exome test for example, the timings are:

abra: 6922 seconds
abra2: 2861 seconds
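As a quick sanity check on those two timings (the only numbers used here are the ones quoted above), the ratio works out to roughly 2.4x:

```python
# Timings quoted above for the most recent exome test, in seconds.
abra_seconds = 6922
abra2_seconds = 2861

speedup = abra_seconds / abra2_seconds
saved_minutes = (abra_seconds - abra2_seconds) / 60

print(f"speedup: {speedup:.2f}x, time saved: {saved_minutes:.1f} min")
```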

The original ABRA implementation could not scale to WGS (with realignments happening over the entire genome).

ABRA2 also sorts the final output. For cases where the fraction of reads being realigned is much smaller than the total number of reads, I could see ABRA2 potentially running slower than ABRA because of this. Running only exonic regions against WGS may fit this category. You'd need to try running ABRA2 with the --nosort option to get an apples to apples comparison.

Lastly, ABRA2 parallelizes in 25 megabase chunks. ABRA's parallelization was much more fine grained. If you're processing only a single chromosome, the original ABRA may achieve better parallelization.
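To illustrate why a single chromosome offers limited parallelism at this granularity, the sketch below splits chr1 into 25 megabase chunks. The chunk size comes from the comment above; the exact chunk boundaries ABRA2 uses are an assumption here, and the chr1 length is the GRCh38 value.

```python
def chunk_regions(chrom_length, chunk_size=25_000_000):
    """Split [1, chrom_length] into consecutive 1-based inclusive chunks.

    Assumption: chunks start at position 1 and are laid end to end; the
    real boundaries used by ABRA2 may differ.
    """
    return [
        (start, min(start + chunk_size - 1, chrom_length))
        for start in range(1, chrom_length + 1, chunk_size)
    ]


CHR1_LENGTH_GRCH38 = 248_956_422  # chr1 length in GRCh38

chunks = chunk_regions(CHR1_LENGTH_GRCH38)
print(len(chunks), chunks[0], chunks[-1])
```

Under these assumptions chr1 yields only 10 work units, so a run with --threads 6 cannot keep every thread busy for the whole job.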

Answer 7 · 2017-09-16T15:50:29.000Z

Thanks. Will try out the new release 2.09.

One question: I am parallelizing the WGS compute by running ABRA2 on individual per-chromosome BAM files. There will be discordant reads (the other end of the pair mapped to another chromosome, or unmapped) whose mate is absent from the BAM file. Will this be an issue for ABRA2?
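To get a feel for how many reads in a per-chromosome file are in this situation, something like the following Python sketch could be run over SAM text (for example, the output of `samtools view`). It relies only on the FLAG, RNAME and RNEXT columns as defined in the SAM specification; the category names and function names are made up for illustration.

```python
from collections import Counter

PAIRED = 0x1         # SAM FLAG bit: read is part of a pair
MATE_UNMAPPED = 0x8  # SAM FLAG bit: mate is unmapped


def classify_mate(sam_line):
    """Classify one SAM alignment line by where its mate is."""
    fields = sam_line.rstrip("\n").split("\t")
    flag, rname, rnext = int(fields[1]), fields[2], fields[6]
    if not flag & PAIRED:
        return "unpaired"
    if flag & MATE_UNMAPPED:
        return "mate_unmapped"
    if rnext == "*":
        return "mate_unknown"
    if rnext in ("=", rname):
        return "mate_same_chrom"
    return "mate_other_chrom"


def summarize(sam_lines):
    """Count mate categories, skipping '@' header lines."""
    return Counter(
        classify_mate(line) for line in sam_lines if not line.startswith("@")
    )
```

Reads counted as `mate_other_chrom` are the ones whose mate would be missing after splitting by chromosome.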

Answer 8 · 2017-09-16T19:45:18.000Z

OK. How are you splitting the BAM files? Also, how did you generate the small BAM file you emailed?

Answer 9 · 2017-09-16T21:10:56.000Z

Using sambamba view with a chromosome name, or a chromosome name plus a position range.

For convenience, I was using a BAM file previously processed by ABRA as input to ABRA2 for the test above. This might have something to do with the error, although the reason is unclear to me.

Testing with another, older BAM that is direct output from BWA seems to work fine. Yes, the speedup is significant. Even targeting a whole chromosome takes a reasonable amount of time. Bravo!

Answer 10 · 2017-09-17T17:34:29.000Z

Thanks for the feedback. FYI, a single read in the BAM file you sent had the read paired flag unset. I do not know whether sambamba view alters the bit flags or not. Samtools view does not. At present, all reads must be paired in order for ABRA2 to work properly. As long as the reads are not modified, I do not see a problem with processing by chromosome. I have yet to test this myself, however.
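A check for reads with the paired flag unset can be sketched in a few lines of Python operating on SAM text. This assumes the standard SAM column layout (QNAME first, FLAG second) and the 0x1 "paired" bit from the SAM specification; the function name is hypothetical and this is not how ABRA2 itself performs the check.

```python
PAIRED = 0x1  # SAM FLAG bit: read is part of a pair


def unpaired_read_names(sam_lines):
    """Return names of reads whose FLAG does not have the paired bit set."""
    names = []
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # blank or malformed line
        if not int(fields[1]) & PAIRED:
            names.append(fields[0])
    return names
```

The same count should also be obtainable directly with `samtools view -c -F 1 in.bam`, which counts records that do not have the 0x1 bit set.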

Answer 11 · 2017-09-28T15:21:33.000Z

Closing. Please re-open if you still see an issue.