mozack/abra2

java.lang.OutOfMemoryError: Java heap space

AdilyAid opened this issue · 1 comments

Hello,

I run abra2 v2.22 as part of the nucleo CWL pipeline.
abra2 ran for a few days and then crashed:

INFO	Fri Sep 09 09:01:34 UTC 2022	PROCESS_REGION_MSECS:	chr9_5391847_5392563	802613	0	1834	0
INFO	Fri Sep 09 09:01:40 UTC 2022	PROCESS_REGION_MSECS:	chr9_5394500_5394751	27	0	6	0
INFO	Fri Sep 09 09:01:44 UTC 2022	PROCESS_REGION_MSECS:	chr9_5395989_5396107	21	0	0	0
INFO	Fri Sep 09 09:01:53 UTC 2022	PROCESS_REGION_MSECS:	chr9_5402364_5402613	8967	0	1	0
INFO	Fri Sep 09 09:01:58 UTC 2022	PROCESS_REGION_MSECS:	chr9_5403925_5404129	5748	0	1	0
INFO	Fri Sep 09 09:03:56 UTC 2022	PROCESS_REGION_MSECS:	chr9_5404631_5405449	117556	5	109	0
INFO	Fri Sep 09 09:04:10 UTC 2022	PROCESS_REGION_MSECS:	chr9_5405647_5406088	14397	1	19	0
INFO	Fri Sep 09 09:04:47 UTC 2022	PROCESS_REGION_MSECS:	chr9_5406273_5406759	36378	0	44	0
INFO	Fri Sep 09 09:04:52 UTC 2022	PROCESS_REGION_MSECS:	chr9_5414156_5414311	5786	0	4	0
INFO	Fri Sep 09 09:04:59 UTC 2022	PROCESS_REGION_MSECS:	chr9_5418517_5418643	6388	0	2	0
INFO	Fri Sep 09 09:04:59 UTC 2022	PROCESS_REGION_MSECS:	chr9_5418888_5419047	27	0	3	0
INFO	Fri Sep 09 09:20:44 UTC 2022	chr10:43114788 : 	Curr reads size: 575009
INFO	Fri Sep 09 10:00:55 UTC 2022	chr1:156874308 : 	Curr reads size: 554415
INFO	Fri Sep 09 10:25:22 UTC 2022	chr4:1808187 : 	Curr reads size: 408300
INFO	Fri Sep 09 11:20:25 UTC 2022	chr6:117321312 : 	Curr reads size: 618944
INFO	Fri Sep 09 12:42:03 UTC 2022	PROCESS_REGION_MSECS:	chr9_5420197_5421057	1331503	1	2437	0
INFO	Fri Sep 09 12:42:08 UTC 2022	PROCESS_REGION_MSECS:	chr9_5421423_5421585	5741	0	1	0
INFO	Fri Sep 09 12:42:14 UTC 2022	PROCESS_REGION_MSECS:	chr9_5422622_5422717	5211	0	1	0
INFO	Fri Sep 09 12:52:30 UTC 2022	chr1:156874311 : 	Curr reads size: 556194
INFO	Fri Sep 09 15:12:08 UTC 2022	chr1:156874314 : 	Curr reads size: 558110
INFO	Fri Sep 09 17:33:53 UTC 2022	chr6:117321315 : 	Curr reads size: 619345
INFO	Fri Sep 09 18:16:03 UTC 2022	PROCESS_REGION_MSECS:	chr2_29222739_29223539	3495717	1	3989	0
INFO	Fri Sep 09 19:09:00 UTC 2022	chr1:156874318 : 	Curr reads size: 559856
INFO	Fri Sep 09 19:16:38 UTC 2022	chr14:104773477 : 	Curr reads size: 542026
java.lang.OutOfMemoryError: Java heap space
	at java.lang.String.<init>(String.java:325)
	at htsjdk.samtools.util.StringUtil.bytesToString(StringUtil.java:301)
	at htsjdk.samtools.util.StringUtil.bytesToString(StringUtil.java:288)
	at htsjdk.samtools.BinaryTagCodec.readNullTerminatedString(BinaryTagCodec.java:423)
	at htsjdk.samtools.BinaryTagCodec.readSingleValue(BinaryTagCodec.java:318)
	at htsjdk.samtools.BinaryTagCodec.readTags(BinaryTagCodec.java:282)
	at htsjdk.samtools.BAMRecord.decodeAttributes(BAMRecord.java:313)
	at htsjdk.samtools.BAMRecord.getAttribute(BAMRecord.java:293)
	at htsjdk.samtools.SAMRecord.getAttribute(SAMRecord.java:1110)
	at htsjdk.samtools.SAMRecord.getStringAttribute(SAMRecord.java:1220)
	at abra.SortedSAMWriter.addAlignment(SortedSAMWriter.java:104)
	at abra.ReAligner.remapReads(ReAligner.java:779)
	at abra.ReAligner.processChromosomeChunk(ReAligner.java:424)
	at abra.ReAlignerRunnable.go(ReAlignerRunnable.java:21)
	at abra.AbraRunnable.run(AbraRunnable.java:20)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
java.lang.OutOfMemoryError: Java heap space
	at abra.SimpleMapper.getPositionMismatches(SimpleMapper.java:80)
	at abra.SimpleMapper.map(SimpleMapper.java:157)
	at abra.ReadEvaluator.getImprovedAlignment(ReadEvaluator.java:70)
	at abra.ReadEvaluator.getImprovedAlignment(ReadEvaluator.java:34)
	at abra.ReAligner.remapRead(ReAligner.java:592)
	at abra.ReAligner.remapReads(ReAligner.java:771)
	at abra.ReAligner.processChromosomeChunk(ReAligner.java:424)
	at abra.ReAlignerRunnable.go(ReAlignerRunnable.java:21)
	at abra.AbraRunnable.run(AbraRunnable.java:20)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
INFO [job abra2_2_22] Max memory used: 24762MiB

When I ran it on a small input, it finished successfully. But for bigger FASTQ files (6 GB), it was very slow and didn't finish.
The command is:
/usr/src/abra2-2.22.jar --threads 16 --tmpdir /tmp --cons --ca 10,1 --in gatk_uncollapsed_MD.bam --mad 1000 --mmr 0.1 --no-edge-ci --nosort --out UBG_abra2_uncollapsed_IR.bam --ref hg38.fasta --sga 8,32,48,1 --sc 100,30,80,15 --targets gatk_uncollapsed_MD.bed --ws 800,700

What could be the reason for using so much memory, and why did processing each interval take so long?

Thank you in advance for your help,
Adily

You seem to be missing the -Xmx parameter in your command; try -Xmx32g.

See this example, which uses 16G:

java -Xmx16G -jar abra2.jar --in normal.bam,tumor.bam --out normal.abra.bam,tumor.abra.bam --ref hg38.fa --threads 8 --targets targets.bed --tmpdir /your/tmpdir > abra.log
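
To confirm that an -Xmx setting actually reaches the JVM (for example, when the java call is wrapped by a CWL tool definition), a small standalone check like the one below can help. This is only a sketch: the class name HeapCheck and the helper toMiB are made up for illustration and are not part of abra2. It relies on the standard Runtime.maxMemory() call, which reports the maximum heap the JVM will try to use.

```java
// HeapCheck.java: hypothetical helper, not part of abra2.
// Prints the maximum heap size the current JVM is allowed to use.
public class HeapCheck {

    // Convert a byte count to whole mebibytes (rounds down).
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // maxMemory() reflects -Xmx if it was passed, otherwise the JVM default.
        long maxBytes = Runtime.getRuntime().maxMemory();
        System.out.println("Max heap: " + toMiB(maxBytes) + " MiB");
    }
}
```

Compile it with `javac HeapCheck.java`, then run it with the same JVM options as the abra2 job, e.g. `java -Xmx32g HeapCheck`. The reported value should be close to the requested size (some garbage collectors report slightly less). If it is far smaller, the -Xmx option is probably not being passed through to the java process.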