mikessh/mageri

Java Out of Memory Error

Khillo81 opened this issue · 1 comment

Hi,

I started using MAGERI a few days ago to analyze some Illumina TruSeq Amplicon data from a custom panel with UMIs; however, I'm running into the following error a few minutes into the analysis:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
	at java.util.concurrent.atomic.AtomicLongArray.<init>(AtomicLongArray.java:81)
	at com.antigenomics.mageri.core.mapping.QualitySumMatrix.<init>(QualitySumMatrix.java:35)
	at com.antigenomics.mageri.core.mapping.MutationsTable.<init>(MutationsTable.java:45)
	at com.antigenomics.mageri.core.mapping.ConsensusAligner.<init>(ConsensusAligner.java:60)
	at com.antigenomics.mageri.core.mapping.PConsensusAligner.<init>(PConsensusAligner.java:37)
	at com.antigenomics.mageri.core.mapping.PConsensusAlignerFactory.create(PConsensusAlignerFactory.java:34)
	at com.antigenomics.mageri.pipeline.analysis.PipelineConsensusAlignerFactory.create(PipelineConsensusAlignerFactory.java:53)
	at com.antigenomics.mageri.pipeline.analysis.ProjectAnalysis.run(ProjectAnalysis.java:107)
	at com.antigenomics.mageri.pipeline.Mageri.main(Mageri.java:129)

This occurs directly after building the UMI Index:

[Mon Aug 28 10:34:57 CEST 2017 +00m00s] [UMI_MAGERI_Test] Started analysis.
[Mon Aug 28 10:34:57 CEST 2017 +00m00s] [UMI_MAGERI_Test] Pre-processing sample group MG357A.
[Mon Aug 28 10:34:57 CEST 2017 +00m00s] [Indexer] Building UMI index, 0 reads processed, 0.0% extracted..
[Mon Aug 28 10:35:07 CEST 2017 +00m10s] [Indexer] Building UMI index, 588288 reads processed, 100.0% extracted..
[Mon Aug 28 10:35:17 CEST 2017 +00m20s] [Indexer] Building UMI index, 873247 reads processed, 100.0% extracted..
[Mon Aug 28 10:35:27 CEST 2017 +00m30s] [Indexer] Building UMI index, 1516289 reads processed, 100.0% extracted..
[Mon Aug 28 10:35:37 CEST 2017 +00m40s] [Indexer] Building UMI index, 2133652 reads processed, 100.0% extracted..
[Mon Aug 28 10:35:47 CEST 2017 +00m50s] [Indexer] Building UMI index, 2735621 reads processed, 100.0% extracted..
[Mon Aug 28 10:35:57 CEST 2017 +01m00s] [Indexer] Building UMI index, 3338909 reads processed, 100.0% extracted..
[Mon Aug 28 10:36:20 CEST 2017 +01m23s] [Indexer] Building UMI index, 3419451 reads processed, 100.0% extracted..
[Mon Aug 28 10:36:23 CEST 2017 +01m26s] [Indexer] Finished building UMI index, 3613540 reads processed, 100.0% extracted
[Mon Aug 28 10:36:23 CEST 2017 +01m26s] [UMI_MAGERI_Test] Running analysis for sample group MG357A.

The command I'm running is:

java -Xmx100G -jar mageri.jar -R1 forward.fastq.gz -R2 reverse.fastq.gz --sample-name MG357 -O /path/to/output -M3 NNNNNN --references Homo_sapiens_assembly19.fasta --bed mybed.bed

I first tried running with 32 GB and kept increasing the heap until I reached my machine's limit. I'm providing the Ensembl hg19 build of the human genome as a FASTA file because I want to avoid restricting the mapping to the panel targets.

Hello,
MAGERI requires a reference FASTA file containing only the target regions, so the out-of-memory error is expected when you try to load the entire genome. You can load all the exons of the human exome from a FASTA file, but not entire chromosome sequences; please see http://mageri.readthedocs.io/en/latest/body.html#genomic-information