fanagislab/EndHiC

Unidentified problem with input files causes EndHiC to consume all memory until it crashes

hazmup opened this issue · 3 comments

I am running the endhic.pl pipeline with what seem to be well-formatted input files, but it keeps consuming RAM until all available memory is used and it crashes a few hours later. It does not look like a formatting issue to me, as I have inspected the files. I tested EndHiC with the test data and it passed, and I also tested it with a different dataset of our own, which runs correctly as well. HiC-Pro seems to have run successfully for both datasets. The main difference between the two datasets is that the one that fails has about half the HiC coverage of the other (around 40x instead of 70x). I am attaching the input files for further inspection.
endhic_input_files.zip

After a lot of checking, I tried increasing --minbinnum, and it stopped crashing for values above 7. Maybe this is related to the relatively low coverage of the HiC reads, which causes the contact turning points to have values in the 1-10 range?

I believe I traced it down. This happens when the .cluster file created by cluster_and_classify_GFA.pl is empty. I am not sure what the intended behavior should be in cases like this.

Edit: It seems the .cluster file is generated by order_and_orient_GFA.pl, even though the documentation of order_and_orient_GFA.pl indicates otherwise (there, the .cluster file is passed in as an input):

=head1 Exmple

order_and_orient_GFA.pl --size 6000000 formal_100000_iced.matrix.revised.100000.30.CtgContact.overCutoff.5.0.reciprocalMax.gfa formal_100000_iced.matrix.revised.100000.30.CtgContact.overCutoff.5.0.reciprocalMax.gfa.cluster > formal_100000_iced.matrix.revised.100000.30.CtgContact.overCutoff.5.0.reciprocalMax.gfa.cluster.order.orient

So there seems to be some problem with the .reciprocalMax.gfa.cluster or .reciprocalMax.gfa files that causes order_and_orient_GFA.pl to stall before producing a .cluster file.
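As a stopgap while the root cause is unclear, the intermediate files can be checked before the next step is launched, so that an empty file is reported immediately instead of after hours of memory growth. The following is only a minimal sketch in Python and is not part of EndHiC; the function names `is_empty_or_missing` and `check_intermediates` are made up for illustration, and it assumes that a file containing only blank lines should count as empty.

```python
from pathlib import Path


def is_empty_or_missing(path):
    """Return True if the file does not exist or has no non-blank lines.

    Assumption: a file with only whitespace is treated as empty.
    """
    p = Path(path)
    if not p.is_file():
        return True
    with p.open() as handle:
        return not any(line.strip() for line in handle)


def check_intermediates(paths):
    """Return the paths that are missing or empty, in input order."""
    return [str(p) for p in paths if is_empty_or_missing(p)]
```

Calling `check_intermediates` on the .reciprocalMax.gfa and .reciprocalMax.gfa.cluster files before running order_and_orient_GFA.pl returns the ones that should be inspected first; an empty list means both files have content.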

I have run into the same issue. Have you solved it?