appliedtopology/javaplex

How to restrict the total number of simplices from distance matrix data?

peter308 opened this issue · 1 comments

Dear Admin
I have a bunch of distance matrices, and I have been stuck on one problem lately. The distance matrix of my system is quite large (550x550), and I usually get more than 1x10^8 simplices, which results in "Java heap space" or "GC overhead limit exceeded" errors. I have already run the job on an HPC cluster with as much as 128 GB of RAM. I also tried the witness stream and landmark selector, but the number of simplices is still quite large and leads to GC overhead limit errors. Is there an upper limit on the number of simplices that Javaplex can handle? Or could you give me some advice, e.g. how to change the options in the script file, so that I can get the barcodes and Betti numbers for my case? Even rough results would be good enough. Sincerely appreciated.

Best Regards,
Peter

Hi Peter,
I'm sure you already saw Section 7.1 in the Javaplex tutorial (https://www.math.colostate.edu/~adams/research/javaplex_tutorial.pdf) on increasing the heap size.
Witness complexes are a good way to get approximate answers with less computational effort.
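To make the landmark idea concrete, here is a minimal sketch of sequential maxmin landmark selection working directly on a distance matrix. It is plain Python, not Javaplex code; the function names `maxmin_landmarks` and `submatrix` and the `start` parameter are hypothetical and chosen only for illustration.

```python
def maxmin_landmarks(dist, k, start=0):
    """Pick k landmark indices from a square distance matrix by sequential maxmin.

    dist  : list of lists, dist[i][j] = distance between points i and j
    k     : number of landmarks to select
    start : index of the first landmark (an arbitrary choice)
    """
    n = len(dist)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of points")
    landmarks = [start]
    chosen = {start}
    # min_dist[i] = distance from point i to its nearest landmark so far
    min_dist = list(dist[start])
    while len(landmarks) < k:
        # Next landmark: the unchosen point farthest from all current landmarks
        nxt = max((i for i in range(n) if i not in chosen),
                  key=lambda i: min_dist[i])
        landmarks.append(nxt)
        chosen.add(nxt)
        min_dist = [min(a, b) for a, b in zip(min_dist, dist[nxt])]
    return landmarks


def submatrix(dist, idx):
    """Restrict a distance matrix to the rows and columns listed in idx."""
    return [[dist[i][j] for j in idx] for i in idx]
```

With a 550x550 matrix, something like `submatrix(dist, maxmin_landmarks(dist, 100))` would give a 100x100 matrix of well-spread points. Note that this only subsamples the data; a witness complex also uses the remaining points as witnesses, which the sketch does not do.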
For both witness complexes and Vietoris-Rips complexes, keep the maximum filtration parameter small, especially at first, until you see how the size of the computation grows as that parameter increases. This is necessary for computations to finish on large datasets.
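One rough way to see that growth before building anything is to count simplices directly from the distance matrix. The sketch below is plain Python rather than Javaplex code, and the function names `worst_case_simplices` and `rips_count_up_to_dim2` are hypothetical. The first gives the worst case, where every pair of points is joined; the second counts the vertices, edges, and triangles of the Vietoris-Rips complex at a given maximum filtration value.

```python
from math import comb


def worst_case_simplices(n, max_dim):
    """Simplices of dimension <= max_dim on n vertices when every edge is present."""
    return sum(comb(n, k + 1) for k in range(max_dim + 1))


def rips_count_up_to_dim2(dist, t_max):
    """Count (vertices, edges, triangles) of the Vietoris-Rips complex at scale t_max.

    dist is a square distance matrix given as a list of lists.
    """
    n = len(dist)
    # nbrs[i] = indices j > i that are within t_max of point i
    nbrs = [{j for j in range(i + 1, n) if dist[i][j] <= t_max}
            for i in range(n)]
    edges = sum(len(s) for s in nbrs)
    # A triangle i < j < k needs edges ij, ik and jk; each is counted once
    triangles = sum(len(nbrs[i] & nbrs[j]) for i in range(n) for j in nbrs[i])
    return n, edges, triangles
```

For 550 points, `worst_case_simplices(550, 2)` is 27,729,625 and `worst_case_simplices(550, 3)` is 3,799,034,800, so a count above 1x10^8 is easy to reach once 3-simplices are included. Calling `rips_count_up_to_dim2` for a few increasing values of `t_max` shows how quickly the complex grows for a particular dataset.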

Beyond that, my main advice is to try more modern software, such as Ripser or GUDHI, for computing persistent homology. More recent software is much faster than Javaplex. A list of some such software packages is available about halfway down the following webpage of mine:
https://www.math.colostate.edu/~adams/advising/
There are also some software packages there that compute approximate Vietoris-Rips barcodes via different methods (besides witness complexes).

Best, Henry