Mock generation performance using CLF models
alanxuhaojie opened this issue · 4 comments
@aphearin Hi Andrew, I'm exploring the posterior distributions of CLF parameters by generating MCMC-type mocks with the CLF models built into halotools. I find that it takes ~45 s to populate galaxies with luminosities above 10^10 for a 1 Gpc/h MDPL box (halo mass cut 1e11, leaving around 32 million distinct halos). This performance might be too slow for MCMC exploration. Is there any trick I'm missing to improve the performance? Also, I notice that model.mock.populate() uses up to ~20 CPUs when populating halos, even though another 50 CPUs are available on the cluster here. I'm wondering if it is possible to use all CPUs to improve the performance?
Thanks for answering,
Haojie
@johannesulf Hi Johannes, do you have any idea about the issue I just raised?
@alanxuhaojie - I think the overall performance you're seeing is to be expected for a Gpc-scale simulation. It is not easy to run MCMCs with a Gpc-scale simulation. Halotools parallelization is not arbitrarily scalable by itself, but there are many additional techniques you can use to run MCMCs with Gpc-scale simulations. @johannesulf has used TabCorr as one precomputation strategy to deal with this. Another approach is to decompose the domain of the simulation into subvolumes that you process with multiple compute nodes in parallel, but this is not something that Halotools does for you in an automated way.
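The precomputation idea behind TabCorr can be illustrated with a drastically simplified numpy sketch (the real TabCorr also tabulates cross-correlations between halo bins, which this toy version omits): the clustering signal is tabulated once per halo-mass bin from the expensive mock, and a prediction for new parameters is then just an occupation-weighted sum, with no re-population needed. The occupation function and all numbers below are hypothetical placeholders, not halotools or TabCorr API calls.

```python
import numpy as np

# Hypothetical tabulation, computed ONCE from the expensive mock:
# bin centers in log halo mass, halo counts per bin, and the
# tabulated clustering signal per bin (placeholder values).
log_mhalo_bins = np.array([11.0, 12.0, 13.0, 14.0])
n_halos = np.array([1e6, 1e5, 1e4, 1e3])
wp_per_bin = np.array([10.0, 40.0, 120.0, 300.0])

def mean_occupation(log_mhalo, log_mmin, sigma):
    """Toy central-occupation function (smooth step), a stand-in
    for the real CLF/HOD occupation statistics."""
    return 0.5 * (1.0 + np.tanh((log_mhalo - log_mmin) / sigma))

def predict_wp(log_mmin, sigma):
    """Cheap prediction for a new parameter point: an occupation-
    weighted average of the tabulated signal, no mock population."""
    n_gal = n_halos * mean_occupation(log_mhalo_bins, log_mmin, sigma)
    return np.sum(n_gal * wp_per_bin) / np.sum(n_gal)

# Evaluating a new point in an MCMC chain is now trivially fast.
wp = predict_wp(log_mmin=12.5, sigma=0.3)
```

The design point is that the O(10^7)-halo work is paid once during tabulation; each MCMC likelihood evaluation then reduces to a few array operations.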
@aphearin Thanks for the information about Halotools parallelization. I thought there were some tricks I could use to improve the performance. After all, it only occupies a fraction of the available CPUs when populating.
Yes, Halotools uses Python's multiprocessing for parallelism, which does not scale well to large numbers of processes. In case you decide to decompose your domain and do the computation in parallel (this is how I deal with this issue), you may find the code in thechopper to be useful.