Crash in mock generation due to rounding errors
Closed this issue · 2 comments
The following code creates an error. Note that all numbers matter. Different choices for model parameters likely won't produce the error.
from halotools.empirical_models import PrebuiltHodModelFactory
from halotools.sim_manager import CachedHaloCatalog
model = PrebuiltHodModelFactory(
'cacciato09', threshold=8.5, prim_haloprop_key='halo_mvir',
prim_galprop_key='luminosity')
model.param_dict['log_M_1'] = 11.2
model.param_dict['log_L_0'] = 9.95
model.param_dict['gamma_1'] = 3.5
model.param_dict['gamma_2'] = 0.25
model.param_dict['a_1'] = 0.8
model.param_dict['a_2'] = 0.0
halocat = CachedHaloCatalog(simname='bolplanck', redshift=0,
halo_finder='rockstar')
model.populate_mock(halocat)
Traceback (most recent call last):
File ~/.venv/lib64/python3.12/site-packages/spyder_kernels/py3compat.py:356 in compat_exec
exec(code, globals, locals)
File ~/Projects/halotools/scratch.py:16
model.populate_mock(halocat)
File ~/.venv/lib64/python3.12/site-packages/halotools/empirical_models/factories/hod_model_factory.py:1209 in populate_mock
ModelFactory.populate_mock(self, halocat, **kwargs)
File ~/.venv/lib64/python3.12/site-packages/halotools/empirical_models/factories/model_factory_template.py:254 in populate_mock
self.mock.populate(**mockpop_kwargs)
File ~/.venv/lib64/python3.12/site-packages/halotools/empirical_models/factories/hod_mock_factory.py:326 in populate
self.allocate_memory(seed=seed)
File ~/.venv/lib64/python3.12/site-packages/halotools/empirical_models/factories/hod_mock_factory.py:495 in allocate_memory
self.galaxy_table[halocatkey] = np.zeros(
ValueError: negative dimensions are not allowed
The underlying reason seems to be that the Cacciatio09 satellite occupation model produces an extremely small negative mean expected occupation for some halos. This is likely due to some rounding error since the value is something like -3e-47. This negative expectation value than correctly creates an NaN when calling pdtrik
here. However, in the same part of the code, this NaN is then explicitly converted into an in int which is -9223372036854775808. Ultimately, the expected galaxy table length is then negative, creating the crash.
Here's a potential solution: Right before calling pdtrik
to create draws from a Poisson distribution, we should check whether expectation values are negative. If so but they're effectively 0 to within some precision, we could force negative expectation values to be 0 exactly. If however, some values are negative beyond some reasonable value for numerical noise, the code should throw an error to alert the user that the model produced an unphysical occupation. @aphearin I can write this if you think that's a good solution.
Nice work on putting together a minimal reproducer and identifying what I agree is a very sensible fix (including your idea to raise an error for unexpectedly negative numbers). Maybe something like OCCLIP = -1e-3
would catch all reasonable rounding errors, while also catching actual bugs. I have to do this sort of thing all the time lately since most JAX code is single precision, and this seems reasonable here as well.
Thanks for offering to lead the fix. I'm happy to review the PR so if you'd like a second pair of eyes on the code then just tag me when you push it up.
Credit goes to Kaustav Mitra for finding the bug and producing the minimal reproducer.