BerkeleyLab/caffeine

Add proper handling of shared memory exhaustion

bonachea opened this issue

Currently, Caffeine always crashes with obscure signals when shared memory is exhausted. Shared heap exhaustion is a very common failure mode in practice, and it is probably the most important runtime error to detect and report with a high-quality diagnostic.

At a minimum, Caffeine needs to be updated to be PRIF-compliant, returning stat=PRIF_STAT_OUT_OF_MEMORY when the stat argument is present.
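A minimal sketch of the stat convention, written in C purely for illustration. Caffeine exposes PRIF through Fortran, and every name below is hypothetical. A toy bump allocator over a fixed-size heap stands in for the shared heap, and a NULL `stat` pointer models an absent optional `stat` argument. The value of `HYPOTHETICAL_STAT_OUT_OF_MEMORY` is an assumption: PRIF defines `PRIF_STAT_OUT_OF_MEMORY` as a named constant, and this sketch does not reproduce its real value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-ins; the real PRIF_STAT_OUT_OF_MEMORY value comes from PRIF. */
enum { HYPOTHETICAL_STAT_OK = 0, HYPOTHETICAL_STAT_OUT_OF_MEMORY = 1 };

/* Toy "shared heap": a fixed-size byte range carved out by a bump pointer. */
typedef struct {
  size_t capacity; /* total bytes in the heap */
  size_t used;     /* bytes handed out so far */
} toy_heap;

/* Returns the offset of the new block, or (size_t)-1 on failure.
   If stat is non-NULL (caller supplied stat), exhaustion is reported there;
   otherwise the program terminates with a readable diagnostic
   instead of crashing later with an obscure signal. */
size_t toy_heap_alloc(toy_heap *heap, size_t nbytes, int *stat) {
  assert(heap->used <= heap->capacity); /* invariant: never over-committed */
  if (nbytes > heap->capacity - heap->used) {
    if (stat != NULL) {
      *stat = HYPOTHETICAL_STAT_OUT_OF_MEMORY;
      return (size_t)-1;
    }
    fprintf(stderr, "error: shared heap exhausted (requested %zu bytes)\n", nbytes);
    exit(EXIT_FAILURE);
  }
  size_t offset = heap->used;
  heap->used += nbytes;
  if (stat != NULL) *stat = HYPOTHETICAL_STAT_OK;
  return offset;
}
```

The key design point is that exhaustion is checked before any memory is touched, so the caller gets a status code (or a clean termination) rather than a fault.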

I'd also like us to format an informative error message that includes the total size of the current shared heap, the total size of the primordial shared heaps, and the size of the request that failed. Then, on failure, we can either return that message in errmsg (if provided) or print it to the console as part of error termination.
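One possible shape for that diagnostic path, again a hypothetical C sketch rather than Caffeine's actual code. The three sizes are passed in as plain byte counts; how Caffeine would obtain them is not addressed here. The sketch assumes errmsg is only filled in when stat is also present, mirroring the Fortran rule that an error without STAT= leads to error termination; with no stat, the message is printed and the program exits.

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for PRIF_STAT_OUT_OF_MEMORY. */
enum { HYPOTHETICAL_STAT_OUT_OF_MEMORY = 1 };

/* Writes the diagnostic into buf (NUL-terminated, truncated if too small).
   Returns the untruncated length, as snprintf does. */
int format_oom_message(char *buf, size_t buflen,
                       size_t current_heap_bytes,
                       size_t primordial_heaps_bytes,
                       size_t request_bytes) {
  return snprintf(buf, buflen,
                  "shared memory exhausted: failed to allocate %zu bytes "
                  "(current shared heap: %zu bytes; primordial shared heaps: %zu bytes)",
                  request_bytes, current_heap_bytes, primordial_heaps_bytes);
}

/* stat == NULL models an absent stat argument: print and terminate.
   Otherwise set *stat and, if errmsg was supplied, copy the message into it. */
void report_oom(int *stat, char *errmsg, size_t errmsg_len,
                size_t current_heap_bytes, size_t primordial_heaps_bytes,
                size_t request_bytes) {
  char msg[256];
  format_oom_message(msg, sizeof msg, current_heap_bytes,
                     primordial_heaps_bytes, request_bytes);
  if (stat == NULL) {
    fprintf(stderr, "ERROR: %s\n", msg);
    exit(EXIT_FAILURE);
  }
  *stat = HYPOTHETICAL_STAT_OUT_OF_MEMORY;
  if (errmsg != NULL && errmsg_len > 0) {
    snprintf(errmsg, errmsg_len, "%s", msg); /* truncates to fit */
  }
}
```

Keeping the formatting in its own function lets the same text serve both the errmsg path and the console path, so the two never drift apart.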