Split up simulations if necessary
Closed this issue · 2 comments
The first stages of the process detailed in #44 might have to simulate a large number of samples, too many to fit into memory as part of a single simulation. As a first step, we might simply let the user deal with this (e.g. by calling something like `inferencer.simulate(n_samples=10_000)` repeatedly). A fancier solution would be to ask the user for the maximum amount of memory that the simulation should use, estimate the memory usage of the monitors, and split up the simulation accordingly. Even more automatic (but maybe too automatic/too complicated to get right on all platforms?) would be to look at the actual memory available.
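To make the fancier option concrete, here is a minimal sketch of the splitting, assuming a user-provided memory budget. Everything below is illustrative: `bytes_per_sample` stands in for whatever per-sample estimate of the monitors' memory usage we would compute, and repeated `simulate` calls are assumed to accumulate results as described above:

```python
import math

def simulate_in_chunks(inferencer, n_samples, max_memory_bytes, bytes_per_sample):
    # largest number of samples that fits into the user-provided budget
    # (bytes_per_sample is a hypothetical estimate of the monitors' footprint)
    chunk_size = max(1, max_memory_bytes // bytes_per_sample)
    for i in range(math.ceil(n_samples / chunk_size)):
        # the last chunk may be smaller than the others
        n = min(chunk_size, n_samples - i * chunk_size)
        inferencer.simulate(n_samples=n)
```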
I think that could be considered accidentally solved 😄
...at least w.r.t. the less fancy approach mentioned above.
Since the inference process can now be separated from data generation, the user can generate data in chunks:
```python
import numpy as np

inferencer = Inferencer(...)
n_samples = ...
prior = ...

# sample all parameter combinations at once (this part is cheap)...
theta = inferencer.generate_training_data(n_samples, prior)

# ...and process them in memory-friendly chunks
chunk_size = ...
x_1 = inferencer.extract_summary_statistics(theta[:chunk_size, :])
x_2 = inferencer.extract_summary_statistics(theta[chunk_size:chunk_size*2, :])
# and so on, or just use a for loop in some cool manner
x = np.vstack((x_1, x_2, ...))
```
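For completeness, the loop alluded to in the comment could look like this (a minimal sketch, assuming `theta` is a 2-D array whose rows are independent samples):

```python
chunks = []
for start in range(0, n_samples, chunk_size):
    # raw simulation data for each chunk can be freed once its
    # (much smaller) summary statistics have been extracted
    chunks.append(inferencer.extract_summary_statistics(theta[start:start + chunk_size, :]))
x = np.vstack(chunks)
```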
Once the data are generated, the user can continue with the inference process as usual.
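For example, assuming the `Inferencer` also exposes an `infer` method that accepts pre-generated training data (the name and signature here are illustrative, not confirmed by this thread):

```python
# hand the pre-generated training data to the inference step
posterior = inferencer.infer(theta=theta, x=x)
```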
Yes, I agree! I think this is good enough for now – if this appears to be a hurdle later on, we can always think of a more automatic mechanism. We should probably mention the approach in the documentation, though.