UWB-Biocomputing/BrainGrid

See if we can write HDF5 file incrementally to speed simulator clean up at the end

stiber opened this issue · 4 comments

What kind of issue is this?

  • Bug report
  • Feature request
  • Question not answered in documentation
  • Cleanup needed
  • Question

When a large and long simulation finishes (completes last epoch), especially if all spike information is collected, it takes a long time before the program terminates (we're talking many hours to maybe days). What's going on? The HDF5 file is incrementally written and the GPU memory only holds one epoch's worth of spike data.

We note that there are four lines of diagnostic output at the very end:
    Done with simulation cycle, beginning growth update 600
    Updating distance between frontiers...
    computing areas of overlap
    total_synapse_counts: xxxxx
This is the growth and connectivity update that happens after each epoch. Then there is a long period of time before the simulator exits. This does not seem to load the CPU, so it isn't computationally intensive. Is it something terribly inefficient related to HDF5 closing the output file?

What is affected by this?

Simulator.

How do we replicate the issue/how would it work?

Expected behavior (i.e. solution or outline of what it would look like)

Other Comments

fumik commented

We save all neurons' spiking history (probed data) in CPU memory during the simulation and write it to the HDF5 file at the very end. The burstiness and spike-history data are already written to the HDF5 file incrementally, because those datasets have a fixed size for all neurons; the probed data do not.
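
For what it's worth, HDF5 does support appending to a dataset one epoch at a time if the dataset is created as chunked with an unlimited dimension. The sketch below (HDF5 C++ API) only illustrates that pattern; it is not the simulator's actual recorder code, and the dataset layout, names, element type, and chunk size are assumptions.

```cpp
#include "H5Cpp.h"
#include <cstdint>
#include <string>
#include <vector>

// Minimal sketch (assumed names/layout): create a 1-D dataset that can grow
// without bound, then append one epoch's worth of values at a time.
H5::DataSet createExtendibleDataset(H5::H5File& file, const std::string& name)
{
    hsize_t initDims[1] = { 0 };
    hsize_t maxDims[1]  = { H5S_UNLIMITED };
    H5::DataSpace space(1, initDims, maxDims);

    // Chunking is required for an extendible dataset; the chunk size here
    // (1M elements) is an arbitrary tuning knob.
    H5::DSetCreatPropList props;
    hsize_t chunkDims[1] = { 1 << 20 };
    props.setChunk(1, chunkDims);

    return file.createDataSet(name, H5::PredType::NATIVE_UINT64, space, props);
}

// Append one epoch's data to the end of an extendible dataset.
void appendEpoch(H5::DataSet& ds, const std::vector<uint64_t>& epochData)
{
    // Current size of the dataset on disk.
    H5::DataSpace fileSpace = ds.getSpace();
    hsize_t curDims[1];
    fileSpace.getSimpleExtentDims(curDims);

    // Grow the dataset, then select the newly added region as a hyperslab.
    hsize_t newDims[1] = { curDims[0] + epochData.size() };
    ds.extend(newDims);
    fileSpace = ds.getSpace();
    hsize_t offset[1] = { curDims[0] };
    hsize_t count[1]  = { epochData.size() };
    fileSpace.selectHyperslab(H5S_SELECT_SET, count, offset);

    // Write only the new epoch's values into the selected region.
    H5::DataSpace memSpace(1, count);
    ds.write(epochData.data(), H5::PredType::NATIVE_UINT64, memSpace, fileSpace);
}
```

With something along these lines, the CPU buffer would only ever need to hold one epoch's worth of probed data before it is flushed.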

stiber commented

OK, I will edit the issue description to reflect that we should investigate our approach and determine if an alternative organization would allow incremental writes.

In particular, it turns out that for analysis purposes it would be better to store, for each time step, a list of neurons that spiked. This still wouldn't be fixed size, since we don't know the maximum number of neurons that would spike during a time step (other than the upper limit of all neurons).
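
One way to handle that non-fixed row size is HDF5's variable-length datatype, where each time step becomes a row whose length is the number of neurons that spiked; a flat index array plus a per-step count dataset would be another option. Below is a minimal sketch of the variable-length route, with the dataset name and function invented for illustration:

```cpp
#include "H5Cpp.h"
#include <cstddef>
#include <vector>

// Minimal sketch (assumed dataset name and function): write a block of time
// steps, where each time step has a different number of spiking-neuron
// indices, using HDF5's variable-length datatype.
void writeSpikesByTimeStep(H5::H5File& file,
                           const std::vector<std::vector<int>>& spikes)
{
    H5::VarLenType vlenType(&H5::PredType::NATIVE_INT);

    hsize_t dims[1] = { spikes.size() };
    H5::DataSpace space(1, dims);
    H5::DataSet ds = file.createDataSet("spikesByTimeStep", vlenType, space);

    // Each hvl_t row points at an existing vector; rows may have length zero
    // for time steps in which no neuron spiked.
    std::vector<hvl_t> rows(spikes.size());
    for (size_t t = 0; t < spikes.size(); ++t) {
        rows[t].len = spikes[t].size();
        rows[t].p   = const_cast<int*>(spikes[t].data());
    }
    ds.write(rows.data(), vlenType);
}
```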

We should look into the information in issues #216 and #186 while we think about changing how we approach HDF5 file saving.

Another point is that, since all of the probed data is stored in CPU memory, it can consume a large fraction of virtual memory and could trigger swapping. Even though much of that memory is quiescent and presumably not being accessed, it's hard to predict what the impact would be, especially when a large fraction of virtual memory is in use.

fumik commented

The spikesProbedNeurons data are now written incrementally to the HDF5 file at the end of every epoch.