UWB-Biocomputing/BrainGrid

Determine if AllSynapsesProps.W is the canonical home of synaptic weights

stiber opened this issue · 5 comments

What kind of issue is this?

  • Bug report
  • Feature request
  • Question not answered in documentation
  • Cleanup needed

This is a precursor to being able to serialize the weight matrix at the end of a simulation, and then being able to deserialize it and build all the synapses at the start of a simulation. Right now, we know that the Connections classes update this variable and create/destroy the synapses themselves for the CPU simulations. For the GPU simulations, ConnGrowth::updateSynapsesWeightsThread() executes the updateSynapsesWeightsDevice() kernel function, then uses AllSpikingSynapsesProps::copyDeviceSynapseCountsToHost() to copy some GPU synapse information back to the synapse property object.

So, the tentative idea is to implement a serialize method for AllSynapsesProps that would serialize W directly for CPU simulations or, for GPU simulations (assuming the above method doesn't copy the weights back), first copy the weights back from the GPU and then serialize W.
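To make the idea concrete, here is a minimal sketch of such a serialize/deserialize pair. Everything in it is an assumption for illustration: SynapsePropsSketch is a hypothetical stand-in for AllSynapsesProps, deviceW is an ordinary host vector standing in for GPU memory (so the sketch runs without CUDA), and the on-disk layout (a 64-bit count followed by raw floats) is just one possible choice, not BrainGrid's format.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <utility>
#include <vector>

// Hypothetical stand-in for AllSynapsesProps; only the weights are modeled.
struct SynapsePropsSketch {
    std::vector<float> W;        // host-side weights
    std::vector<float> deviceW;  // stand-in for the GPU copy (assumption)
    bool deviceIsNewer = false;  // true when the "GPU" copy holds the latest weights

    // Placeholder for a real device-to-host copy.
    void copyWeightsToHost() {
        W = deviceW;
        deviceIsNewer = false;
    }

    // Write a 64-bit element count, then the raw float values.
    void serialize(std::ostream& out) {
        if (deviceIsNewer) copyWeightsToHost();  // GPU case: refresh W first
        const std::uint64_t n = W.size();
        out.write(reinterpret_cast<const char*>(&n), sizeof n);
        out.write(reinterpret_cast<const char*>(W.data()),
                  static_cast<std::streamsize>(n * sizeof(float)));
    }

    // Read the same layout back; W is left untouched if the read fails.
    bool deserialize(std::istream& in) {
        std::uint64_t n = 0;
        if (!in.read(reinterpret_cast<char*>(&n), sizeof n)) return false;
        std::vector<float> tmp(static_cast<std::size_t>(n));
        if (!in.read(reinterpret_cast<char*>(tmp.data()),
                     static_cast<std::streamsize>(n * sizeof(float)))) return false;
        W = std::move(tmp);
        return true;
    }
};
```

The CPU path never sets deviceIsNewer, so it serializes W as-is; the GPU path pays for one copy-back before writing. Building the synapses from the deserialized W is a separate step that this sketch does not cover.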

Bottom line: we first need to double-check every place that AllSynapsesProps.W is accessed in the simulator. Hopefully, we've already found those locations.

What is affected by this?

How do we replicate the issue/how would it work?

Expected behavior (i.e. solution or outline of what it would look like)

Other Comments

More information:

  • ConnGrowth doesn't read AllSynapsesProps.W, but it writes updated weights there each epoch.
  • ConnStatic writes AllSynapsesProps.W during connection setup.

So far, we've learned that the only place copySynapseDeviceToHostProps() (the public copy method in the props classes) is called is Core/GPUSpikingCluster, from GPUSpikingCluster::deleteDeviceStruct(). This is promising, because it would mean that everything gets copied back to the props class during simulator shutdown and we might be able to serialize at that point. On the other hand, that might be too late in the program execution.

  • GPUSpikingCluster::deleteDeviceStruct() is called from GPUSpikingCluster::cleanupCluster().

  • GPUSpikingCluster::cleanupCluster() is called from Model::cleanupSim().

  • Model::cleanupSim() is called from Simulator::finish() and Simulator::reset()

  • Simulator::reset() is for debugging (I believe). Simulator::finish() is called from BGdriver when the simulation is finished (i.e. after serialization).

  • In BGdriver, the current order of calls is:

  1. Simulator::simulate()
  2. perform serialization
  3. Simulator::finish()

So, if copySynapseDeviceToHostProps() only runs at the Simulator::finish() stage, should we do serialization after Simulator::finish()? If not, we need to handle CPU and GPU serialization separately.

I would say the question is what else happens during all of this cleanup. The classes involved are Simulator, Model, Cluster (and subclasses), Connections (and subclasses), etc. Do any of those cleanup methods free up storage, for either the CPU or GPU simulation? In other words, is anything lost?

The ideal solution would be to move the line simulator->finish(simInfo); before simulator->saveData(simInfo);. The idea would be that we simulate, turn off the input, finish up all simulation activity, then save output and serialize. After that, we delete everything and exit. (And, in fact, the current call simInfo->simRecorder->term(); happens after finishing the simulation.)
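A toy sketch of why the order matters for the GPU case. SimulatorSketch and runDriver are hypothetical names, the "device" weights are a second host vector so this runs without CUDA, and it assumes that nothing copies weights back to the host before finish(), which is exactly the point we still need to confirm.

```cpp
#include <cassert>
#include <vector>

// Hypothetical mock of the simulator; not the real Simulator class.
struct SimulatorSketch {
    std::vector<float> hostW{0.0f, 0.0f};    // what saveData()/serialization can see
    std::vector<float> deviceW{0.0f, 0.0f};  // stand-in for GPU memory (assumption)

    // The simulation only updates the "device" copy.
    void simulate() {
        for (float& w : deviceW) w += 0.5f;
    }

    // Copy-back at shutdown, as deleteDeviceStruct() is described as doing.
    void finish() { hostW = deviceW; }

    // Stand-in for saving output and serializing: returns what the host holds.
    std::vector<float> saveData() const { return hostW; }
};

// Runs the driver sequence in either order and returns what was saved.
std::vector<float> runDriver(bool finishBeforeSave) {
    SimulatorSketch sim;
    sim.simulate();
    if (finishBeforeSave) {
        sim.finish();           // proposed order: finish, then save
        return sim.saveData();
    }
    std::vector<float> saved = sim.saveData();  // current order: save, then finish
    sim.finish();
    return saved;
}
```

Under these assumptions, runDriver(false) saves the stale initial weights while runDriver(true) saves the updated ones. This only holds if finish() leaves the host data intact, which the real cleanup path may not.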

I think what we need is a summary of what finish() currently does. If it copies everything off of the GPU, terminates threads, and does miscellaneous cleanup, but leaves all of the data, etc. on the host, ready to output/serialize, then that's nice. If not, then maybe we need to split off the problematic stuff into a different set of methods, so that finishing is non-destructive on the host side (but frees up everything on the GPU), while a separate cleanup method cleans things up on the host side.
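A rough sketch of what that split could look like. ClusterSketch, setup(), cleanup(), and runSplitSketch() are hypothetical names, and deviceW is again a host vector standing in for GPU memory; the real classes would spread this across Simulator, Model, Cluster, and Connections.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical cluster with separate finish() and cleanup() phases.
struct ClusterSketch {
    std::vector<float> hostW;      // host-side weights, kept until cleanup()
    std::vector<float> deviceW;    // stand-in for GPU memory (assumption)
    bool deviceAllocated = false;

    void setup(std::size_t n) {
        hostW.assign(n, 0.0f);
        deviceW.assign(n, 0.0f);
        deviceAllocated = true;
    }

    // The simulation updates only the "device" copy.
    void simulate() {
        for (float& w : deviceW) w += 1.0f;
    }

    // Non-destructive on the host: copy back, then release device storage.
    void finish() {
        if (!deviceAllocated) return;
        hostW = deviceW;
        deviceW.clear();
        deviceW.shrink_to_fit();
        deviceAllocated = false;
    }

    // Destructive on the host: only safe after output and serialization.
    void cleanup() {
        hostW.clear();
        hostW.shrink_to_fit();
    }
};

// Driver order: simulate, finish, save/serialize, cleanup.
std::vector<float> runSplitSketch() {
    ClusterSketch c;
    c.setup(3);
    c.simulate();
    c.finish();
    std::vector<float> saved = c.hostW;  // stand-in for saveData() + serialization
    c.cleanup();
    return saved;
}
```

With this shape, CPU and GPU runs can share the same serialization code, since after finish() the host copy is authoritative in both cases.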