brian-team/brian2cuda

Optimize `PopulationRateMonitor`

denisalevi opened this issue · 0 comments

Recording from a NeuronGroup

Our current implementation uses a single thread to calculate the population rate. In principle this is fine when recording from a full NeuronGroup (not a Subgroup), because the monitor only has to read the number of spiking neurons from the spikespace and divide it by the number of neurons. But we could get rid of this kernel call entirely, since in many cases the number of spiking neurons has already been copied to the host.

  • Fix this once #282 is implemented.

Recording from a Subgroup

When recording from a Subgroup, however, we need to go through the entire spikespace and count the spiking neurons that are in the Subgroup. Since the spikespace is not sorted, we can't just find the start and end indices of the Subgroup in the spikespace to calculate the number of spikes for the Subgroup (as is done in cpp_standalone). Instead, we have to go through all spiking neurons and check whether each one is in the Subgroup.

Currently, a single thread is doing this, which is terribly inefficient.

  • The obvious fix is to do this in parallel, with each thread doing an atomicAdd of 1/N on the ratemonitor variable (just as we do in the thresholder). Alternatively, each thread only adds 1 and we divide by N once at the end (better precision?).
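
A minimal sketch of the "sum 1, divide at the end" variant, to make the idea concrete. Everything named here is an assumption for illustration, not existing brian2cuda code: the kernel `count_subgroup_spikes`, the host wrapper `subgroup_rate`, and the spikespace layout (the first `num_spikes` entries hold the unsorted IDs of the neurons that spiked). The rate is taken as spike count divided by N and by the time step `dt`. Error checking on the CUDA calls is left out for brevity.

```cuda
#include <cstdint>
#include <vector>
#include <cuda_runtime.h>

// One thread per spikespace entry. Assumed layout: the first num_spikes
// entries hold the (unsorted) IDs of the neurons that spiked this time step.
__global__ void count_subgroup_spikes(const int32_t* spikespace,
                                      int num_spikes,
                                      int32_t sub_start,  // first neuron ID in the Subgroup
                                      int32_t sub_stop,   // one past the last neuron ID
                                      unsigned int* count)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= num_spikes) return;
    int32_t id = spikespace[i];
    if (id >= sub_start && id < sub_stop)
        atomicAdd(count, 1u);  // integer add: exact, independent of thread order
}

// Hypothetical host wrapper: population rate of the Subgroup for one time
// step of length dt (in seconds). CUDA error checking is omitted.
double subgroup_rate(const std::vector<int32_t>& spikes,
                     int32_t sub_start, int32_t sub_stop, double dt)
{
    const int num_spikes = static_cast<int>(spikes.size());
    unsigned int h_count = 0;
    if (num_spikes > 0) {  // nothing to launch when no neuron spiked
        int32_t* d_spikes = nullptr;
        unsigned int* d_count = nullptr;
        cudaMalloc(&d_spikes, num_spikes * sizeof(int32_t));
        cudaMalloc(&d_count, sizeof(unsigned int));
        cudaMemcpy(d_spikes, spikes.data(), num_spikes * sizeof(int32_t),
                   cudaMemcpyHostToDevice);
        cudaMemset(d_count, 0, sizeof(unsigned int));

        const int threads = 256;
        const int blocks = (num_spikes + threads - 1) / threads;
        count_subgroup_spikes<<<blocks, threads>>>(d_spikes, num_spikes,
                                                   sub_start, sub_stop, d_count);

        // Blocking copy: waits for the kernel to finish before reading.
        cudaMemcpy(&h_count, d_count, sizeof(unsigned int),
                   cudaMemcpyDeviceToHost);
        cudaFree(d_spikes);
        cudaFree(d_count);
    }
    const int n = sub_stop - sub_start;
    return h_count / (n * dt);  // divide once, after the exact integer count
}
```

On the precision question: the integer counter is exact, whereas accumulating 1/N in floating point rounds on every add and the result can depend on the order in which threads arrive. Note also that atomicAdd on `double` needs compute capability 6.0 or higher, while the integer version works everywhere.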

Alternatively, partitioning the eventspace would of course solve all of this, see #284.

  • Or as an intermediate solution, we could just implement the counting for each ratemonitor in the thresholder itself. Then we don't have to go through the eventspace again.
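
A rough sketch of what counting inside the thresholder could look like. It is an illustration under assumptions, not the actual thresholder template: the kernel name `threshold_and_count`, the threshold condition `v > v_th`, the helper `run_threshold_and_count`, and the eventspace layout (entries 0..N-1 receive spiking neuron IDs in arbitrary order, entry N is the spike counter) are all hypothetical. With several ratemonitors there would be one counter and one index range per monitor.

```cuda
#include <cstdint>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical thresholder: one thread per neuron. Assumed eventspace layout:
// entries 0..N-1 receive the IDs of spiking neurons in arbitrary order and
// entry N is the running spike counter (zeroed before the launch).
__global__ void threshold_and_count(const double* v, double v_th, int N,
                                    int32_t* eventspace,
                                    int32_t sub_start, int32_t sub_stop,
                                    unsigned int* monitor_count)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx >= N) return;
    if (v[idx] > v_th) {
        // Reserve a slot in the eventspace and record the spike.
        int32_t slot = atomicAdd(&eventspace[N], 1);
        eventspace[slot] = idx;
        // Count for the monitor right here, so no second pass is needed.
        if (idx >= sub_start && idx < sub_stop)
            atomicAdd(monitor_count, 1u);
    }
}

// Hypothetical host helper for trying the kernel out: returns the number of
// spikes that fell into [sub_start, sub_stop). Error checking is omitted.
unsigned int run_threshold_and_count(const std::vector<double>& v, double v_th,
                                     int32_t sub_start, int32_t sub_stop)
{
    const int N = static_cast<int>(v.size());
    if (N == 0) return 0;

    double* d_v = nullptr;
    int32_t* d_eventspace = nullptr;
    unsigned int* d_count = nullptr;
    cudaMalloc(&d_v, N * sizeof(double));
    cudaMalloc(&d_eventspace, (N + 1) * sizeof(int32_t));
    cudaMalloc(&d_count, sizeof(unsigned int));
    cudaMemcpy(d_v, v.data(), N * sizeof(double), cudaMemcpyHostToDevice);
    cudaMemset(d_eventspace, 0, (N + 1) * sizeof(int32_t));  // zero the spike counter
    cudaMemset(d_count, 0, sizeof(unsigned int));

    const int threads = 256;
    const int blocks = (N + threads - 1) / threads;
    threshold_and_count<<<blocks, threads>>>(d_v, v_th, N, d_eventspace,
                                             sub_start, sub_stop, d_count);

    unsigned int h_count = 0;
    cudaMemcpy(&h_count, d_count, sizeof(unsigned int), cudaMemcpyDeviceToHost);
    cudaFree(d_v);
    cudaFree(d_eventspace);
    cudaFree(d_count);
    return h_count;
}
```

The trade-off is that the thresholder now has to know about every ratemonitor attached to the group, and each spike costs one extra atomicAdd per monitor whose range it falls into.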