Optimize `PopulationRateMonitor`
denisalevi opened this issue · 0 comments
Recording from a `NeuronGroup`
Our current implementation uses a single thread to calculate the population rate. This is in principle fine when recording from a `NeuronGroup` that is not a `Subgroup`, because the monitor only has to read the number of spiking neurons from the spikespace and divide it by the number of neurons. But we could even get rid of this kernel call entirely, since in many cases the number of spiking neurons has already been copied to the host.
- Fix this once #282 is implemented.
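
As a rough host-side sketch of the full-group case: if the spike count is already on the host, the rate is a single division. The layout assumed here (`N` neuron slots followed by one slot holding the spike count, unused slots filled with `-1`), the extra division by `dt` to get a rate in Hz, and the name `population_rate` are illustrative assumptions, not taken from the actual code.

```python
def population_rate(spikespace, dt):
    """Population rate in Hz for a full group (hypothetical helper).

    Assumed layout: N neuron slots followed by one slot holding the
    number of spikes in this time step.
    """
    n_neurons = len(spikespace) - 1
    n_spikes = spikespace[-1]  # already counted, no scan needed
    return n_spikes / n_neurons / dt


# 5 neurons; neurons 3 and 0 spiked, unused slots hold -1, last slot is the count
spikespace = [3, 0, -1, -1, -1, 2]
rate = population_rate(spikespace, dt=1e-4)
```

No pass over the spiking indices is needed here, which is why a kernel launch looks avoidable in this case.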
Recording from a `Subgroup`
But when recording from a `Subgroup`, we need to go through the entire spikespace and count all neurons that are in the `Subgroup`. Since the spikespace is not sorted, we can't just find the start and end indices of the subgroup in the spikespace to calculate the number of spikes for the `Subgroup` (as is done in `cpp_standalone`). Instead, we have to go through all spiking neurons and check if they are in the `Subgroup`.
Currently, a single thread is doing this, which serializes the whole scan and is very inefficient.
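
A serial reference for the subgroup scan described above might look like the following. It assumes the spiking indices are packed at the front of the spikespace in arbitrary order, with the count in the last slot, and that the subgroup covers the contiguous index range `[start, stop)`; the function name is hypothetical.

```python
def count_subgroup_spikes(spikespace, start, stop):
    """Count spikes from neurons in [start, stop) (hypothetical helper).

    Assumed layout: spiking neuron indices packed at the front in
    arbitrary order, number of spikes stored in the last slot.
    """
    n_spikes = spikespace[-1]
    count = 0
    # Unsorted indices: every spike needs its own membership check.
    for neuron_idx in spikespace[:n_spikes]:
        if start <= neuron_idx < stop:
            count += 1
    return count


# 8 neurons; neurons 7, 2, 5 and 0 spiked (unsorted); subgroup is neurons 2..5
spikespace = [7, 2, 5, 0, -1, -1, -1, -1, 4]
in_subgroup = count_subgroup_spikes(spikespace, start=2, stop=6)
```

Each loop iteration is independent of the others, so the membership checks are the part that could be spread across threads.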
- The obvious fix is to do this in parallel with `atomicAdd` of `1/N` on the ratemonitor variable (just as we do in the thresholder). Or only sum `1` and divide by `N` in the end (better precision?).
- Alternatively, partitioning the eventspace would of course solve all of this, see #284.
- Or as an intermediate solution, we could just implement the counting for each ratemonitor in the thresholder itself. Then we don't have to go through the eventspace again.
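
To illustrate the precision question raised in the first option, here is a small serial model of the two accumulation strategies. It is plain Python with double-precision floats, not kernel code: each loop iteration stands in for one thread's `atomicAdd`, and both function names are hypothetical.

```python
def fraction_by_accumulating(n_spikes, n_neurons):
    """Add 1/N once per spike, modelling one float add per spiking thread."""
    total = 0.0
    for _ in range(n_spikes):
        total += 1.0 / n_neurons  # rounding error can build up per add
    return total


def fraction_by_counting(n_spikes, n_neurons):
    """Accumulate an integer count, then divide once at the end."""
    count = 0
    for _ in range(n_spikes):
        count += 1  # integer adds are exact and order-independent
    return count / n_neurons


accumulated = fraction_by_accumulating(3, 10)
counted = fraction_by_counting(3, 10)
```

With 3 spikes out of 10 neurons the two results already differ in the last bits. The effect would presumably be larger in single precision, and floating-point atomic adds also make the result depend on the order in which threads happen to run, whereas an integer count does not.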