StateMonitor idexing with slice syntax doesn't work
denisalevi opened this issue · 6 comments
For some reason using [1:]
style indexing for a StateMonitor
returns an empty array while e.g. [1]
indexing returns the correct value.
...
mon = StateMonitor(...)
...
run(2*defaultclock.dt) # -> len(mon.t) is 2
print(mon.t[1:]) # returns []
print(mon[1]) # returns Quantity([defaultclock.dt])
This seems to apply for any kind of [3:]
style indexing of the StateMonitor.
This breaks e.g. brian2.tests.test_monitor.test_state_monitor_indexing.
The StateMonitor is also not working working for many neurons. Currently the StateMonitor
is called with 1 block and as many threads as there are Neurons (or Synapses etc) to record. So if there are more Neurons then the maximum number of threads, the kernel fails to launch.
And the global memory writes are not coalesced. Currently we have a 2D data structure of dimensions indices x record_times
(vector of vectors) for each variable monitor. And we fill that in the kernel like this
monitor[tid][current_iteration] = ...
For coalesced writes we could just "transpose" the monitor data structure so we can use
monitor[current_iteration][tid] = ...
We might have to resort the monitor in the end though since it might then not fit with the format that Brian expects to read back.
There also seems to be a problem with the ordering of returned values. Might be related to #46 . In this test, the first assert sums over the not ordered dimension and passes. The second one failes (because of wrong ordering). The monitors return arrays are 2D arrays with shape (num_indices, num_times)
.
import os
from numpy.testing.utils import assert_allclose
from brian2 import *
from brian2.devices.device import reinit_devices, set_device
import brian2cuda
directory = os.path.splitext(os.path.basename(__file__))[0]
results = {}
n_cells = 5
n_recorded = 10
delay = np.arange(n_cells) * defaultclock.dt
for devicename in ['cpp_standalone', 'cuda_standalone']:
set_device(devicename, directory=directory + '_' + devicename)
Synapses.__instances__().clear()
reinit_devices()
P = NeuronGroup(n_cells, model='', threshold='True')
S = Synapses(P, P,
model='''w : 1''',
on_pre='''w += 1''')
S.connect(j='i')
S.pre.delay = delay
state_mon = StateMonitor(S, 'w', record=range(n_recorded))
run(defaultclock.dt * (n_cells + 1), report='text')
results[devicename] = state_mon.w.astype(int)
assert_allclose(results['cpp_standalone'].sum(axis=0), results['cuda_standalone'].sum(axis=0))
assert_allclose(results['cpp_standalone'], results['cuda_standalone'])
EDIT 13.04.21: This test seems to be passing without issues now (I also added it to our test suite). test_statemonitor_indexing
is still failing though.
Could this be a similar issue as here: brian-team/brian2#1119?
This is being fixed in PR #202, follow-up issue for fixing the single block usage and optimizing statemonitor template is #201.
There were two problems that needed to be fixed here:
- We need the
_array_<monitor>_N
variables in the standalone code to hold the correct monitor size, since this value is stored in theresults
folder and read when indexing with slice syntax. - The
dev_array_<monitor>_N
device variable is copied inte_array_<monitor>_N
inwrite_arrays()
. Therefore, we need to set the device variable instead of the host variable for the written array to be correct! (or finally clean up the copying and writing mechanism...). - We need to mark
N
inWRITES_READ_ONLY_VARIABLES
in the templates, otherwise brian2 will used a cached value instead of reading it from theresults
folder (seecpp_standalone/device.py:Device.get_value()
).
The brian-team/brian2#1119 had a very similar problem, which I also fixed in PR #202.