brian-team/brian2cuda

StateMonitor idexing with slice syntax doesn't work

denisalevi opened this issue · 6 comments

For some reason using [1:] style indexing for a StateMonitor returns an empty array while e.g. [1] indexing returns the correct value.

...
mon = StateMonitor(...)
...
run(2*defaultclock.dt)  # -> len(mon.t) is 2
print(mon.t[1:])  # returns []
print(mon[1])  # returns Quantity([defaultclock.dt])

This seems to apply for any kind of [3:] style indexing of the StateMonitor.
This breaks e.g. brian2.tests.test_monitor.test_state_monitor_indexing.

The StateMonitor is also not working working for many neurons. Currently the StateMonitor is called with 1 block and as many threads as there are Neurons (or Synapses etc) to record. So if there are more Neurons then the maximum number of threads, the kernel fails to launch.

And the global memory writes are not coalesced. Currently we have a 2D data structure of dimensions indices x record_times (vector of vectors) for each variable monitor. And we fill that in the kernel like this

monitor[tid][current_iteration] = ...

For coalesced writes we could just "transpose" the monitor data structure so we can use

monitor[current_iteration][tid] = ...

We might have to resort the monitor in the end though since it might then not fit with the format that Brian expects to read back.

There also seems to be a problem with the ordering of returned values. Might be related to #46 . In this test, the first assert sums over the not ordered dimension and passes. The second one failes (because of wrong ordering). The monitors return arrays are 2D arrays with shape (num_indices, num_times).

import os                                                                                             
                                                                                                      
from numpy.testing.utils import assert_allclose                                                       
                                                                                                      
from brian2 import *                                                                                  
from brian2.devices.device import reinit_devices, set_device                                          
                                                                                                      
import brian2cuda                                                                                     
                                                                                                      
directory = os.path.splitext(os.path.basename(__file__))[0]                                           
                                                                                                      
results = {}                                                                                          
                                                                                                      
n_cells = 5                                                                                           
n_recorded = 10                                                                                                                                                                                                                 
delay = np.arange(n_cells) * defaultclock.dt                                                          
                                                                                                      
for devicename in ['cpp_standalone', 'cuda_standalone']:                                              
    set_device(devicename, directory=directory + '_' + devicename)                                    
    Synapses.__instances__().clear()                                                                  
    reinit_devices()                                                                                  
    P = NeuronGroup(n_cells, model='', threshold='True')                                              
    S = Synapses(P, P,                                                                                
                 model='''w : 1''',                                                                   
                 on_pre='''w += 1''')                                                                 
    S.connect(j='i')                                                                                  
    S.pre.delay = delay                                                                               
                                                                                                      
    state_mon = StateMonitor(S, 'w', record=range(n_recorded))                                        
                                                                                                      
    run(defaultclock.dt * (n_cells + 1), report='text')                                               
                                                                                                      
    results[devicename] = state_mon.w.astype(int)                                                     
                                                                                                      
assert_allclose(results['cpp_standalone'].sum(axis=0), results['cuda_standalone'].sum(axis=0))        
assert_allclose(results['cpp_standalone'], results['cuda_standalone'])                                

EDIT 13.04.21: This test seems to be passing without issues now (I also added it to our test suite). test_statemonitor_indexing is still failing though.

Could this be a similar issue as here: brian-team/brian2#1119?

This is being fixed in PR #202, follow-up issue for fixing the single block usage and optimizing statemonitor template is #201.

There were two problems that needed to be fixed here:

  1. We need the _array_<monitor>_N variables in the standalone code to hold the correct monitor size, since this value is stored in the results folder and read when indexing with slice syntax.
  2. The dev_array_<monitor>_N device variable is copied inte _array_<monitor>_N in write_arrays(). Therefore, we need to set the device variable instead of the host variable for the written array to be correct! (or finally clean up the copying and writing mechanism...).
  3. We need to mark N in WRITES_READ_ONLY_VARIABLES in the templates, otherwise brian2 will used a cached value instead of reading it from the results folder (see cpp_standalone/device.py:Device.get_value()).

The brian-team/brian2#1119 had a very similar problem, which I also fixed in PR #202.

PR #202 is merged now into update-brian2-submodule branch and will be merged into master with #217. Closing.