MouseLand/pykilosort

Ongoing issues with pykilosort

rossant opened this issue

(previously at MouseLand/Kilosort#201)

This issue collects information about ongoing problems with the Python port of Kilosort 2 that should be given high development priority:

  1. On large datasets, the split2 step fails right after its main loop with an out-of-GPU-memory error (reproduced on an 8 GB GPU with one of Nick Steinmetz's drift datasets). The failing allocation is in `getMeWtW()`, at `WtW = WtW + wtw0`, which requests about 2.1 GB on a device with 7.8 GB in total. Perhaps one should make sure that intermediate CuPy arrays that are no longer used are freed at the end of each step. It might also be a good idea to add a checkpoint just after the long split iteration, before the line that fails, so that the split iteration does not have to be rerun at each attempt.
```
12:17:22.403 [I] postprocess:649      Finished splitting. Found 658 splits, checked 2077/2077 clusters, nccg 113
---------------------------------------------------------------------------
OutOfMemoryError                          Traceback (most recent call last)
~/spikesorting/drift/run.py in <module>
     22 probe.kcoords = matdata.kcoords.squeeze()
     23 start = datetime.datetime.now()
---> 24 run(dat_path, probe=probe, dir_path=dir_path, n_channels=385, dtype=np.int16, sample_rate=3e4)
     25 finish = datetime.datetime.now()
     26 print(start, finish)

~/git/pykilosort/pykilosort/main.py in run(dat_path, probe, params, dir_path, stop_after, **kwargs)
    222         # final splits by amplitudes
    223         with ctx.time('split_2'):
--> 224             out = splitAllClusters(ctx, False)
    225         out['st3_s0'] = out.pop('st3_s')
    226         ctx.save(**out)

~/git/pykilosort/pykilosort/postprocess.py in splitAllClusters(ctx, flag)
    665 
    666     # we re-compute similarity scores between templates
--> 667     WtW, iList = getMeWtW(W.astype(cp.float32), U.astype(cp.float32), Nnearest)
    668     # ir.iList = iList  # over-write the list of nearest templates
    669 

~/git/pykilosort/pykilosort/learn.py in getMeWtW(W, U0, Nnearest)
    520             wtw0 = mexWtW2(Params, W[:, :, i], W[:, :, j], utu0)
    521             # add it to the full correlation array
--> 522             WtW = WtW + wtw0
    523 
    524     # the maximum across timelags accounts for sample alignment mismatch

cupy/core/core.pyx in cupy.core.core.ndarray.__add__()

cupy/core/_kernel.pyx in cupy.core._kernel.ufunc.__call__()

cupy/core/_kernel.pyx in cupy.core._kernel._get_out_args()

cupy/core/core.pyx in cupy.core.core.ndarray.__init__()

cupy/cuda/memory.pyx in cupy.cuda.memory.alloc()

cupy/cuda/memory.pyx in cupy.cuda.memory.MemoryPool.malloc()

cupy/cuda/memory.pyx in cupy.cuda.memory.MemoryPool.malloc()

cupy/cuda/memory.pyx in cupy.cuda.memory.SingleDeviceMemoryPool.malloc()

cupy/cuda/memory.pyx in cupy.cuda.memory.SingleDeviceMemoryPool._malloc()

cupy/cuda/memory.pyx in cupy.cuda.memory._try_malloc()

OutOfMemoryError: out of memory to allocate 2087942144 bytes (total 7810840064 bytes)
```
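The size of the failing allocation can be checked by hand. This is a sketch under several assumptions: that `WtW` in `getMeWtW()` is a float32 array of shape `(Nfilt, Nfilt, 2*nt0 - 1)` as in Kilosort 2, with `Nfilt = 2077` (the cluster count in the log line) and `nt0 = 61` (Kilosort 2's default template length, assumed here), and that CuPy's memory pool rounds each request up to a multiple of 512 bytes. Under these assumptions it reproduces the 2087942144 bytes from the error exactly. If `wtw0` has the same shape, `WtW = WtW + wtw0` needs at least three such arrays alive at once (old `WtW`, `wtw0`, and the new result), about 6.3 GB of the 7.8 GB device, whereas an in-place `+=` avoids the temporary result. `wtw_nbytes` and `accumulate_in_place` are hypothetical helpers.

```python
import numpy as np

# Assumption: CuPy's default memory pool rounds every request up to a
# multiple of this many bytes.
ALLOC_UNIT = 512


def wtw_nbytes(n_filt, nt0, itemsize=4):
    """Bytes requested for an array of shape (n_filt, n_filt, 2*nt0 - 1),
    rounded up to the assumed memory-pool allocation unit."""
    raw = n_filt * n_filt * (2 * nt0 - 1) * itemsize
    return -(-raw // ALLOC_UNIT) * ALLOC_UNIT  # ceiling division


def accumulate_in_place(acc, term):
    """Hypothetical helper: add `term` into the existing `acc` buffer.

    `acc = acc + term` first allocates a third array of the same size;
    `acc += term` reuses `acc`'s memory. CuPy ndarrays support the same
    in-place operator.
    """
    acc += term
    return acc


requested = wtw_nbytes(2077, 61)  # Nfilt = 2077 from the log; nt0 = 61 assumed
total = 7810840064  # device total reported in the OutOfMemoryError
print(requested)  # 2087942144, the size in the OutOfMemoryError
print(f"{requested / total:.0%} of the device per array")

# CPU demo: the in-place update keeps the same data buffer.
acc = np.zeros((4, 4, 3), dtype=np.float32)
before = acc.ctypes.data
acc = accumulate_in_place(acc, np.ones_like(acc))
assert acc.ctypes.data == before
```

For the flushing suggested above, dropping references to arrays that are no longer needed (`del`) and then calling `cupy.get_default_memory_pool().free_all_blocks()` at the end of a step returns the pool's cached, unused blocks to the device.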
  2. pykilosort caches intermediate results after each step (preprocess, cluster, learn, merge, split, ...) and can automatically resume a run from the last successful step if a later step fails. However, in some cases resuming triggers a CUDA segmentation fault (`CUDA_ERROR_ILLEGAL_ADDRESS`). In the example below, the split2 step had failed with an out-of-GPU-memory error, and resuming that step on the next run fails, perhaps because of a shape mismatch in one of the intermediate arrays. The error is only reported when the `getwtw` kernel is loaded in `mexSVDsmall2()`; since CUDA kernel launches are asynchronous, the illegal access may have occurred in an earlier kernel. On a minimal failing example, one should check that the inputs to the split2 step are identical between the first and the second (resumed) run.
```
12:35:26.643 [I] postprocess:649      Finished splitting. Found 658 splits, checked 2077/2077 clusters, nccg 113
---------------------------------------------------------------------------
CUDADriverError                           Traceback (most recent call last)
~/spikesorting/drift/run.py in <module>
     22 probe.kcoords = matdata.kcoords.squeeze()
     23 start = datetime.datetime.now()
---> 24 run(dat_path, probe=probe, dir_path=dir_path, n_channels=385, dtype=np.int16, sample_rate=3e4)
     25 finish = datetime.datetime.now()
     26 print(start, finish)

~/git/pykilosort/pykilosort/main.py in run(dat_path, probe, params, dir_path, stop_after, **kwargs)
    222         # final splits by amplitudes
    223         with ctx.time('split_2'):
--> 224             out = splitAllClusters(ctx, False)
    225         out['st3_s0'] = out.pop('st3_s')
    226         ctx.save(**out)

~/git/pykilosort/pykilosort/postprocess.py in splitAllClusters(ctx, flag)
    662     Ka, Kb = getKernels(params)
    663     # we run SVD
--> 664     W, U, mu = mexSVDsmall2(Params, dWU, W, iC, iW, Ka, Kb)
    665 
    666     # we re-compute similarity scores between templates

~/git/pykilosort/pykilosort/learn.py in mexSVDsmall2(Params, dWU, W, iC, iW, Ka, Kb)
    312     # compute dWU * dWU'
    313     getwtw = cp.RawKernel(code, 'getwtw')
--> 314     getwtw((Nfilt,), tpS, (d_Params, d_dWUb, d_wtw))
    315 
    316     # get W by power svd iterations

cupy/core/raw.pyx in cupy.core.raw.RawKernel.__call__()

cupy/util.pyx in cupy.util.memoize.decorator.ret()

cupy/core/raw.pyx in cupy.core.raw._get_raw_kernel()

cupy/core/carray.pxi in cupy.core.core.compile_with_cache()

cupy/core/carray.pxi in cupy.core.core.compile_with_cache()

~/anaconda3/lib/python3.7/site-packages/cupy/cuda/compiler.py in compile_with_cache(source, options, arch, cache_dir, extra_source)
    160             cubin_hash = six.b(hashlib.md5(cubin).hexdigest())
    161             if hash == cubin_hash:
--> 162                 mod.load(cubin)
    163                 return mod
    164 

cupy/cuda/function.pyx in cupy.cuda.function.Module.load()

cupy/cuda/function.pyx in cupy.cuda.function.Module.load()

cupy/cuda/driver.pyx in cupy.cuda.driver.moduleLoadData()

cupy/cuda/driver.pyx in cupy.cuda.driver.check_status()

CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
Traceback (most recent call last):
  File "cupy/cuda/driver.pyx", line 193, in cupy.cuda.driver.moduleUnload
  File "cupy/cuda/driver.pyx", line 82, in cupy.cuda.driver.check_status
cupy.cuda.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
```
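The input check suggested above could be sketched as follows, assuming the inputs to split2 from each run have been collected into dicts of NumPy arrays (for example by converting CuPy arrays with `cupy.asnumpy` before saving them). `diff_step_inputs` is a hypothetical helper, not part of pykilosort. Besides shapes and values, it compares dtypes and memory layout: a `cp.RawKernel` such as `getwtw` receives only a raw pointer to each array's data, so an array reloaded from the cache with an unexpected dtype or order (C versus Fortran) could make the kernel read the wrong memory without any Python-level error.

```python
import numpy as np


def _layout(x):
    """Memory order of an array: 'C', 'F', or 'strided'."""
    if x.flags.c_contiguous:
        return "C"
    if x.flags.f_contiguous:
        return "F"
    return "strided"


def diff_step_inputs(first, resumed):
    """Hypothetical helper: compare the inputs of one step between two runs.

    `first` and `resumed` map array names to NumPy arrays. Returns a list of
    mismatch descriptions; an empty list means the inputs are identical.
    """
    problems = []
    for name in sorted(set(first) | set(resumed)):
        if name not in first or name not in resumed:
            problems.append(f"{name}: present in only one run")
            continue
        a, b = np.asarray(first[name]), np.asarray(resumed[name])
        if a.shape != b.shape:
            problems.append(f"{name}: shape {a.shape} != {b.shape}")
        elif a.dtype != b.dtype:
            problems.append(f"{name}: dtype {a.dtype} != {b.dtype}")
        elif _layout(a) != _layout(b):
            problems.append(f"{name}: layout {_layout(a)} != {_layout(b)}")
        # NaN == NaN only makes sense for float/complex arrays.
        elif not np.array_equal(a, b, equal_nan=a.dtype.kind in "fc"):
            problems.append(f"{name}: values differ")
    return problems


# Toy example with hypothetical arrays: same values, different dtype/order.
first = {
    "dWU": np.zeros((61, 4, 3), dtype=np.float32),
    "iC": np.arange(6, dtype=np.int32).reshape(2, 3),
}
resumed = {
    "dWU": np.asfortranarray(first["dWU"]),
    "iC": first["iC"].astype(np.int64),
}
print(diff_step_inputs(first, resumed))
# ['dWU: layout C != F', 'iC: dtype int32 != int64']
```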
  3. Care was taken during the port to ensure a line-by-line match between MATLAB and Python, but some discrepancies may remain. In particular, the Python version appears to find significantly fewer good units than MATLAB on the same dataset. The postprocessing steps should be compared in more depth between MATLAB and Python to make sure that each step gives the same outputs on the same inputs.
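One way to make these comparisons systematic is to save each step's outputs from both implementations and compare them array by array. The sketch below assumes both sets of outputs have already been loaded as dicts of arrays with matching names (for instance with `scipy.io.loadmat` on the MATLAB side); `max_abs_diffs` and the array names in the example are hypothetical. It shifts MATLAB's 1-based index arrays to 0-based and drops MATLAB's singleton dimensions, two easy sources of spurious mismatches in a MATLAB-to-Python port.

```python
import numpy as np


def max_abs_diffs(py_out, mat_out, index_keys=()):
    """Hypothetical helper: per-array max |Python - MATLAB| for one step.

    `py_out` and `mat_out` map output names to arrays. Arrays named in
    `index_keys` hold MATLAB 1-based indices and are shifted to 0-based.
    Missing arrays and shape mismatches are reported as infinity.
    """
    report = {}
    for name, py_val in py_out.items():
        if name not in mat_out:
            report[name] = float("inf")
            continue
        py_val = np.asarray(py_val, dtype=np.float64)
        mat_val = np.asarray(mat_out[name], dtype=np.float64)
        if name in index_keys:
            mat_val = mat_val - 1  # MATLAB indexing starts at 1
        if py_val.shape != mat_val.shape:
            # MATLAB keeps vectors 2-D, e.g. (n, 1); drop singleton axes.
            py_val, mat_val = py_val.squeeze(), mat_val.squeeze()
        if py_val.shape != mat_val.shape:
            report[name] = float("inf")
        elif py_val.size == 0:
            report[name] = 0.0
        else:
            report[name] = float(np.max(np.abs(py_val - mat_val)))
    return report


# Toy example: cluster ids agree once shifted; amplitudes differ by 0.5.
py = {"clu": np.array([0, 2, 5]), "amp": np.array([1.0, 2.0])}
mat = {"clu": np.array([[1], [3], [6]]), "amp": np.array([[1.0, 2.5]])}
print(max_abs_diffs(py, mat, index_keys=("clu",)))  # {'clu': 0.0, 'amp': 0.5}
```

Running such a check after each step, in pipeline order, would point to the first step whose outputs diverge beyond floating-point noise.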