MouseLand/cellpose

[BUG] interpolation overflow error


Describe the bug
Flow interpolation can produce invalid flow coordinates (NaN values).

To Reproduce
This is difficult to reproduce locally or interactively, but it seems to be triggered when running as an HPC job (I am still troubleshooting why). While debugging, I can see that before interpolation the flow indices (p) are bounded by the image coordinates, whereas after steps2D_interp some values are NaN.
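One way to pin down where the coordinates go bad is to inspect `p` immediately before and after the `steps2D_interp` call. The helper below is a hypothetical debugging sketch (`check_flow_coords` is not part of cellpose). It assumes `p` holds pixel coordinates with the coordinate dimension first, e.g. shape `(2, Ly, Lx)` or `(2, N)`, and that `shape` is the image shape `(Ly, Lx)`. It counts positions with a non-finite value and positions lying outside the image.

```python
import numpy as np


def check_flow_coords(p, shape):
    """Summarize invalid coordinates in a flow-position array.

    Hypothetical debugging helper (not part of cellpose). Assumes the
    coordinate dimension comes first, e.g. p.shape == (2, Ly, Lx) or (2, N),
    and that `shape` is the image shape, e.g. (Ly, Lx).
    """
    p = np.asarray(p)
    # A position is non-finite if any of its coordinates is NaN or +/-inf.
    nonfinite = ~np.isfinite(p).all(axis=0)
    out_of_bounds = np.zeros(p.shape[1:], dtype=bool)
    for d, size in enumerate(shape):
        # Comparisons involving NaN evaluate to False, so NaNs are only
        # counted as non-finite, not as out of bounds.
        with np.errstate(invalid="ignore"):
            out_of_bounds |= (p[d] < 0) | (p[d] > size - 1)
    return {
        "n_nonfinite": int(nonfinite.sum()),
        "n_out_of_bounds": int(out_of_bounds.sum()),
    }


# Example on a small synthetic array.
p = np.zeros((2, 4, 5), dtype=np.float32)
p[0, 1, 2] = np.nan   # one NaN coordinate
p[1, 3, 3] = 7.0      # beyond Lx - 1 = 4
print(check_flow_coords(p, (4, 5)))  # {'n_nonfinite': 1, 'n_out_of_bounds': 1}
```

Calling it on `p` at both points and comparing the counts would show whether the NaNs first appear during interpolation or are already present in the input flows.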

Traceback

```
Traceback (most recent call last):
  File "<...>", line 158, in _predictfun
    model.eval(img, channels=channels, diameter=diameter, **kwargs)[0]
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<...>lib/python3.11/site-packages/cellpose/models.py", line 551, in eval
    masks, styles, dP, cellprob, p = self._run_cp(x,
                                     ^^^^^^^^^^^^^^^
  File "<...>lib/python3.11/site-packages/cellpose/models.py", line 651, in _run_cp
    outputs = dynamics.compute_masks(dP[:,i], cellprob[i], niter=niter, cellprob_threshold=cellprob_threshold,
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<...>lib/python3.11/site-packages/cellpose/dynamics.py", line 719, in compute_masks
    mask = get_masks(p, iscell=cp_mask)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<...>lib/python3.11/site-packages/cellpose/dynamics.py", line 685, in get_masks
    M0 = M[tuple(pflows)]
         ~^^^^^^^^^^^^^^^
IndexError: index -2147483628 is out of bounds for axis 0 with size 2088
```
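The reported index is itself a clue: -2147483628 is exactly 20 above the smallest 32-bit signed integer, -2**31 = -2147483648. That is consistent with a NaN coordinate being converted to int32 and then shifted by a small offset before indexing. Casting NaN to an integer type is not well defined; on x86 hardware it commonly produces the int32 minimum, and recent NumPy versions typically emit a RuntimeWarning. The sketch below only checks the arithmetic and reproduces the same kind of IndexError on a stand-in array with 2088 rows. The offset of 20 is inferred from the number, not taken from the cellpose source.

```python
import numpy as np

INT32_MIN = int(np.iinfo(np.int32).min)  # -2147483648, i.e. -2**31
REPORTED_INDEX = -2147483628             # value from the IndexError in the traceback

# How far the reported index sits above the int32 minimum.
offset = REPORTED_INDEX - INT32_MIN
print(offset)  # 20

# Reproduce the same kind of failure on a stand-in array with 2088 rows
# (hypothetical data; only the axis-0 size matches the traceback).
M = np.zeros((2088, 8), dtype=np.int32)
rows = np.array([INT32_MIN + offset, 5], dtype=np.int64)
cols = np.array([0, 0], dtype=np.int64)
try:
    M[rows, cols]
    message = ""
except IndexError as err:
    message = str(err)
print(message)  # index -2147483628 is out of bounds for axis 0 with size 2088
```

Because NumPy's fancy indexing rejects any index below -2088 here, a single NaN-derived coordinate is enough to abort the whole get_masks call.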

Note: this issue initially suggested that this might be an out-of-memory (OOM) error, which appears to have been incorrect.

Follow-up: This bug might not be related to running out of memory at all; it may simply have stopped occurring by coincidence when more memory was requested. Instead, there appears to be an overflow somewhere, probably in the steps2D_interp step, which I am still looking into.

Thanks for looking into this, jeskowanger!
I have also found that cellpose 3 takes up a huge amount of memory no matter what you throw at it.

Despite intensive troubleshooting, I have not been able to identify the source of the overflow; unfortunately, it occurs too sporadically. Interestingly, the issue does not reproduce in a Docker container based on cellpose 2, so I am using that for my research now. For anyone wishing to do the same, the Docker image can be downloaded from Docker Hub.
One last note on the overflow error itself: I suspect that it stems from the frequent use of np.float32 in the source for arrays that are effectively np.uint32 (and that are in fact cast to that type at later stages). It is almost certain that the coercion from float32 to uint32 is the cause of the overflow, and one may wish to rework the internal types as a result.
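If the invalid values do come from the float-to-integer conversion, one defensive option is to handle non-finite coordinates explicitly before casting instead of relying on the cast. The sketch below is a hypothetical helper (`coords_to_indices` is not a cellpose function). It assumes `p` has the coordinate dimension first and `shape` is the image shape. It flags positions with NaN/inf coordinates, substitutes 0 for them, clips every coordinate into the image, and only then truncates to a signed 64-bit integer type.

```python
import numpy as np


def coords_to_indices(p, shape):
    """Convert float pixel coordinates to integer indices that are always in bounds.

    Hypothetical helper (not part of cellpose). `p` has the coordinate
    dimension first, e.g. shape (2, N); `shape` is the image shape.
    Returns (idx, valid): idx is int64 and in bounds for every position,
    and valid is False wherever any coordinate was NaN or +/-inf.
    """
    p = np.asarray(p, dtype=np.float64)
    finite = np.isfinite(p)
    valid = finite.all(axis=0)
    # Replace NaN/inf before casting; casting them to an integer type is
    # not well defined and can yield huge negative values.
    safe = np.where(finite, p, 0.0)
    idx = np.empty(p.shape, dtype=np.int64)
    for d, size in enumerate(shape):
        # Clip first, then truncate, so the cast can never overflow or wrap.
        idx[d] = np.clip(safe[d], 0, size - 1).astype(np.int64)
    return idx, valid


p = np.array([[0.4, np.nan, 3.9, -2.0],
              [1.0, 2.0, 99.0, 0.5]], dtype=np.float32)
idx, valid = coords_to_indices(p, shape=(4, 5))
print(idx)    # [[0 0 3 0]
              #  [1 2 4 0]]
print(valid)  # [ True False  True  True]
```

A signed 64-bit type avoids both the wrap-around an unsigned type such as np.uint32 would produce for negative values and the int32 extremes that appear in the traceback. Returning `valid` lets the caller drop affected pixels rather than silently assigning them to pixel (0, 0). Which policy is right for mask assembly would be up to the maintainers.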