learning-at-home/hivemind

AttributeError in MPFuture

borzunov opened this issue · 2 comments

This happens when a Petals server is run on AMD GPUs:

  File "/path/hivemind/utils/mpfuture.py", line 300, in __del__
    MPFuture._active_futures.pop(self._uid, None)
AttributeError: 'NoneType' object has no attribute 'pop'

To the best of my knowledge, the only way this can occur is if two threads race to create an MPFuture. The first thread sees that initialization is needed and begins it. While it is still at it, the second thread creates an MPFuture, fails (say), and tries to delete it, but cannot, because the first thread has not finished initializing yet.

Still, I think that the attempt to run on AMD GPUs failed elsewhere, and this is only a symptom of the real problem.
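
For illustration, here is a minimal sketch of the state the race hypothesis above would lead to. It is not hivemind's code: `LazyFuture`, `_active`, `unregister_unsafe` and `unregister_safe` are hypothetical names, and the sketch assumes the registry is a class-level attribute that stays `None` until first use. It does not reproduce the thread race itself, only what cleanup sees if it runs before the registry exists, plus one possible guard.

```python
import threading


class LazyFuture:
    """Hypothetical stand-in for a future type with a lazily created class-level registry."""

    _active = None            # shared registry; stays None until something initializes it
    _lock = threading.Lock()  # assumed lock guarding access to the registry

    def __init__(self, uid):
        self.uid = uid

    def unregister_unsafe(self):
        # Same shape as the failing line in the traceback: assumes the registry exists.
        type(self)._active.pop(self.uid, None)

    def unregister_safe(self):
        # Tolerates a registry that has not been created yet.
        with type(self)._lock:
            registry = type(self)._active
            if registry is not None:
                registry.pop(self.uid, None)


def try_unsafe_cleanup():
    """Run the unguarded cleanup before initialization and return the error text, if any."""
    fut = LazyFuture(uid=1)
    try:
        fut.unregister_unsafe()
    except AttributeError as exc:
        return str(exc)
    return None


error_message = try_unsafe_cleanup()
print(error_message)

# The guarded variant is a no-op while the registry is still None.
LazyFuture(uid=2).unregister_safe()
```

A guard like this would only hide the symptom; if the AMD run is failing elsewhere, that underlying failure would still need to be found.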