Failing tests on `distributed>2023.9.2`
pentschev opened this issue · 3 comments
Two tests are currently failing after removing the distributed==2023.9.2
pin:
FAILED tests/test_local_cuda_cluster.py::test_pre_import_not_found - Failed: DID NOT RAISE <class 'RuntimeError'>
FAILED tests/test_local_cuda_cluster.py::test_death_timeout_raises - Failed: DID NOT RAISE <class 'asyncio.exceptions.TimeoutError'>
We need to bisect to find the source of those regressions, but for now we're xfailing them to allow unpinning going through.
FAILED tests/test_local_cuda_cluster.py::test_pre_import_not_found
This one seems to be because scaling up a speccluster now swallows any errors when scaling up. (dask/distributed#8309)
FAILED tests/test_local_cuda_cluster.py::test_death_timeout_raises
This is the same cause.
Thanks for digging into that @wence- .
I'm sure this is not the first time this is problematic. This is definitely not something we need to do right now, but I'm wondering if we should do a pass on Dask-CUDA's tests that could be generalized and submit them as part of Distributed's test set to prevent regressions there.