Why do we need to set dask to single-threaded?
Closed this issue · 4 comments
# TODO: current workaround to dask thread problems
import dask
dask.config.set(scheduler='single-threaded')
I've noticed "force single threaded mode" (#39):
@SiMaria this makes the trick with the preprocessing obsolete, but forces single threads instead of the (theoretically) faster multi-threading. In our case, saving memory is more important than time so we should be good with this change for now
And it may have something to do with #37.
Currently I tend to use
client = Client(n_workers=16, threads_per_worker=1, memory_limit="8GB")
so this global setting may change the default config and may not work with my current workflow.
In my case, if I comment out dask.config.set(scheduler='single-threaded'), I don't get any errors when loading data with
o3 = salem.open_mf_wrf_dataset(date_strings)
I'll do more tests when I have time.
Thanks for the report!
Salem has been in maintenance-only mode for quite a while now, so I wouldn't be surprised if a lot of the code is not up to date with dask/xarray standards...
@singledoggy is there anything we can do here? I'll release a maintenance update soon, so it might be a good moment to tackle this.
Sorry for the late reply. I must clarify that my expertise in dask is limited, and therefore, the suggestions provided may not be entirely accurate.
I noticed that the HDF5 library is not thread-safe, so it's wise to use single-threaded mode. But I don't know why this conflicts with the settings of the Dask cluster.
But maybe it's better to temporarily set configuration values within a context manager?
# As a context manager
>>> with dask.config.set(scheduler='processes'):
... x.sum().compute()
# Set globally
>>> dask.config.set(scheduler='processes')
>>> x.sum().compute()
I'm also not sure it's a good idea to call dask.config.set(scheduler='single-threaded') within salem. I've checked xarray's open_mfdataset, and it seems they have not chosen this approach.
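To make the context-manager idea concrete, here is a minimal sketch of how the single-threaded setting could be scoped to one call instead of being set globally at import time. The helper name `run_single_threaded` is hypothetical and not part of salem, and the toy `delayed(sum)` computation only stands in for the real file loading. I have not checked how this interacts with a distributed Client, so treat it as an illustration only.

```python
import dask
from dask import delayed


def run_single_threaded(load):
    """Hypothetical helper (not part of salem): call `load` with the
    single-threaded scheduler active only for the duration of the call."""
    with dask.config.set(scheduler="single-threaded"):
        return load()


# Scheduler setting before the call (None if the user has not set one)
before = dask.config.get("scheduler", None)

# Toy stand-in for the real data loading
result = run_single_threaded(lambda: delayed(sum)([1, 2, 3]).compute())

# The previous setting is restored once the `with` block exits
after = dask.config.get("scheduler", None)

print(result, before == after)
```

With this pattern the user's own scheduler choice is left untouched outside the helper, which is the main difference from calling dask.config.set at module level.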