fmaussion/salem

Why do we need to set dask to single-threaded?

Closed this issue · 4 comments

    # TODO: current workaround to dask thread problems
    import dask
    dask.config.set(scheduler='single-threaded')

I've noticed #39 ("force single threaded mode"):

@SiMaria this makes the trick with the preprocessing obsolete, but forces single threads instead of the (theoretically) faster multi-threading. In our case, saving memory is more important than time so we should be good with this change for now

And it may have something to do with #37.

Currently I tend to use

    from dask.distributed import Client
    client = Client(n_workers=16, threads_per_worker=1, memory_limit="8GB")

so this setting may override the default config and break my current workflow.

In my case, if I comment out the dask.config.set(scheduler='single-threaded') line, I don't get any error when loading data with

    o3 = salem.open_mf_wrf_dataset(date_strings)

I'll do more tests when I have time.

Thanks for the report!

Salem has been in maintenance-only mode for quite a while now, so I wouldn't be surprised if a lot of the code is not up to date with dask/xarray standards...

@singledoggy is there anything we can do here? I'll release a maintenance update soon, so it might be a good moment to tackle this...

Sorry for the late reply. I must clarify that my expertise in dask is limited, and therefore, the suggestions provided may not be entirely accurate.

I noticed that the HDF5 library was not thread-safe, so it makes sense to set single-threaded mode. But I don't know why this conflicts with the settings in the Dask cluster.

But maybe it's better to temporarily set configuration values within a context manager?

    >>> import dask
    >>> import dask.array as da
    >>> x = da.ones((1000, 1000), chunks=(100, 100))
    # As a context manager
    >>> with dask.config.set(scheduler='processes'):
    ...     x.sum().compute()
    # Set globally
    >>> dask.config.set(scheduler='processes')
    >>> x.sum().compute()

And I'm not sure whether calling dask.config.set(scheduler='single-threaded') inside salem is a good approach. I've checked xarray's open_mfdataset, and it seems they have not chosen this way.
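To illustrate the context-manager idea, here is a minimal sketch of scoping the single-threaded setting to one call instead of setting it globally at import time. The helper `call_single_threaded` and the stand-in `fake_open` are hypothetical names used only for this example; they are not part of salem or dask. Only `dask.config.set` and `dask.config.get` are real dask calls.

```python
import dask


def call_single_threaded(func, *args, **kwargs):
    """Run func under dask's single-threaded scheduler, then restore the caller's config."""
    with dask.config.set(scheduler="single-threaded"):
        return func(*args, **kwargs)


def fake_open(paths):
    # Hypothetical stand-in for an opener such as salem.open_mf_wrf_dataset;
    # it only reports which scheduler is configured while it runs.
    return dask.config.get("scheduler", None), list(paths)


before = dask.config.get("scheduler", None)
seen, paths = call_single_threaded(fake_open, ["a.nc", "b.nc"])
after = dask.config.get("scheduler", None)

print(seen)             # scheduler seen inside the helper
print(after == before)  # the caller's setting is restored afterwards
```

One caveat, as far as I understand it: the override only applies to computations triggered inside the `with` block. If the opener returns lazy dask-backed arrays, the actual reads happen later at compute time and would use whatever scheduler is configured then, so this sketch alone may not be enough to protect non-thread-safe reads.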