ncar-xdev/xpersist

lock cache file when computing

Opened this issue · 2 comments

To avoid a race condition when computing in parallel, we should enable a "lock" feature on the cache file. If a lock file is detected, wait until it's removed before reading the cache. We should wrap the cache creation in a try/except block to clean up lock files if there is a failure.
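The scheme described above (create a lock file, wait while one exists, and clean it up on failure) could be sketched roughly like this. This is a minimal stdlib-only sketch, not xpersist code; the helper name `with_cache_lock` and its parameters are made up for illustration:

```python
import os
import time
from pathlib import Path


def with_cache_lock(cache_path, compute, timeout=60.0, poll=0.1):
    """Hypothetical helper: run ``compute`` while holding a lock file.

    If ``<cache_path>.lock`` exists, wait until it is removed before
    proceeding; then create it, run ``compute``, and remove the lock
    even when ``compute`` fails.
    """
    lock_path = Path(str(cache_path) + ".lock")
    deadline = time.monotonic() + timeout

    # Spin until we can atomically create the lock file; O_EXCL makes
    # os.open fail if another process already created it.
    while True:
        try:
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            os.close(fd)
            break
        except FileExistsError:
            if time.monotonic() > deadline:
                raise TimeoutError(f"timed out waiting for {lock_path}")
            time.sleep(poll)

    try:
        return compute()
    finally:
        # Always clean the lock up, even if compute() raised.
        lock_path.unlink(missing_ok=True)
```

The `try`/`finally` does the cleanup the issue asks for: a crash inside `compute` no longer leaves a stale lock file behind for other processes to wait on.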

@matt-long
Per xarray's open_dataset() documentation, xarray handles the locking for the user:

lock (False or duck threading.Lock, optional) – Resource lock to use when reading data from disk. Only relevant when using dask or another form of parallelism. By default, appropriate locks are chosen to safely read and write files with the currently active dask scheduler.

Should we still come up with our own approach? If so, do you have a good test case that I can use as a reference?

When it comes to writing, my understanding is that:

  • ds.to_netcdf() is serial by default: a single process writes to the file while the others remain idle.
  • By default, the compute parameter of to_netcdf() is set to True, which guarantees the write is both synchronous and serial (the call blocks until the write completes).
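The compute=True vs. compute=False distinction can be illustrated with a stdlib stand-in (hypothetical code, not xarray's implementation; `write_cache` is an invented name). With compute=True the write happens before the call returns; with compute=False, xarray instead returns a dask.delayed object that the caller triggers later, which the deferred callable below mimics:

```python
def write_cache(path, data, compute=True):
    """Stand-in for a to_netcdf-like API (illustrative only).

    compute=True: write synchronously, like to_netcdf's default.
    compute=False: return a zero-argument callable; nothing touches
    the disk until the caller invokes it, loosely analogous to the
    dask.delayed object to_netcdf(compute=False) returns.
    """
    def _write():
        with open(path, "w") as f:
            f.write(data)
        return path

    if compute:
        return _write()
    return _write
```

Usage: `write_cache(p, "x")` writes immediately, whereas `deferred = write_cache(p, "x", compute=False)` writes nothing until `deferred()` is called.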

Let me know if I am missing something.