monocongo/climate_indices

Replace multiprocessing shared memory code with deco

Is your feature request related to a problem? Please describe.
Yes. There is a lot of code in the main processing script for parallelization over a shared object in memory. While it does work, the code is hard to understand and maintain. I'd like to replace the multiprocessing and shared memory aspects of the main processing script with deco and then compare the two to see which runs faster. As long as the smaller, simpler code using deco is not too much slower than the current implementation, we'll deem it a success.

Describe the solution you'd like
Lots of multiprocessing and shared memory related code will go away and be replaced by far fewer lines of code that leverage deco for multiprocessing over a shared grid.

Describe alternatives you've considered
I've tried this more than once using xarray and dask, but my powers were not sufficient and I rolled my own solution. Maybe deco will be easier to get right without so much hackery.

I am currently learning how to do better with dask and xarray. Probably the best lesson I have learned so far is to not be too clever. Looking through your code -- yes, the hand-crafted solution is daunting and is probably a turn-off for potential contributors. However, it does work, and it isn't terrible. If anything, I think you should press forward and embrace xarray even further in order to first eliminate a bunch of cruft that you can let xarray handle, particularly around data structures. A lot of code is spent trying to coerce the data when ultimately you just want to iterate over groups of time series.
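
To make the "iterate over groups of time series" idea concrete, here is a minimal sketch of how `xarray.apply_ufunc` could apply a 1-D function to every grid cell's time series. It is not taken from this repository: the `standardize` function, the `lat`/`lon`/`time` dimension names, and the synthetic data are all assumptions made up for illustration, and a real index computation would take the place of `standardize`.

```python
import numpy as np
import xarray as xr


def standardize(series: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for a per-series index computation."""
    return (series - series.mean()) / series.std()


# Synthetic (lat, lon, time) grid; a real run would open a NetCDF file instead.
rng = np.random.default_rng(0)
grid = xr.DataArray(
    rng.gamma(2.0, 1.0, size=(3, 4, 24)),
    dims=("lat", "lon", "time"),
    name="precip",
)

# xarray loops over every (lat, lon) cell and hands the function one
# 1-D time series at a time, so no manual reshaping or coercion is needed.
result = xr.apply_ufunc(
    standardize,
    grid,
    input_core_dims=[["time"]],   # the function consumes the time axis
    output_core_dims=[["time"]],  # and returns an array along the same axis
    vectorize=True,               # wrap the 1-D function with np.vectorize
)
```

With `vectorize=True` the loop itself is still serial. `apply_ufunc` also accepts `dask="parallelized"` for dask-backed arrays, which may be one way to get the parallelism back, though I haven't checked how that would compare with the current shared-memory implementation.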