xarray-contrib/xarray-simlab

Multi-index issue: simulations batches and xarray recent versions

benbovy opened this issue · 9 comments

@benbovy, do you have a timeline for updating the dependencies to the latest versions of xarray and Python?

@xaviernogueira and I are considering using this repo as a foundation for a model refactoring effort. We unfortunately have a short timeline on this first round so need immediate success, but we could see ourselves contributing to this repo.

@aufdenkampe, I'm glad you are considering using xarray-simlab! I have no particular timeline but I'd like to fix this issue and release version 0.5.1 anyway. What is your timeline? I can try to find some time within the next two weeks. Contributions are welcome too!

Regarding this specific issue, there is an easy workaround with the recent versions of Xarray (fastscape-lem/fastscape#26 (comment)) so it shouldn't be a blocker for using xarray-simlab.

Apart from this issue, all the tests are passing in my local environment that I've just updated to python 3.11 and the last versions of Xarray, Dask, Zarr and Attrs.

@benbovy, we have a month to pull off a pilot of the refactor, so we may need to use your work-around. Thanks!

@benbovy, This repo really looks impressive and has great documentation, so I'm back to reconsidering whether to use xarray-simlab as a dependency on a modeling system we're implementing for a US federal agency (i.e. https://github.com/EcohydrologyTeam/ClearWater-modules). Because of our previous need to move quickly for the first set of deliverables, @xaviernogueira implemented a simple custom system that was inspired by xarray-simlab. However, we just got funded for the next round of work, which is substantially more complex, and many of the features we'll need appear to already be implemented in xarray-simlab. I'll would be tackling that work with @sjordan29 and others, as @xaviernogueira has moved on to other endeavors.

My main hesitation at this point is that xarray-simlab hasn't really been maintained or updated in >3 years, including 6 open pull requests from @feefladder that you invited with your comment at #177 (comment). I understand how these things can happen, as I have a few repos in a similar situation due to a discontinuation of funding.

So my question is whether you see a future where could be actively maintained by you and a community of contributors and users (which might include our team)?

I see that you and your team are using xarray-simlab in https://github.com/fastscape-lem/fastscape. Are there other active users?

There were other active users back when I was active here and submitted my pull requests. In the meantime, I've been doing different things and haven't kept up, so I don't know...

My PR's were rather complex with a lot of DFS's though, so I think that's why they are still open: Was this level of complexity really desired, since it reduces maintainability?

@aufdenkampe, Indeed this repo has been very quiet for a while but is still actively used at least via fastscape and a couple of other applications that I'm aware of.

Apart from this minor & non-blocking issue with mutli-indexes and @feefladder's open PRs there hasn't been (much) bug reports or feature requests so it seems to work mostly fine. Xarray-simlab actually tries not to implement too many things but instead leverage as much as possible the capabilities of xarray, dask and zarr, which are mature libraries. I have a few small-ish improvement ideas in mind (mainly focused around better & interactive repr and visualization for Model objects) but I haven't taken the time to implement them so far.

My PR's were rather complex with a lot of DFS's though, so I think that's why they are still open

Yes that's right. Those PRs tackle (at least partially) some important limitations that I still find annoying, i.e., the processes in a model cannot be executed in a user-defined order and foreign variables cannot have an "inout" intent. I'm not super happy with the hacks that I used "fastscape" to circumvent this... I think I've made some design mistakes when adding "inout" variables and the "run_step" / "finalize_step" simulation stages.

Those PRs are complex indeed and really target the core of xarray-simlab. Although I really appreciate that @feefladder submitted them, it requires a in-depth review that I haven't done yet and also possibly some additional design iterations.

So my question is whether you see a future where could be actively maintained by you and a community of contributors and users (which might include our team)?

Yes sure! I cannot promise to spend myself a lot of time on maintaining this repository but I'd be happy to see contributors and maintainers getting onboard (including your team!), as long as we can keep xarray-simlab both maintainable and working for its current users (deprecation cycles are OK).

@benbovy and @feefladder, thanks for your rapid and detailed responses! Together they increase my interest in adopting xarray-simlab as a framework for our models.

Our particular use case (i.e. simulating aquatic temperature and nutrient dynamics in a flooding river, using a 2D irregular grid of flows from hydrodynamic models) will likely need both of the limitations that are addressed by @feefladder's PRs (inout with foreign variables and user-defined process order), as our first task is to reproduce the same results produced by a set of legacy Fortran modules.

Should I then perhaps fork and build from @feefladder's fork? Or can we work toward merging those PRs?

Fortunately the first module for temperature is much simpler, and doesn't need those capabilities, so we can get started with the upstream master.

Also, our long-term goal is to couple with many existing models (using BMI), and one of those models has been implemented in Landlab, which I know was one of your inspirations. That's going to lead some to ask us why we don't just implement these water nutrient processes in Landlab. My thinking is that xarray-simlab is leaner, possibly more flexible, likely to perform better given that it integrates xarray and dask at its core. The later is particularly important, as the hydrodynamic models to which we are coupling have large grids and small time steps. To your knowledge, is my reasoning defensible?

Should I then perhaps fork and build from @feefladder's fork? Or can we work toward merging those PRs?

It would be nice to eventually have this functionality available here.

Also, our long-term goal is to couple with many existing models (using BMI)

I had in mind of eventually implementing a BMI class that can wrap an xsimlab.Model instance such that a BMI compliant model can be dynamically created from any xarray-simlab model with minimal manual setup. But perhaps you rather want the other way around, i.e., run BMI models within xarray-simlab processes? I guess it could also be possible to write a wrapper of a process class for that... The nice thing is that both BMI and xarray-simlab allow easy access to model metadata programmatically.

My thinking is that xarray-simlab is leaner, possibly more flexible, likely to perform better given that it integrates xarray and dask at its core. The later is particularly important, as the hydrodynamic models to which we are coupling have large grids and small time steps.

So far I've only used numpy arrays inside xarray-simlab processes and I've used dask through xarray-simlab mainly for running multiple simulations in parallel. However, given how xarray, dask and zarr are tightly integrated between each other there is a good chance that using dask arrays within model processes and/or as model input variables will "just" work (or after small adjustments).

That said, xarray-simlab provides a fairly high-level of abstraction that sometimes makes it also difficult to track down performance or other issues, which requires to be careful when setting-up and running simulations. For example, it happened that we were struggling with the very slow execution of some batches of simulations before figuring out that xarray-simlab was just writing too many (and large) model output snapshots at the same time via dask and zarr (IO bottleneck).

Also, using dask for both computing process variables and running models in parallel will likely perform badly.

Using xarray DataArrays inside processes is discussed in #141.

(This discussion is drifting away from the original topic of this issue... shall we move it to a new discussion topic?)

@benbovy, thanks for all these very helpful responses! I agree that this discussion is probably distracting from the named issue. I should have created a new issue from the start! I like the idea of moving it to a new discussion topic, but I think you need to enable it for this repo. By the way, thanks for pointing me to #141. I'm very interested in keeping/leveraging the xarray metadata as much as possible, and trying to move toward a zero-copy approach as much as possible in all my processes. Our current implementation has lots of memory over-runs because we didn't keep track of that very well.