xarray-contrib/xarray-simlab

Improve Dataset.xsimlab.update_vars for output variables

benbovy opened this issue · 3 comments

There are some issues with the current behavior that is not very clear, e.g.,

  • For each given dict item (None or a clock dimension): all existing output variables are discarded
  • If we add an output variable for a clock dimension, which was already present in the None item, this will have no effect (in the output Dataset, only one snapshot is saved at the end of the simulation).

This behavior is consistent with dict.update(). However, I would be better here to update as if the dict keys were the variables and the values the clock dim or None. It makes more sense for this specific problem, the implementation would be clean and it's even possible to ensure that a variable is not given for more than one clock dim.

Actually, would this be better API? For example,

ds.xsimlab.update_vars(
    output_vars={
        'foo__bar': None,
        'foo__baz': 'clock'
    }
)

On one hand, it is more consistent with the input_vars parameter. It is also more flat. On the other hand, it feels more natural to say "I want to save snapshots on this clock dimension for variables 'a', 'b', 'c'..." than "I want to save snapshots for variable 'a' on this clock dimension, for variable 'b' on this clock dimension too, and for variable 'c' on the same dimension".

Both could be supported but it is probably a bad idea (too complex), but good for a smooth transition.

There is still no easy way to just drop a subset of output variables. Since None is already used, one possibility would be to use a sentinel value, e.g., xsimlab.DROP.

On one hand, it is more consistent with the input_vars parameter. It is also more flat.

It also allows to later add other kinds of settings like step intervals (integer values) instead of a clock dimension.

it feels more natural to say "I want to save snapshots on this clock dimension for variables 'a', 'b', 'c'..." than "I want to save snapshots for variable 'a' on this clock dimension, for variable 'b' on this clock dimension too, and for variable 'c' on the same dimension".

On the other hand it's natural for an output_vars parameter to expect variable names as keys.