Improve Dataset.xsimlab.update_vars for output variables
benbovy opened this issue · 3 comments
There are some issues with the current behavior that is not very clear, e.g.,
- For each given dict item (
None
or a clock dimension): all existing output variables are discarded - If we add an output variable for a clock dimension, which was already present in the
None
item, this will have no effect (in the output Dataset, only one snapshot is saved at the end of the simulation).
This behavior is consistent with dict.update()
. However, I would be better here to update as if the dict keys were the variables and the values the clock dim or None
. It makes more sense for this specific problem, the implementation would be clean and it's even possible to ensure that a variable is not given for more than one clock dim.
Actually, would this be better API? For example,
ds.xsimlab.update_vars(
output_vars={
'foo__bar': None,
'foo__baz': 'clock'
}
)
On one hand, it is more consistent with the input_vars
parameter. It is also more flat. On the other hand, it feels more natural to say "I want to save snapshots on this clock dimension for variables 'a', 'b', 'c'..." than "I want to save snapshots for variable 'a' on this clock dimension, for variable 'b' on this clock dimension too, and for variable 'c' on the same dimension".
Both could be supported but it is probably a bad idea (too complex), but good for a smooth transition.
There is still no easy way to just drop a subset of output variables. Since None
is already used, one possibility would be to use a sentinel value, e.g., xsimlab.DROP
.
On one hand, it is more consistent with the input_vars parameter. It is also more flat.
It also allows to later add other kinds of settings like step intervals (integer values) instead of a clock dimension.
it feels more natural to say "I want to save snapshots on this clock dimension for variables 'a', 'b', 'c'..." than "I want to save snapshots for variable 'a' on this clock dimension, for variable 'b' on this clock dimension too, and for variable 'c' on the same dimension".
On the other hand it's natural for an output_vars
parameter to expect variable names as keys.