xarray-contrib/xarray-simlab

0 values set to nan in output

feefladder opened this issue · 7 comments

Related to #170 : when accessing the step at runtime, the first value is nan.
example:

@xs.process
class Foo:
    a = xs.variable(intent="out")

    @xs.runtime(args="step")
    def run_step(self, n):
        print(n)
        self.a = n


model = xs.Model({"foo": Foo})
ds_in = xs.create_setup(
    model=model,
    clocks={"clock": range(5)},
    output_vars={"foo__a": "clock"},
)
print(ds_in.xsimlab.run(model=model).foo__a.data)

prints:

0
1
2
3
[nan 1. 2. 3.]

similar to when accessing main_clock_values:

@xs.process
class Foo:
    a = xs.variable(intent="out", dims=xs.MAIN_CLOCK)

    @xs.runtime(args="main_clock_values")
    def initialize(self, clock):
        self.a = clock

returns [nan 1. 2. 3. 4.] even though internally the correct values are used

possibly this is caused by the zarr dataset being created only in the second iteration (so when initialize has run and the first run_step etc.)

actually, 0 values are set to nan:

@xs.process
class Foo:
    a = xs.variable(intent="out")

    def run_step(self):
        self.a = 0

returns nan

jvail commented

I think this is related to https://xarray-simlab.readthedocs.io/en/latest/whats_new.html#breaking-changes

When I switched to version 0.5 I was puzzled by the new behavior. Currently I am always passing decoding: { mask_and_scale: False } into the run function.

Wow, that cost me a lot of headaches :') thanks a lot!
But still... It is quite counter-intuitive behaviour IMHO.
Now it is clear why it did work when setting it to 0.

ah yes, as it seems, it is also in the docs.
It seems to me that the values were arbitrarily chosen @benbovy ? However, 0 is a very likely value to be in the (output) data, even for basic users, that are then confronted with something that is under advanced encoding options.

wouldn't it make more sense to set the default fill value to something that is unlikely to be in the actual output?
like adding

    elif dtype.kind == "i":
        return np.iinfo(dtype).min

here?

Yep sorry for this change in version 0.5. I actually was hesitating about this change and I'm still not very happy with it. One related issue is that if we set mask_and_scale=False when opening the output zarr store as a xarray Dataset, then we have annoying attributes that can't be serialized when saving the dataset to a netcdf file.

Those issues should be fixed upstream IMO in zarr (where 0 is the default filling value, I'm not sure exactly why) and xarray.

wouldn't it make more sense to set the default fill value to something that is unlikely to be in the actual output?

Yes why not.

I made a PR #173 about it, the current settings:

  • (u)integers are set to the max value
  • strings are set to the empty string
  • booleans set to false, but issue a warning describing the masking
    Also, users can manually set the encoding, which is then respected. I can write some docs, but was a bit puzzled as to where they should go