Non-scalar zeros are converted to NaN

Question

Non-scalar zeros are converted to NaN

ethho opened this issue 4 years ago · 4 comments

Bug Summary

Ever since PR #148 (b3e7e93), zeros are converted to NaN values in non-scalar variables. Scalar variables with value of zero function as expected.

Methods to Reproduce

Set Up Environment

Tested in both Python 3.8.3 and 3.7.7. Set up a fresh virtualenv:

$ virtualenv ./virpy && source virpy/bin/activate

$ python3 --version
Python 3.8.3

$ pip install -e . # from root of xarray-simlab repo
$ pip install pytest "dask[distributed]" --upgrade

$ pip freeze
asciitree==0.3.3
attrs==20.3.0
click==7.1.2
cloudpickle==1.6.0
dask==2.30.0
distributed==2.30.1
fasteners==0.15
HeapDict==1.0.1
iniconfig==1.1.1
monotonic==1.5
msgpack==1.0.0
numcodecs==0.7.2
numpy==1.19.4
packaging==20.7
pandas==1.1.4
pluggy==0.13.1
psutil==5.7.3
py==1.9.0
pyparsing==2.4.7
pytest==6.1.2
python-dateutil==2.8.1
pytz==2020.4
PyYAML==5.3.1
six==1.15.0
sortedcontainers==2.3.0
tblib==1.7.0
toml==0.10.2
toolz==0.11.1
tornado==6.1
xarray==0.16.2
-e git+https://github.com/benbovy/xarray-simlab.git@5ac494a65b2e1cc08baf506cd70b69843f8eb1a7#egg=xarray_simlab
zarr==2.6.1
zict==2.0.0

Run pytests

Recursively copy the directory 20201203_debug_xsimlab to the root of this repo. One cannot git track this directory due to the git workflow in test.sh. From the root of the xarray-simlab repo:

$ bash 20201203_debug_xsimlab/test.sh
...
pytests failed on commit b3e7e93

To reproduce the pytest error on commit b3e7e93:

$ git checkout b3e7e93
$ pytest 20201203_debug_xsimlab/test.py

Results

Exactly one pytest, test_int_lst_result, fails starting at commit b3e7e93, which appears to be the merge of PR #148. All the other pytests in 20201203_debug_xsimlab/test.py pass between 0.4.1 and master HEAD (84cabf6).

Answer 1 · 2020-12-04T00:16:03.000Z

I do have the resources that I used for testing, 20201203_debug_xsimlab on disk and can push them if requested. The contents of 20201203_debug_xsimlab/test.py should be sufficient to reproduce, however:

# 20201203_debug_xsimlab/test.py
import pytest
import xsimlab as xs
import xarray as xr
import numpy as np


@xs.process
class ConsumeVar:
    foo_nonzero = xs.variable(dims=('value'), intent='in')
    foo_float = xs.variable(dims=(), intent='in')
    foo_int_lst = xs.variable(dims=('value'), intent='in')


@xs.process
class SetVar:
    foo_nonzero = xs.foreign(ConsumeVar, 'foo_nonzero', intent='out')
    foo_float = xs.foreign(ConsumeVar, 'foo_float', intent='out')
    foo_int_lst = xs.foreign(ConsumeVar, 'foo_int_lst', intent='out')

    def initialize(self):
        self.foo_int_lst = [0, 2, 3, 4, 5]
        self.foo_nonzero = [1, 2, 3, 4, 5]
        self.foo_float = float(0.)


@pytest.fixture
def model_result():
    """run the model"""
    model = xs.Model({
        'set_var': SetVar,
        'consume_var': ConsumeVar,
    })
    in_ds = xs.create_setup(
        model=model,
        clocks=dict(step=range(10)),
        output_vars=dict(
            set_var__foo_nonzero='step',
            set_var__foo_float='step',
            set_var__foo_int_lst='step',
        )
    )
    return in_ds.xsimlab.run(model=model)


def test_model_returns_dataset(model_result):
    assert isinstance(model_result, xr.Dataset)


def test_int_lst_result(model_result):
    """Only this pytest fails starting at b3e7e93"""
    # int list
    r = model_result['set_var__foo_int_lst']
    assert not r.isnull().any()


def test_float_result(model_result):
    # numpy arange
    r = model_result['set_var__foo_float']
    assert [val == 0 for val in r.values]


def test_int_lst_nonzero_result(model_result):
    # variable is array without zeroes
    r = model_result['set_var__foo_nonzero']
    assert not r.isnull().any()

Answer 2 · 2020-12-04T15:41:47.000Z

Just noticed the release note updates. Should we expect a fix by the v0.5 release?

Answer 3 · 2020-12-07T15:43:17.000Z

Passing decoding=dict(mask_and_scale=False) to Dataset.xsimlab.run passes all pytests on/before 84cabf6.

Answer 4 · 2020-12-14T09:26:29.000Z

Sorry @eho-tacc I missed your issue here. Indeed you can use decoding=dict(mask_and_scale=False) to "opt-out" this new behavior. Honestly I'm still wondering what would be the default option here.

mask_and_scale=True by default is consistent with xarray.open_zarr(), but zeros implicitly converted to NaN annoys me too, as I often deal with int variables with zeros as valid data values. I wouldn't mind setting xs.variable(encoding={'fill_value': -1}) for such variables, but still if there's no fill value in the data, in the output dataset the dtype gets converted from int to float, which is annoying.
mask_and_scale=False leaves a _FillValue attribute in the output Dataset's data variables, which may cause troubles when saving the dataset to a netcdf file. That's annoying too.

Either way, I think this should be improved directly in Xarray and/or Zarr.