Non-scalar zeros are converted to NaN
ethho opened this issue · 4 comments
Bug Summary
Ever since PR #148 (b3e7e93), zeros are converted to NaN values in non-scalar variables. Scalar variables with a value of zero work as expected.
Methods to Reproduce
Set Up Environment
Tested in both Python 3.8.3 and 3.7.7. Set up a fresh virtualenv:
```
$ virtualenv ./virpy && source virpy/bin/activate
$ python3 --version
Python 3.8.3
$ pip install -e .  # from root of xarray-simlab repo
$ pip install pytest "dask[distributed]" --upgrade
$ pip freeze
asciitree==0.3.3
attrs==20.3.0
click==7.1.2
cloudpickle==1.6.0
dask==2.30.0
distributed==2.30.1
fasteners==0.15
HeapDict==1.0.1
iniconfig==1.1.1
monotonic==1.5
msgpack==1.0.0
numcodecs==0.7.2
numpy==1.19.4
packaging==20.7
pandas==1.1.4
pluggy==0.13.1
psutil==5.7.3
py==1.9.0
pyparsing==2.4.7
pytest==6.1.2
python-dateutil==2.8.1
pytz==2020.4
PyYAML==5.3.1
six==1.15.0
sortedcontainers==2.3.0
tblib==1.7.0
toml==0.10.2
toolz==0.11.1
tornado==6.1
xarray==0.16.2
-e git+https://github.com/benbovy/xarray-simlab.git@5ac494a65b2e1cc08baf506cd70b69843f8eb1a7#egg=xarray_simlab
zarr==2.6.1
zict==2.0.0
```
Run pytests
Recursively copy the directory 20201203_debug_xsimlab to the root of this repo. This directory cannot be tracked by git due to the git workflow in test.sh. From the root of the xarray-simlab repo:
```
$ bash 20201203_debug_xsimlab/test.sh
...
```
pytests failed on commit b3e7e93
To reproduce the pytest error on commit b3e7e93:
```
$ git checkout b3e7e93
$ pytest 20201203_debug_xsimlab/test.py
```
Results
Exactly one pytest, test_int_lst_result, fails starting at commit b3e7e93, which appears to be the merge of PR #148. All the other pytests in 20201203_debug_xsimlab/test.py pass between 0.4.1 and master HEAD (84cabf6).
I still have the resources I used for testing (20201203_debug_xsimlab) on disk and can push them if requested. The contents of 20201203_debug_xsimlab/test.py should be sufficient to reproduce, however:
```python
# 20201203_debug_xsimlab/test.py
import pytest
import xsimlab as xs
import xarray as xr
import numpy as np


@xs.process
class ConsumeVar:
    foo_nonzero = xs.variable(dims=('value',), intent='in')
    foo_float = xs.variable(dims=(), intent='in')
    foo_int_lst = xs.variable(dims=('value',), intent='in')


@xs.process
class SetVar:
    foo_nonzero = xs.foreign(ConsumeVar, 'foo_nonzero', intent='out')
    foo_float = xs.foreign(ConsumeVar, 'foo_float', intent='out')
    foo_int_lst = xs.foreign(ConsumeVar, 'foo_int_lst', intent='out')

    def initialize(self):
        self.foo_int_lst = [0, 2, 3, 4, 5]
        self.foo_nonzero = [1, 2, 3, 4, 5]
        self.foo_float = float(0.)


@pytest.fixture
def model_result():
    """Run the model."""
    model = xs.Model({
        'set_var': SetVar,
        'consume_var': ConsumeVar,
    })
    in_ds = xs.create_setup(
        model=model,
        clocks=dict(step=range(10)),
        output_vars=dict(
            set_var__foo_nonzero='step',
            set_var__foo_float='step',
            set_var__foo_int_lst='step',
        )
    )
    return in_ds.xsimlab.run(model=model)


def test_model_returns_dataset(model_result):
    assert isinstance(model_result, xr.Dataset)


def test_int_lst_result(model_result):
    """Only this pytest fails starting at b3e7e93."""
    # int list containing a valid zero
    r = model_result['set_var__foo_int_lst']
    assert not r.isnull().any()


def test_float_result(model_result):
    # scalar float variable with value zero
    r = model_result['set_var__foo_float']
    # use all() so every value is actually checked (a bare
    # non-empty list comprehension would always be truthy)
    assert all(val == 0 for val in r.values)


def test_int_lst_nonzero_result(model_result):
    # variable is an array without zeros
    r = model_result['set_var__foo_nonzero']
    assert not r.isnull().any()
```
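The failure can be reproduced at the xarray level alone, independent of xsimlab: with CF-style decoding enabled (`mask_and_scale=True`), any value equal to a variable's `_FillValue` is masked to NaN, so a fill value of 0 swallows valid zeros and promotes the dtype from int to float. A minimal sketch of this (the dataset here is illustrative, not from the issue):

```python
import numpy as np
import xarray as xr


def make_ds():
    # An integer variable whose fill value happens to be 0.
    ds = xr.Dataset({"foo": ("x", np.array([0, 2, 3, 4, 5]))})
    ds["foo"].attrs["_FillValue"] = 0
    return ds


# Default CF decoding masks values equal to _FillValue:
# the valid zero becomes NaN and the dtype is promoted to float64.
decoded = xr.decode_cf(make_ds())

# Opting out keeps the raw integers (the workaround reported above).
raw = xr.decode_cf(make_ds(), mask_and_scale=False)
```

This mirrors what happens when simulation output stored via zarr is decoded on the way back into the output `Dataset`.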
Just noticed the release note updates. Should we expect a fix by the v0.5 release?
Passing `decoding=dict(mask_and_scale=False)` to `Dataset.xsimlab.run` passes all pytests on/before 84cabf6.
Sorry @eho-tacc, I missed your issue here. Indeed, you can use `decoding=dict(mask_and_scale=False)` to opt out of this new behavior. Honestly, I'm still wondering what the default option should be here.

- `mask_and_scale=True` by default is consistent with `xarray.open_zarr()`, but zeros implicitly converted to NaN annoys me too, as I often deal with `int` variables where zero is a valid data value. I wouldn't mind setting `xs.variable(encoding={'fill_value': -1})` for such variables, but even when there is no fill value in the data, the `dtype` in the output dataset still gets converted from `int` to `float`, which is annoying.
- `mask_and_scale=False` leaves a `_FillValue` attribute in the output Dataset's data variables, which may cause trouble when saving the dataset to a netCDF file. That's annoying too.

Either way, I think this should be improved directly in Xarray and/or Zarr.
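The trade-off described above can be sketched with plain NumPy (this is roughly what CF-style mask decoding does, not xsimlab or xarray code; the function name is illustrative):

```python
import numpy as np

data = np.array([0, 2, 3, 4, 5])  # int data where 0 is a valid value


def mask_fill(values, fill_value):
    """Sketch of CF mask-and-scale decoding: values equal to the
    fill value become NaN, which forces promotion of int to float."""
    out = np.asarray(values, dtype=float)  # NaN requires a float dtype
    out[out == fill_value] = np.nan
    return out


# Fill value of 0 (the problematic default): the valid zero is lost.
masked_default = mask_fill(data, fill_value=0)

# Explicit sentinel (e.g. fill_value=-1): zeros survive, but the
# dtype is still promoted from int to float by the masking step.
masked_sentinel = mask_fill(data, fill_value=-1)
```

This is why neither default is fully satisfying: masking with a zero fill value destroys valid zeros, while any NaN-based masking scheme at all changes the dtype of integer variables.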