NaN/Fill Values Being Plotted
Closed this issue · 3 comments
I do not know which part of the framework stack this issue belongs to, but I have encountered some peculiar behavior with NaNs/fill values. From testing, it appears that one of the APIs has changed how it handles NaNs and fill values: re-running the tavg-prec-month dataset with the latest versions of everything results in a different visualization of the data. This leads me to believe that either the expected behavior in the raster API, xarray, zarr, or ndpyramid has changed, or the settings for fillValue are being discarded.
Expected Behavior:
Running the demo with the data hosted at https://storage.googleapis.com/carbonplan-maps/v2/demo/4d/tavg-prec-month, the map produces the expected visualization where data in NaN/fill-value regions are transparent/not plotted.
Unexpected Behavior:
I wanted to test the data pipeline, so I used the example notebook from ndpyramid to create my own version of the same dataset. When I run the 4D, multi-variable sample this corresponds to, I successfully create the zarr pyramid, but the displayed data now shows maximum-value colors in NaN/fill-value regions. My hosted version of the duplicate dataset can be found here for testing purposes/reproducibility.
import xarray as xr
import numpy as np
import pandas as pd
import rioxarray
from ndpyramid import pyramid_reproject
from carbonplan_data.utils import set_zarr_encoding
from carbonplan_data.metadata import get_cf_global_attrs
VERSION = 2
LEVELS = 6
PIXELS_PER_TILE = 128
input_path = f"gs://carbonplan-maps/v{VERSION}/demo/raw"
save_path = f"/mrms-tiledata/test-zarr"
# open and extract the input datasets
ds1_all = []
ds2_all = []
months = list(map(lambda d: d + 1, range(12)))
for i in months:
    path = f"{input_path}/wc2.1_2.5m_tavg_{i:02g}.tif"
    ds = (
        xr.open_dataarray(path, engine="rasterio")
        .to_dataset(name="climate")
        .squeeze()
        .reset_coords(["band"], drop=True)
    )
    ds1_all.append(ds)
ds1 = xr.concat(ds1_all, pd.Index(months, name="month"))
for i in months:
    path = f"{input_path}/wc2.1_2.5m_prec_{i:02g}.tif"
    ds = (
        xr.open_dataarray(path, engine="rasterio")
        .to_dataset(name="climate")
        .squeeze()
        .reset_coords(["band"], drop=True)
    )
    ds2_all.append(ds)
ds2 = xr.concat(ds2_all, pd.Index(months, name="month"))
ds1["month"] = ds1["month"].astype("int32")
ds2["month"] = ds2["month"].astype("int32")
ds2["climate"] = ds2["climate"].astype("float32")
# give prec the same nodata value as tavg (the value found at the [0, 0, 0] corner)
ds2["climate"].values[ds2["climate"].values == ds2["climate"].values[0, 0, 0]] = ds1[
    "climate"
].values[0, 0, 0]
ds = xr.concat([ds1, ds2], pd.Index(["tavg", "prec"], name="band"))
ds["band"] = ds["band"].astype("str")
# create the pyramid
dt = pyramid_reproject(ds, levels=LEVELS, extra_dim="band")
for child in dt.children:
    dt[child].ds = set_zarr_encoding(
        dt[child].ds, codec_config={"id": "zlib", "level": 1}, float_dtype="float32"
    )
    dt[child].ds = dt[child].ds.chunk({"x": PIXELS_PER_TILE, "y": PIXELS_PER_TILE, "band": 2, "month": 12})
    dt[child].ds["climate"].attrs.clear()
dt.attrs = get_cf_global_attrs(version=VERSION)
for level in range(LEVELS):
    slevel = str(level)
    dt.ds.attrs['multiscales'][0]['datasets'][level]['pixels_per_tile'] = PIXELS_PER_TILE
dt.ds.attrs['multiscales'][0]['metadata']['version'] = VERSION
# write the pyramid to zarr
dt.to_zarr(f"{save_path}/4d/tavg-prec-month", consolidated=True)
dt.ds.attrs
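To check which fill value actually ended up in the written store, one option is to read the array metadata directly. The following is a minimal sketch, assuming the pyramid was written in Zarr v2 format to a local directory, where each array keeps a `.zarray` JSON file with a `fill_value` key. The helper name `read_fill_values` and the `<level>/climate` layout are assumptions for illustration and are not part of ndpyramid.

```python
import json
from pathlib import Path


def read_fill_values(store_root, variable="climate"):
    """Return {level_name: fill_value} for each pyramid level under store_root.

    Assumes a Zarr v2 directory store laid out as
    <store_root>/<level>/<variable>/.zarray (hypothetical layout).
    """
    fill_values = {}
    for zarray in sorted(Path(store_root).glob(f"*/{variable}/.zarray")):
        meta = json.loads(zarray.read_text())
        # Zarr v2 encodes a NaN fill value as the string "NaN";
        # other values appear as JSON numbers or null.
        fill_values[zarray.parent.parent.name] = meta.get("fill_value")
    return fill_values
```

Calling this on the directory passed to `dt.to_zarr` should show, per level, whether the fill value is NaN, a sentinel number, or missing, which is the piece of metadata a client would need in order to mask those pixels.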
Hi @keltonhalbert, thanks for opening this issue -- it has pointed us toward a number of changes to make!
- The "expected behavior" you were seeing was the result of a bug in that version of the demo data, where a negative value for fill_value was getting used in practice (though not getting reported properly in the metadata). Because of the color range used in the demo, the negative value was rendering in the background color, which masked a bug in the demo client-side code: we were not using the fillValue prop. We've since regenerated the data and configured a fillValue that matches the value now properly used by ndpyramid. If you use the data you generated with a Raster that configures fillValue, I think your issue should be resolved.
- This reminded us that we had intended to consume fill_value from the pyramid metadata to avoid this extra client-side configuration step. I've just merged a change (#76) and cut a new release, 2.x.x, that does this. If you upgrade to this version, this should resolve your issue without having to go through that configuration step.
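To make the first point concrete, here is a rough sketch of why a negative fill value can look correct by accident. It assumes a renderer that clamps each value into the color range before looking up a color and treats only an explicitly configured fill value as transparent. The function name `to_colormap_position`, the color range, and both sentinel values are made up for illustration; none of them are taken from the map library's rendering code.

```python
import numpy as np


def to_colormap_position(values, clim, fill_value=None):
    """Map values to positions in [0, 1] along a colormap; masked pixels become NaN.

    Hypothetical model of a renderer: values are clamped to clim, and only
    pixels equal to an explicitly configured fill_value are masked.
    """
    values = np.asarray(values, dtype="float64")
    lo, hi = clim
    position = np.clip((values - lo) / (hi - lo), 0.0, 1.0)
    if fill_value is not None:
        position = np.where(values == fill_value, np.nan, position)
    return position


clim = (0.0, 30.0)
negative_fill = -3.4e38  # hypothetical negative sentinel
positive_fill = 9.97e36  # hypothetical large positive sentinel

# Without a configured fill value, the sentinel is just another number to clamp.
print(to_colormap_position([15.0, negative_fill], clim))  # sentinel clamps to the low end
print(to_colormap_position([15.0, positive_fill], clim))  # sentinel clamps to the high end

# With the fill value configured, the sentinel pixel is masked instead.
print(to_colormap_position([15.0, positive_fill], clim, fill_value=positive_fill))
```

Under these assumptions, a negative sentinel lands on the lowest color, which can pass for transparency if that color matches the background, while a large positive sentinel lands on the maximum-value color unless the fill value is configured.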
Thanks again for opening! I'll go ahead and close this out next week unless you're still running into related issues.
Thank you for the quick response and fast turnaround! We will give the new release a try and I will close this if we can confirm it's working.
Gonna go ahead and close this out -- feel free to reopen if the issue persists!