Dimension operator adds undesired `_FillValue attribute` to boundary variables
Closed this issue · 2 comments
BorjaEst commented
Context
Given the following example dataset:
$ ncdump -h toz.CF-1.8.nc
netcdf toz.CF-1.8 {
dimensions:
time = 12 ;
bounds2 = 2 ;
lat = 18 ;
lon = 36 ;
variables:
double time_bnds(time, bounds2) ;
double time(time) ;
time:standard_name = "time" ;
time:units = "days since 2000-1-1" ;
time:calendar = "gregorian" ;
time:bounds = "time_bnds" ;
double lat_bnds(lat, bounds2) ;
double lat(lat) ;
lat:standard_name = "latitude" ;
lat:units = "degrees_north" ;
lat:bounds = "lat_bnds" ;
double lon_bnds(lon, bounds2) ;
double lon(lon) ;
lon:standard_name = "longitude" ;
lon:units = "degrees_east" ;
lon:bounds = "lon_bnds" ;
double toz(time, lat, lon) ;
toz:standard_name = "atmosphere_mole_content_of_ozone" ;
toz:long_name = "Total Ozone Column" ;
toz:units = "mol m-2" ;
toz:cell_methods = "area: mean time: maximum (interval: 1 day)" ;
// global attributes:
:Conventions = "CF-1.8" ;
:title = "Ozone data mockup" ;
:comment = "Ozone mockup data generated for testing purposes" ;
:institution = "o3skim code" ;
:history = "File created on 2022-02-17 10:14:04.458982" ;
:source = "Random generation" ;
}
Which is correct:
$ cfchecks -v 1.8 toz.CF-1.8.nc
CHECKING NetCDF FILE: toz.CF-1.8.nc
=====================
Using CF Checker Version 4.1.0
Checking against CF Version CF-1.8
Using Standard Name Table Version 78 (2021-09-21T11:55:06Z)
Using Area Type Table Version 10 (23 June 2020)
Using Standardized Region Name Table Version 4 (18 December 2018)
------------------
Checking variable: time_bnds
------------------
------------------
Checking variable: time
------------------
------------------
Checking variable: lat_bnds
------------------
------------------
Checking variable: lat
------------------
------------------
Checking variable: lon_bnds
------------------
------------------
Checking variable: lon
------------------
------------------
Checking variable: toz
------------------
ERRORS detected: 0
WARNINGS given: 0
INFORMATION messages: 0
Issue
Generates a wrong output after applying an operation, for example mean
:
>>> ds = xr.load_dataset("toz.CF-1.8.nc")
>>> ds.cf.mean("latitude").to_netcdf("toz-reduced.CF-1.8.nc")
$ cfchecks -v 1.8 toz-reduced.CF-1.8.nc
CHECKING NetCDF FILE: toz-reduced.CF-1.8.nc
=====================
Using CF Checker Version 4.1.0
Checking against CF Version CF-1.8
Using Standard Name Table Version 78 (2021-09-21T11:55:06Z)
Using Area Type Table Version 10 (23 June 2020)
Using Standardized Region Name Table Version 4 (18 December 2018)
ERROR: (7.1): boundary variable with non-numeric data type
------------------
Checking variable: time_bnds
------------------
------------------
Checking variable: time
------------------
------------------
Checking variable: lat_bnds
------------------
WARN: (3): No standard_name or long_name attribute specified
INFO: (3.1): No units attribute set. Please consider adding a units attribute for completeness.
------------------
Checking variable: lon_bnds
------------------
WARN: (7.1): Boundary Variable lon_bnds should not have _FillValue attribute
------------------
Checking variable: lon
------------------
------------------
Checking variable: toz
------------------
ERRORS detected: 1
WARNINGS given: 2
INFORMATION messages: 1
Aditional information
For additional information, here are the reduced netCDF headers:
$ ncdump -h toz-reduced.CF-1.8.nc
netcdf toz-reduced.CF-1.8 {
dimensions:
time = 12 ;
bounds2 = 2 ;
lon = 36 ;
variables:
int64 time_bnds(time, bounds2) ;
double time(time) ;
time:_FillValue = NaN ;
time:standard_name = "time" ;
time:bounds = "time_bnds" ;
time:units = "days since 2000-01-01" ;
time:calendar = "gregorian" ;
double lat_bnds(bounds2) ;
lat_bnds:_FillValue = NaN ;
double lon_bnds(lon, bounds2) ;
lon_bnds:_FillValue = NaN ;
double lon(lon) ;
lon:_FillValue = NaN ;
lon:standard_name = "longitude" ;
lon:units = "degrees_east" ;
lon:bounds = "lon_bnds" ;
double toz(time, lon) ;
toz:_FillValue = NaN ;
toz:standard_name = "atmosphere_mole_content_of_ozone" ;
toz:long_name = "Total Ozone Column" ;
toz:units = "mol m-2" ;
toz:cell_methods = "area: mean time: maximum (interval: 1 day)" ;
// global attributes:
:Conventions = "CF-1.8" ;
:title = "Ozone data mockup" ;
:comment = "Ozone mockup data generated for testing purposes" ;
:institution = "o3skim code" ;
:history = "File created on 2022-02-17 10:14:04.458982" ;
:source = "Random generation" ;
}
And here the cf_xarray version information:
$ pip show cf_xarray
Name: cf-xarray
Version: 0.6.1
Summary: A lightweight convenience wrapper for using CF attributes on xarray objects.
Home-page: https://cf-xarray.readthedocs.io
Author: None
Author-email: None
License: Apache
Location: /home/borja/miniconda3/envs/o3as/lib/python3.8/site-packages
Requires: pandas, numpy, setuptools
Required-by:
dcherian commented
This is an xarray issue: pydata/xarray#2037 It would be useful if you could comment there.
BorjaEst commented
I see, that is quite painful. I am posting here a temporal solution to this in case someone has the same issue.
Based on https://stackoverflow.com/questions/45693688/xarray-automatically-applying-fillvalue-to-coordinates-on-netcdf-output
for var in ds.variables:
if '_FillValue' not in ds[var].attrs:
ds[var].attrs['_FillValue'] = False
Also possible to run it like:
for var in variables.keys():
if variables[var].dims == (var,):
variables[var].attrs["_FillValue"] = None
if "bounds" in variables[var].attrs:
bnds = variables[var].attrs['bounds']
variables[bnds].attrs["_FillValue"] = None