cf_writer adds a _FillValue = NaN to coordinate variables
Opened this issue · 1 comments
The cf_writer adds a _FillValue to the final netCDF output for the lat/lon coordinates. In this case, adapted from an abi_fixed_grid, the dims are ("y", "x")
while the coords are ("latitude", "longitude")
even so, the coordinates should not contain a _FillValue upon writing to the netCDF file.
pydata/xarray#1598
Reproduce:
import dask.array as da
import os
import xarray as xr
from pyresample import AreaDefinition
from satpy import Scene
area_id = "abi_geos"
description = "Strided ABI L2 File Area"
proj_id = "abi_geos"
projection = {'ellps': 'GRS80', 'h': '35786023', 'lon_0': '-75', 'no_defs': 'None', 'proj': 'geos', 'sweep': 'x',
'type': 'crs', 'units': 'm', 'x_0': '0', 'y_0': '0'}
width = 100
height = 100
area_extent = (-3627271.341, 1583173.822, 1382771.9518, 4589199.7649)
area_def = AreaDefinition(area_id, description, proj_id, projection, width, height, area_extent)
lonlats = area_def.get_lonlats()
scn = Scene()
scn["test"] = xr.DataArray(data=da.zeros((100, 100)), dims=("y", "x"), attrs={"name": "test", "area": area_def})
cf_ds = scn.to_xarray()
print(f"Longitude Encoding: {cf_ds['longitude'].encoding}")
print(f"Longitude Attrs: {cf_ds['longitude'].attrs}")
print(f"Longitude Min: {lonlats[0].min()}, Latitude Min: {lonlats[1].min()}")
print(f"Longitude Max: {lonlats[0].max()}, Latitude Max: {lonlats[1].max()}")
outpath = os.path.join(os.path.expanduser("~"), "test.nc")
scn.save_datasets(filename=outpath)
Though the original attributes do not contain a _FillValue, the resulting netCDF does, the code above prints:
Longitude Encoding: {}
Longitude Attrs: {'name': 'longitude', 'standard_name': 'longitude', 'units': 'degrees_east'}
Longitude Min: -147.87682093113054, Latitude Min: 14.704946854219488
Longitude Max: inf, Latitude Max: inf
but an ncdump -h on test.nc shows the addition of the _FillValue:
double longitude(y, x) ;
longitude:_FillValue = NaN ;
longitude:name = "longitude" ;
longitude:standard_name = "longitude" ;
longitude:units = "degrees_east" ;
double latitude(y, x) ;
latitude:_FillValue = NaN ;
latitude:name = "latitude" ;
latitude:standard_name = "latitude" ;
latitude:units = "degrees_north" ;
double test(y, x) ;
test:_FillValue = NaN ;
test:grid_mapping = "abi_geos" ;
test:long_name = "test" ;
test:coordinates = "latitude longitude" ;
However, it is possible to add encoding to save_datasets to save the lat/lon without _FillValue:
scn.save_datasets(filename=test.nc, encoding={"latitude": {"_FillValue": None}, "longitude": {"_FillValue": None}})
longitude:name = "longitude" ;
longitude:standard_name = "longitude" ;
longitude:units = "degrees_east" ;
double latitude(y, x) ;
latitude:name = "latitude" ;
latitude:standard_name = "latitude" ;
latitude:units = "degrees_north" ;
double test(y, x) ;
test:_FillValue = NaN ;
test:grid_mapping = "abi_geos" ;
test:long_name = "test" ;
test:coordinates = "latitude longitude"
It seems that the encoding for writing netCDF data should include this somewhere in save_datasets rather than being typed explicitly.
I'm not sure I agree that there should be no _FillValue
. The argument in this xarray issue (pydata/xarray#1865) if I remember correctly is more about matching the CF standard. In CF a coordinate variable is a 1D variable that matches the name of a dimension. As we discussed on slack, it makes sense (as mentioned in the xarray issue about the CF standard) that you can't have fill values on a coordinate 1D variable. You can't have a pixel of data that has a "location" of (NaN, NaN). It just doesn't make sense. BUT our 2D lon/lats in CF-land I don't think are technically coordinate variables at least as far as the missing value concern...is concerned. This section of the CF docs makes me think that it is expected:
http://cfconventions.org/cf-conventions/v1.6.0/cf-conventions.html#reduced-horizontal-grid
Storing this type of gridded data in two-dimensional arrays wastes space, and results in the presence of missing values in the 2D coordinate variables.
So on this page:
http://cfconventions.org/cf-conventions/cf-conventions.html#terminology
auxiliary coordinate variable
Any netCDF variable that contains coordinate data, but is not a coordinate variable (in the sense of that term defined by the NUG and used by this standard - see below). Unlike coordinate variables, there is no relationship between the name of an auxiliary coordinate variable and the name(s) of its dimension(s).
Which is used:
http://cfconventions.org/cf-conventions/cf-conventions.html#missing-data
Missing data is allowed in data variables and auxiliary coordinate variables. Generic applications should treat the data as missing where any auxiliary coordinate variables have missing values; special-purpose applications might be able to make use of the data. Missing data is not allowed in coordinate variables.
So if our 2D lon/lats are considered "auxiliary" then they're fine to have a _FillValue
in CF.
Anyway, my opinion is that the lon/lat 2D arrays should have a _FillValue
or at the very least a valid_range
.