pytroll/satpy

cf_writer adds a _FillValue = NaN to coordinate variables


The cf_writer adds a _FillValue to the lat/lon coordinates in the final netCDF output. In this case, adapted from an abi_fixed_grid area, the dims are ("y", "x") while the coordinates are ("latitude", "longitude"). Even so, the coordinates should not contain a _FillValue when written to the netCDF file.
See also pydata/xarray#1598.

To reproduce:

```python
import os

import dask.array as da
import xarray as xr
from pyresample.geometry import AreaDefinition
from satpy import Scene

# Strided ABI fixed-grid (geostationary) area, 100 x 100 pixels
area_id = "abi_geos"
description = "Strided ABI L2 File Area"
proj_id = "abi_geos"
projection = {'ellps': 'GRS80', 'h': '35786023', 'lon_0': '-75', 'no_defs': 'None', 'proj': 'geos', 'sweep': 'x',
              'type': 'crs', 'units': 'm', 'x_0': '0', 'y_0': '0'}
width = 100
height = 100
area_extent = (-3627271.341, 1583173.822, 1382771.9518, 4589199.7649)
area_def = AreaDefinition(area_id, description, proj_id, projection, width, height, area_extent)
lonlats = area_def.get_lonlats()

# Dummy dataset on the ABI area; dims are ("y", "x")
scn = Scene()
scn["test"] = xr.DataArray(data=da.zeros((100, 100)), dims=("y", "x"), attrs={"name": "test", "area": area_def})

# Inspect the CF-style dataset before writing
cf_ds = scn.to_xarray()
print(f"Longitude Encoding: {cf_ds['longitude'].encoding}")
print(f"Longitude Attrs: {cf_ds['longitude'].attrs}")
print(f"Longitude Min: {lonlats[0].min()}, Latitude Min: {lonlats[1].min()}")
print(f"Longitude Max: {lonlats[0].max()}, Latitude Max: {lonlats[1].max()}")

# Write the file with the CF writer
outpath = os.path.join(os.path.expanduser("~"), "test.nc")
scn.save_datasets(filename=outpath, writer="cf")
```

Although neither the encoding nor the attributes of the longitude coordinate contain a _FillValue, the resulting netCDF file does. The code above prints:

```
Longitude Encoding: {}
Longitude Attrs: {'name': 'longitude', 'standard_name': 'longitude', 'units': 'degrees_east'}
Longitude Min: -147.87682093113054, Latitude Min: 14.704946854219488
Longitude Max: inf, Latitude Max: inf
```

However, running ncdump -h on test.nc shows that _FillValue = NaN was added to both coordinates:

```
	double longitude(y, x) ;
		longitude:_FillValue = NaN ;
		longitude:name = "longitude" ;
		longitude:standard_name = "longitude" ;
		longitude:units = "degrees_east" ;
	double latitude(y, x) ;
		latitude:_FillValue = NaN ;
		latitude:name = "latitude" ;
		latitude:standard_name = "latitude" ;
		latitude:units = "degrees_north" ;
	double test(y, x) ;
		test:_FillValue = NaN ;
		test:grid_mapping = "abi_geos" ;
		test:long_name = "test" ;
		test:coordinates = "latitude longitude" ;
```

It is possible, however, to pass an encoding to save_datasets so that latitude and longitude are written without a _FillValue:

```python
scn.save_datasets(filename=outpath, writer="cf",
                  encoding={"latitude": {"_FillValue": None}, "longitude": {"_FillValue": None}})
```

which produces:

```
	double longitude(y, x) ;
		longitude:name = "longitude" ;
		longitude:standard_name = "longitude" ;
		longitude:units = "degrees_east" ;
	double latitude(y, x) ;
		latitude:name = "latitude" ;
		latitude:standard_name = "latitude" ;
		latitude:units = "degrees_north" ;
	double test(y, x) ;
		test:_FillValue = NaN ;
		test:grid_mapping = "abi_geos" ;
		test:long_name = "test" ;
		test:coordinates = "latitude longitude" ;
```

It seems that save_datasets (or the CF writer) should set this encoding itself, rather than requiring the user to type it explicitly.
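One way to avoid typing the encoding by hand would be to derive it from the dataset being written. The sketch below uses a hypothetical helper, no_fill_encoding, which is not part of satpy or xarray. It assumes that only floating-point coordinates need the override (the latitude and longitude that received _FillValue = NaN above are doubles), and it uses a small stand-in dataset rather than the real CF writer output.

```python
import numpy as np
import xarray as xr


def no_fill_encoding(ds: xr.Dataset) -> dict:
    """Hypothetical helper: disable _FillValue on floating-point coordinates.

    Returns a mapping in the shape used for the ``encoding`` argument,
    with {"_FillValue": None} for every floating-point coordinate.
    """
    return {
        name: {"_FillValue": None}
        for name, coord in ds.coords.items()
        if np.issubdtype(coord.dtype, np.floating)
    }


# Small stand-in for the dataset the CF writer produces (assumed layout):
# 2D latitude/longitude coordinates plus an integer 1D "x" coordinate.
ds = xr.Dataset(
    {"test": (("y", "x"), np.zeros((2, 3)))},
    coords={
        "latitude": (("y", "x"), np.zeros((2, 3))),
        "longitude": (("y", "x"), np.zeros((2, 3))),
        "x": ("x", np.arange(3)),
    },
)

encoding = no_fill_encoding(ds)
# encoding has entries for "latitude" and "longitude"; the integer "x" is skipped
```

The resulting dict has the same shape as the one passed by hand to save_datasets above; whether the writer should apply it to every float coordinate or only to the 2D lon/lats is exactly what the reply below questions.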

I'm not sure I agree that there should be no _FillValue. If I remember correctly, the argument in the xarray issue (pydata/xarray#1865) is more about matching the CF standard. In CF, a coordinate variable is a 1D variable whose name matches the name of a dimension. As we discussed on Slack, it makes sense (as mentioned in the xarray issue about the CF standard) that you can't have fill values on a 1D coordinate variable: you can't have a pixel of data whose "location" is (NaN, NaN). It just doesn't make sense. BUT I don't think our 2D lon/lats are technically coordinate variables in CF terms, at least as far as missing values are concerned. This section of the CF docs makes me think that missing values in them are expected:

http://cfconventions.org/cf-conventions/v1.6.0/cf-conventions.html#reduced-horizontal-grid

Storing this type of gridded data in two-dimensional arrays wastes space, and results in the presence of missing values in the 2D coordinate variables.
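To make the quoted sentence concrete: in a reduced grid each latitude band has a different number of longitude points, so packing the bands into a rectangular 2D array leaves padding that can only be marked as missing. This is a minimal pure-Python sketch; pad_reduced_grid is a hypothetical name and the grid values are made up for illustration.

```python
import math


def pad_reduced_grid(rows, fill=math.nan):
    """Pad ragged rows (one list of longitudes per latitude band) into a rectangle."""
    width = max(len(row) for row in rows)
    return [row + [fill] * (width - len(row)) for row in rows]


# Made-up reduced grid: fewer longitude points in the band nearer the pole.
lon_rows = [
    [0.0, 120.0, 240.0],                        # 3 points near the pole
    [0.0, 60.0, 120.0, 180.0, 240.0, 300.0],    # 6 points nearer the equator
]
lon_2d = pad_reduced_grid(lon_rows)
n_missing = sum(math.isnan(v) for row in lon_2d for v in row)
print(n_missing)  # 3
```

The 2D longitude array is 2 x 6, and the first row ends in three NaN fill values, which is the "presence of missing values in the 2D coordinate variables" the CF text describes.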

The terminology section of the CF conventions defines:

http://cfconventions.org/cf-conventions/cf-conventions.html#terminology

auxiliary coordinate variable

Any netCDF variable that contains coordinate data, but is not a coordinate variable (in the sense of that term defined by the NUG and used by this standard - see below). Unlike coordinate variables, there is no relationship between the name of an auxiliary coordinate variable and the name(s) of its dimension(s).

That term is then used in the missing-data section:

http://cfconventions.org/cf-conventions/cf-conventions.html#missing-data

Missing data is allowed in data variables and auxiliary coordinate variables. Generic applications should treat the data as missing where any auxiliary coordinate variables have missing values; special-purpose applications might be able to make use of the data. Missing data is not allowed in coordinate variables.

So if our 2D lon/lats are considered "auxiliary" then they're fine to have a _FillValue in CF.
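The distinction can be written down as a simplified check over the variables in the ncdump output above. It assumes that auxiliary coordinate variables are the ones listed in a data variable's coordinates attribute, which is how CF links them to data; classify and fill_value_allowed are hypothetical names, and the 1D x and y entries are assumptions added for contrast, since the excerpt does not show them.

```python
def classify(variables):
    """Split variable names into CF coordinate and auxiliary coordinate variables.

    ``variables`` maps a variable name to a (dims, attrs) pair, a simplified
    stand-in for a netCDF header.
    """
    # Coordinate variable: one-dimensional and named after its only dimension.
    coordinate = {name for name, (dims, _) in variables.items() if dims == (name,)}
    # Auxiliary coordinate variables: listed in some "coordinates" attribute
    # but not themselves coordinate variables.
    referenced = set()
    for _, attrs in variables.values():
        referenced.update(attrs.get("coordinates", "").split())
    return coordinate, referenced - coordinate


def fill_value_allowed(name, coordinate):
    # CF: missing data is allowed in data and auxiliary coordinate variables,
    # but not in coordinate variables.
    return name not in coordinate


header = {
    "longitude": (("y", "x"), {"standard_name": "longitude"}),
    "latitude": (("y", "x"), {"standard_name": "latitude"}),
    "test": (("y", "x"), {"coordinates": "latitude longitude"}),
    # Assumed 1D projection coordinates, not shown in the ncdump excerpt.
    "x": (("x",), {"units": "m"}),
    "y": (("y",), {"units": "m"}),
}

coordinate, auxiliary = classify(header)
print(sorted(coordinate), sorted(auxiliary))  # ['x', 'y'] ['latitude', 'longitude']
print(fill_value_allowed("longitude", coordinate))  # True
```

Under this reading, the 2D latitude and longitude come out as auxiliary coordinates, so a _FillValue on them would not violate the missing-data rule quoted above, while one on x or y would.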

Anyway, my opinion is that the lon/lat 2D arrays should have a _FillValue or at the very least a valid_range.
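On the valid_range alternative: under the NUG/CF conventions, values outside a variable's valid_range are treated as missing by readers, so a range of [-180, 180] on longitude would already flag the inf values seen in the "Longitude Max: inf" output above. This is a minimal pure-Python sketch of that reader-side rule; apply_valid_range is a hypothetical helper, and the sample values come from the printed output and the area's lon_0.

```python
import math


def apply_valid_range(values, valid_range):
    """Return a copy of values with out-of-range entries replaced by NaN.

    Mimics how a CF-aware reader treats values outside ``valid_range``
    as missing. NaN inputs fail both comparisons and therefore stay NaN.
    """
    low, high = valid_range
    return [v if low <= v <= high else math.nan for v in values]


# Longitude minimum from the printed output, the area's lon_0, and the inf maximum.
lons = [-147.87682093113054, -75.0, math.inf]
masked = apply_valid_range(lons, (-180.0, 180.0))
print(masked)  # [-147.87682093113054, -75.0, nan]
```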