Masking not working properly with _Unsigned
dopplershift opened this issue · 14 comments
I have a Level 2 QPE product from GOES-16 that caused some support issues. The relevant CDL is:
netcdf satellite/goes16/GOES16/Products/RainRateQPE/FullDisk/current/OR_ABI-L2-RRQPEF-M3_G16_s20181072300427_e20181072311194_c20181072311280.nc {
dimensions:
y = 5424;
x = 5424;
number_of_time_bounds = 2;
band = 1;
number_of_image_bounds = 2;
number_of_sunglint_angle_bounds = 2;
number_of_LZA_bounds = 2;
number_of_SZA_bounds = 2;
number_of_lat_bounds = 2;
number_of_rainfall_rate_bounds = 2;
variables:
short RRQPE(y=5424, x=5424);
:_FillValue = -1S; // short
:long_name = "ABI L2+ Rainfall Rate - Quantitative Prediction Estimate";
:standard_name = "rainfall_rate";
:_Unsigned = "true";
:valid_range = 0S, -6S; // short
:scale_factor = 0.00152602f; // float
:add_offset = 0.0f; // float
:units = "mm h-1";
:resolution = "y: 0.000056 rad x: 0.000056 rad";
:coordinates = "latitude retrieval_local_zenith_angle quantitative_local_zenith_angle solar_zenith_angle t y x";
:grid_mapping = "goes_imager_projection";
:cell_methods = "latitude: point (good quality pixel produced) retrieval_local_zenith_angle: point (good or degraded quality pixel produced) quantitative_local_zenith_angle: sum (good quality pixel produced) solar_zenith_angle: sum (good quality pixel produced) t: point area: point";
:ancillary_variables = "DQF";
Note the values in valid_range. The values themselves are appropriate for a signed data type, but they only make sense as a range if you convert signed (-6) to unsigned (65530). The values in valid_range are not incorrect, though, as the standards specify that they need to be the same type as the variable.
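A minimal NumPy sketch of that reinterpretation, assuming only that the attribute is read as an int16 array. The same 16-bit patterns are simply viewed as unsigned shorts, with no arithmetic conversion:

```python
import numpy as np

# valid_range exactly as stored in the file: signed shorts (0S, -6S)
valid_range = np.array([0, -6], dtype=np.int16)

# Reinterpret the same bits as unsigned shorts (a view, not a value conversion)
as_unsigned = valid_range.view(np.uint16)

# Same bits, now read as 0 and 65530
print(valid_range.tolist(), as_unsigned.tolist())
```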
The current out-of-the-box behavior is that netCDF4-python returns an entirely masked variable. The workaround is to disable masking.
The correct behavior IMO is to have valid_range and friends be handled like the data values for unsigned purposes.
I've included the sample file.
I'm traveling so I won't be able to look at this till next week. Have you tried the latest master?
The valid range is assumed to be of the same type as the netcdf variable (signed short integer) and the conversion to unsigned short is considered to be part of the scale/offset operation (a numpy view is created after the mask is created).
In this case, valid_range is the same type as the variable. The problem is that valid_range is given as (0, -6). These are the same (and correct) bit patterns regardless of signed/unsigned. But for the original signed data, masking values < 0 and > -6 masks everything, whereas doing the same operation on the unsigned data, masking values < 0 and > 65530, produces the desired result.
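The two orders of operations can be compared with a small NumPy sketch. The raw sample values are hypothetical (not read from the file), and `mask_outside` is an illustrative helper, not netCDF4-python's internal masking code:

```python
import numpy as np

# Hypothetical raw values as stored on disk (signed shorts). Read as unsigned
# they are 0, 1000, 40000, 65530 and 65535 (the _FillValue, -1S).
raw = np.array([0, 1000, -25536, -6, -1], dtype=np.int16)
valid_range = np.array([0, -6], dtype=np.int16)
fill_value = np.array(-1, dtype=np.int16)

def mask_outside(data, vmin, vmax, fill):
    """Mask values below vmin, above vmax, or equal to the fill value."""
    bad = (data < vmin) | (data > vmax) | (data == fill)
    return np.ma.masked_array(data, mask=bad)

# Masking the signed data: valid_min (0) > valid_max (-6), so every value is masked
signed_result = mask_outside(raw, valid_range[0], valid_range[1], fill_value)

# Masking after viewing data and attributes as unsigned: the range becomes
# [0, 65530], so only the fill value (65535) is masked
urange = valid_range.view(np.uint16)
unsigned_result = mask_outside(raw.view(np.uint16), urange[0], urange[1],
                               fill_value.view(np.uint16))

print(signed_result.mask.tolist())
print(unsigned_result.compressed().tolist())
```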
Yes, but isn't the valid_range (also missing_value, _FillValue) supposed to apply to the native variable data, which in this case is signed?
We are currently treating the _Unsigned attribute as part of the scaling operation, after the masking is applied.
Hmm... I just found this in the netCDF User's Guide, under Best Practices:
If the variable is unsigned the valid_range values should be widened if needed and stored as unsigned integers.
@lesserwhirls Does netCDF-java handle valid_range? If so, what does it do with _Unsigned combined with valid_range?
@dopplershift Yes, netCDF-java tries to deal with valid_range. I'm not sure of the details, as the code has changed between 4.6.x and 5.0. @cwardgar was in that code recently to deal with _FillValue, so he may have the best understanding at this point.
Does netCDF-java handle valid_range? If so, what does it do with _Unsigned combined with valid_range?
Yes, it does. First, it widens valid_range to the next larger integral type. This allows a bit pattern that previously may have been interpreted as negative (because, e.g., we're storing an unsigned short in a short) to be properly interpreted as a non-negative number.
Then, it applies scale and offset. The result will be a double. For the dataset you provided, NJ calculates valid_min == 0 and valid_max == 100.00009070616215. That seems correct, yeah?
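The widen-then-scale sequence described above can be mirrored in NumPy as a rough sketch. This is an analogy, not netCDF-java's code; the attribute values come from the CDL at the top of the issue, and the result should land close to the 100.00009 that NJ reports:

```python
import numpy as np

# Attributes from the CDL above: valid_range is stored as signed shorts
valid_range = np.array([0, -6], dtype=np.int16)
scale_factor = np.float32(0.00152602)
add_offset = np.float32(0.0)

# Widen: promote to int32, then keep the low 16 bits so -6 becomes 65530
# (the Python analog of `s & 0xffff` in Java)
widened = valid_range.astype(np.int32) & 0xFFFF

# Apply scale and offset in double precision
valid_min, valid_max = widened * np.float64(scale_factor) + np.float64(add_offset)

print(widened.tolist(), valid_min, valid_max)
```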
Does netCDF-java do the same with _FillValue and missing_value (i.e., cast them to the larger integral type)?
@jswhit Yes, missing_value is widened first. _FillValue is not! That's likely a bug. Thanks for pointing that out.
And to be clear, valid_* and missing_value are widened before scale/offset are applied, not merely cast. For example:
short s = -6;
System.out.println((int) s); // Cast: -6
System.out.println(s & 0xffff); // Widen: 65530
The problem that I see with _FillValue is that it is currently being cast (to double) before scale/offset, not widened.
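To illustrate why casting rather than widening _FillValue matters, here is a small Python sketch (an analogy, not netCDF-java's implementation) comparing the two treatments of `-1S` against a fill pixel that has been unpacked as unsigned:

```python
import numpy as np

fill_attr = np.int16(-1)              # _FillValue = -1S
scale_factor = float(np.float32(0.00152602))

# Cast: keeps the signed value, so the fill maps to a negative physical value
cast_fill = float(fill_attr) * scale_factor

# Widen: treat the bits as unsigned first (65535), as the data are unpacked
widened_fill = (int(fill_attr) & 0xFFFF) * scale_factor

# A fill pixel from the file, viewed as unsigned and then scaled
fill_pixel = float(np.array(-1, dtype=np.int16).view(np.uint16)) * scale_factor

# Only the widened fill value matches the unpacked fill pixel
print(cast_fill == fill_pixel, widened_fill == fill_pixel)
```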
With the changes in pull request #797, the following script
from netCDF4 import Dataset
import matplotlib.pyplot as plt
nc = Dataset('OR_ABI-L2-RRQPEF-M3_G16_s20181072300427_e20181072311194_c20181072311280.nc')
# Auto mask-and-scale is on by default, so this returns a masked, scaled array
data = nc['RRQPE'][:]
print(data.dtype, data.min(), data.max())
plt.imshow(data, cmap=plt.cm.jet, vmin=0, vmax=100)
plt.colorbar()
plt.show()
produces
float32 0.0 100.00009
and the attached png file.
Can someone try this with netcdf-java and see if they get the same?
That's what I get using toolsUI.
I tried printing the valid_range of the dataset and still got [0, -6]. Is it supposed to be [0, 100] or [0.0, 100.0]?
from netCDF4 import Dataset
nc = Dataset('OR_ABI-L2-RRQPEF-M3_G16_s20181072300427_e20181072311194_c20181072311280.nc')
data = nc['RRQPE'][:]
print(data.dtype, data.min(), data.max())
# Attributes are returned exactly as stored in the file
print(nc['RRQPE'].getncattr('valid_range'))
float32 0.0 100.00009
[ 0, -6]
The fix does not have the library change the valid_range attribute; it only fixes the automatic masking to use the proper data. IMO, changing attributes is outside the scope here.