NOAA-GFDL/MDTF-diagnostics

Blank AssertionErrors and problem with parsing scalar coordinates in input files (reconcile_scalar_value_and_units)

Closed this issue · 3 comments

Bug Severity

  • 1 = Minor problem that does not affect total framework functionality (e.g., computation error in a POD, problem with logging output, or an issue on a single system).
  • 2 = Major problem that affects overall functionality, but that does not occur for all users (e.g., problems installing the framework with a specific Conda version, a framework option that causes one or more PODs to fail, or missing/incompatible Python modules).
  • 3 = Catastrophic problem that occurs frequently for multiple users and/or on multiple systems (e.g., framework consistently fails to install on multiple systems, or one or more PODs continuously fail after running successfully).

Describe the bug
In testing a "POD-in-progress", I was encountering issues with my netCDF files, which have scalar pressure level coordinates that seemingly fit the bill for my POD's settings.jsonc file. Specifically, the metadata parsing was failing, and I was getting blank AssertionErrors like the following:

Received event while preprocessing <#nZtQ:stc_vert_wave_coupling.ta50>: DataPreprocessEvent("Caught exception while parsing file metadata for <#nZtQ:stc_vert_wave_coupling.ta50>: AssertionError().")

The blank AssertionError wasn't very helpful, so I had to do a bit of digging in the codebase; the "Caught exception while parsing file metadata" string led me to src/preprocessor.py and the code surrounding that led me to src/xr_parser.py where I found a handful of assert statements without any assertion messages.

By adding some dummy assertion messages to each of these, I discovered the parsing was failing at the first assertion in reconcile_scalar_value_and_units, which is called by the reconcile_scalar_coords method. Specifically the call

self.reconcile_scalar_value_and_units(our_var, ds[ds_coord_name])

was leading to the failure of the following assertion

assert (hasattr(our_var, 'is_scalar') and our_var.is_scalar)

However, in reading the docstrings for each of the methods, I noticed that our_var for reconcile_scalar_coords is intended to be a TranslatedVarlistEntry object, whereas our_var for reconcile_scalar_value_and_units is intended to be data_model.DMVariable object.

I changed

self.reconcile_scalar_value_and_units(our_var, ds[ds_coord_name])

to

self.reconcile_scalar_value_and_units(coord, ds[ds_coord_name])

and the subsequent parsing seemed to complete normally, but I am not confident enough in my knowledge of the preprocessor internals to know whether this is the correct fix.

As a more general point, I think it would be helpful if the assertions in xr_parser.py had messages added to them for purposes of debugging.
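To illustrate the suggestion, here is a minimal sketch of the kind of descriptive assertion message that would have made the failure self-explanatory. The helper name and message wording are hypothetical; the real check lives in xr_parser.py's reconcile_scalar_value_and_units:

```python
def check_scalar(our_var):
    """Hypothetical example of an assertion with a diagnostic message;
    not the framework's actual code."""
    # With a message attached, the AssertionError names the offending
    # object instead of being blank.
    assert hasattr(our_var, 'is_scalar') and our_var.is_scalar, (
        f"{type(our_var).__name__} '{getattr(our_var, 'name', '?')}' is not "
        "a scalar coordinate (missing or false is_scalar attribute)"
    )
```

With this in place, the preprocessor log would report which object failed the check rather than a bare `AssertionError()`.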

@zdlawrence Thank you for all of your preliminary sleuthing. If you could provide the sample data you are using, the branch with the latest version of your POD (it doesn't have to be working--it'll just provide information for the preprocessor), and a copy of your runtime config file, I can do some more digging to see what the exact issue is and submit a fix.

The assertion statements were implemented for testing by the previous developer, and are unfortunately one of many sources of vague error statements. I will add cleaning up the statements to my todo list.

@wrongkindofdoctor Please see https://github.com/zdlawrence/MDTF-diagnostics/tree/feature/stc_vert_wave_coupling and grab the tarball "data-for-jess.tgz" from the anonymous FTP at ftp.cdc.noaa.gov/pub/Public/zlawrence/

My POD code currently runs to completion with the above data if I include that line change mentioned in my initial comment.

For some additional perspective, I can't say for sure whether this is an actual bug in the mdtf parser, or whether it's potentially an xarray/data issue. The files I'm working with were originally downloaded from the CMIP6 Google Cloud archive with xarray, and the fields on individual pressure levels were essentially downloaded like so:

ds = ds.sel(plev=5000)
ds.to_netcdf(...)
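For a self-contained sketch of that workflow (the dataset here is synthetic; the real files come from the CMIP6 archive), note that selecting a single pressure level drops plev as a dimension but leaves it behind as a 0-dimensional scalar coordinate:

```python
import numpy as np
import xarray as xr

# Build a small dataset with a pressure-level dimension.
ds = xr.Dataset(
    {"ta": (("plev", "lat"), np.zeros((3, 2), dtype="float32"))},
    coords={"plev": [100000.0, 50000.0, 5000.0], "lat": [-45.0, 45.0]},
)

# Selecting one level removes plev from the dimensions, but keeps it
# attached as a scalar (0-d) coordinate, as in the files described above.
sub = ds.sel(plev=5000)
```

When such a dataset is written out with to_netcdf, the scalar coordinate is recorded via a CF-style `coordinates` attribute on the data variable.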

This results in xarray netcdf files that look like so:

<xarray.Dataset>
Dimensions:    (lat: 64, bnds: 2, lon: 128, time: 13140)
Coordinates:
  * lat        (lat) float64 -87.86 -85.1 -82.31 -79.53 ... 82.31 85.1 87.86
    lat_bnds   (lat, bnds) float64 -90.0 -86.58 -86.58 ... 86.58 86.58 90.0
  * lon        (lon) float64 0.0 2.812 5.625 8.438 ... 348.8 351.6 354.4 357.2
    lon_bnds   (lon, bnds) float64 -1.406 1.406 1.406 ... 355.8 355.8 358.6
    plev       float64 5e+03
  * time       (time) object 1979-01-01 12:00:00 ... 2014-12-31 12:00:00
    time_bnds  (time, bnds) object 1979-01-01 00:00:00 ... 2015-01-01 00:00:00
Dimensions without coordinates: bnds
Data variables:
    ta50       (time, lat, lon) float32 237.6 237.6 237.6 ... 209.5 209.4 209.3

However, compare that to the netcdf file that mdtf creates and places in the POD directory (with the aforementioned line change to xr_parser.py):

<xarray.Dataset>
Dimensions:    (lat: 64, bnds: 2, lon: 128, time: 13140)
Coordinates:
  * lat        (lat) float64 -87.86 -85.1 -82.31 -79.53 ... 82.31 85.1 87.86
    lat_bnds   (lat, bnds) float64 -90.0 -86.58 -86.58 ... 86.58 86.58 90.0
  * lon        (lon) float64 0.0 2.812 5.625 8.438 ... 348.8 351.6 354.4 357.2
    lon_bnds   (lon, bnds) float64 -1.406 1.406 1.406 ... 355.8 355.8 358.6
  * time       (time) object 1979-01-01 12:00:00 ... 2014-12-31 12:00:00
    time_bnds  (time, bnds) object 1979-01-01 00:00:00 ... 2015-01-01 00:00:00
Dimensions without coordinates: bnds
Data variables:
    lev        float64 50.0
    ta50       (time, lat, lon) float32 237.6 237.6 237.6 ... 209.5 209.4 209.3

Aside from the rename/unit conversion, I wonder if it's a difference in how a scalar coordinate is supposed to look/be handled in the xarray dataset. ncks shows that both files look virtually identical, including that the ta50 variable in both files has a coordinates attribute that points to the relevant scalar variable name, similar to what is described here: https://cfconventions.org/Data/cf-conventions/cf-conventions-1.8/cf-conventions.html#scalar-coordinate-variables
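The visible difference between the two dumps is that the framework's output carries lev as a 0-d data variable rather than as a coordinate. As a sketch (the variable names mirror the dump above, but the construction is synthetic), xarray's set_coords promotes such a variable back into a coordinate, matching the CF scalar-coordinate layout of the input file:

```python
import numpy as np
import xarray as xr

# A 0-d "lev" placed among the data variables, mimicking the framework's
# output file shown above (synthetic data, not the actual POD output).
ds = xr.Dataset({
    "lev": ((), 50.0),
    "ta50": (("time",), np.zeros(4, dtype="float32")),
})

# Promote "lev" from a data variable to a (scalar) coordinate.
fixed = ds.set_coords("lev")
```

Whether the parser should perform this promotion itself, or whether the difference is harmless, is exactly the xarray-vs-framework question raised above.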

When I was originally debugging this issue, I found that

self.reconcile_scalar_value_and_units(our_var, ds[ds_coord_name])

was passing in (for example) ta50 as our_var, which, as a TranslatedVarlistEntry object, has no is_scalar attribute, leading to the failure of the assert.

@zdlawrence Thanks again for your detailed analysis of the problem. After digging through the code a bit more and making sure that there wasn't an is_scalar property that wasn't getting initialized, I've concluded your solution is probably correct. I don't know why our_var was passed to reconcile_scalar_value_and_units; it may be a typo (since the rest of the code in the block deals with coordinate data), an unanticipated bug exposed by your test data, or something new in this particular version of xarray. But it works with your POD and passes the CI, so it'll do. Go ahead and sync your branches with main to get the update.