NOAA-GFDL/MDTF-diagnostics

Having difficulty getting ocean model output to load into MDTF; coordinates issue?

Closed this issue · 14 comments

I've been having difficulty getting the framework to read in a standard CESM POP output file (ocean model output). I suspect that it is because the ocean model coordinates are 2-dimensional: i.e., latitude = f(x,y) and longitude = f(x,y). @jkrasting looked through the code I had created so far on my fork+branch here:
https://github.com/emaroon/MDTF-diagnostics/tree/development
He adjusted the settings.jsonc a smidge: https://github.com/jkrasting/MDTF-diagnostics/tree/emaroon-development

I think part of my issue is that the code isn't finding the file and opening it correctly. I've attached a screenshot of what that looks like:

Screenshot 2023-05-23 at 9 36 09 PM

I think John was able to get back to the coordinate issue using the same exact file that I am, so hopefully he can chime in here with the error he has (in case it is different from mine). Thanks in advance for your help with this!

Hi @emaroon. Thanks for opening this issue. Yes, in addition to the adjustments in your settings.jsonc file, I also needed to adjust the file structure in the inputdata directory. This worked for me:

inputdata/
├── model
│   └── DUMMY
│       └── mon
│           └── DUMMY.SST.mon.nc
└── obs_data
    └── natl_ocean

This allows the framework to find your file. However, it still brings us back to the issue with 2-dimensional coordinates. When I run the dummy POD, this is the error that results:

./mdtf -f src/default_tests.jsonc 

...

Starting data file search at /net3/jpk/MDTF/inputdata/model:
Directory crawl found 1 files.
Querying <SST> (Potential Temperature).
Fetching <#bd2V:natl_ocean.tos> (='SST' @ 1mo).
Preprocessing <#bd2V:natl_ocean.tos> (='SST' @ 1mo).
ERROR: cf_xarray error: couldn't assign ['nlat', 'nlon'] to axes for SST(assigned axes: {'X': ['TLONG'], 'Y': ['TLAT'], 'Z': ['z_t'], 'T': ['time']})
Received event while preprocessing <#bd2V:natl_ocean.tos>: DataPreprocessEvent("Caught exception while Preprocessing on <#bd2V:natl_ocean.tos> failed at CropDateRangeFunction: TypeError('Missing axes for SST.').")
Deactivated <#bd2V:natl_ocean.tos> due to ChildFailureEvent("Deactivating <#bd2V:natl_ocean.tos> due to failure of all child objects.").
Request for <#bd2V:natl_ocean.tos> (='SST' @ 1mo) failed; looking for alternate data.
ERROR: Deactivated <#b6uv:DUMMY.natl_ocean> due to PodDataError("Requested data not available for POD <#b6uv:DUMMY.natl_ocean>: No alternate data available for <#bd2V:natl_ocean.tos>.").
ERROR: Deactivated <#8zRK:DUMMY> due to ChildFailureEvent("Deactivating <#8zRK:DUMMY> due to failure of all child objects.").
Received event at DataSource level: DataQueryEvent("Too many iterations in select_data() for <#8zRK:DUMMY>.").
### MDTFFramework: Data request for case 'DUMMY' failed; skipping execution.

I tried the quick fix with the --disable_preprocessor flag, but it produced the same error. @wrongkindofdoctor - can you help us figure it out? (I can point you to my directory on the GFDL system if it would help ... LMK)

Thanks!
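For reference, the working layout shown earlier in this comment appears to follow the pattern model/<case>/<frequency>/<case>.<variable>.<frequency>.nc. Here is a minimal sketch of that pattern, inferred only from the directory tree in this thread. The helper name `model_file_path` is hypothetical and is not part of the framework, whose actual file search logic may differ.

```python
from pathlib import Path


def model_file_path(root, case, variable, frequency):
    """Build <root>/model/<case>/<frequency>/<case>.<variable>.<frequency>.nc.

    The pattern is inferred from the directory tree in this thread; it is an
    assumption, not a statement of how the framework searches for files.
    """
    filename = f"{case}.{variable}.{frequency}.nc"
    return Path(root) / "model" / case / frequency / filename


# Reproduce the path of the file from the tree shown earlier.
expected = model_file_path("inputdata", "DUMMY", "SST", "mon")
print(expected.as_posix())
```

Comparing the printed path with where the file actually sits is a quick way to rule out a layout problem before looking at coordinate errors.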

@jkrasting @emaroon The --disable_preprocessor option still runs the CropDateRangeFunction and RenameVariablesFunction methods. The dummy dataset you are using, in addition to having a variable long_name attribute that is not included in the current NCAR fieldlist, has coordinate attributes that do not match the variable dimensions. This is causing issues in the associated cf_xarray accessor routines in the xr_parser module that I still don't entirely understand. In other words, this is not a quick fix.

The good news: you can bypass the preprocessor entirely by setting the "data_type" to "no_pp" in your default_tests.jsonc file. This runs your test POD (though not without errors), and will allow you to debug your script.

Thanks for the quick response! I will move forward for now with no_pp in the default_tests.jsonc to get things working in the test PODs.

The dummy file I'm using is in standard CESM ocean (POP) output format, so eventually figuring out how to get the framework to agree with that format will be needed, though that can probably wait until the planned preprocessor changes. Would it be helpful to send you standard 2D and 3D POP CESM example files? Hopefully now I can also quickly get dummy PODs that do simple diagnosis on both 2D and 3D fields.

Yes, thanks, @wrongkindofdoctor. I agree with @emaroon - indeed, I believe the framework assumes that coordinates have to also be dimension variables. This is often not the case with ocean data.
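To make the mismatch concrete, here is a small sketch of the situation. It assumes the variable layout implied by the error message above (SST on time, z_t, nlat, nlon, with TLAT and TLONG as 2-D fields on nlat, nlon). The helper `dims_without_coordinate_variable` is hypothetical and only illustrates the idea; it is not the framework's check.

```python
# Dimension names of each variable, assumed from the error message in this
# thread and from typical POP output; not read from the actual file.
variable_dims = {
    "SST": ("time", "z_t", "nlat", "nlon"),
    "TLAT": ("nlat", "nlon"),
    "TLONG": ("nlat", "nlon"),
    "z_t": ("z_t",),
    "time": ("time",),
}


def dims_without_coordinate_variable(name, variable_dims):
    """Return dimensions of `name` with no 1-D variable of the same name."""
    return [
        dim
        for dim in variable_dims[name]
        if variable_dims.get(dim) != (dim,)
    ]


missing = dims_without_coordinate_variable("SST", variable_dims)
print(missing)  # ['nlat', 'nlon']
```

The two dimensions left over are the same ones the error could not assign to axes: nlat and nlon are index dimensions, while the geographic information lives in the 2-D TLAT and TLONG fields.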

@emaroon Yes, sample POP output would be very helpful for crafting a fix for the current framework, and for the redesign development.

@emaroon @jkrasting I'm working on a fix to the preprocessor to allow multiple coordinates in the CropDateRangeFunction and RenameVariablesFunction. I will also add potential temperature as defined in the dummy ocean files to the NCAR fieldlist, so you can now run the natl_ocean POD with the --disable-preprocessor option.

Note that you will have to update the dimensions in the natl_ocean settings.jsonc file to match the definition in the NCAR fieldlist as follows:

"dimensions": {
     "time": {"standard_name": "time"},
     "lat": {"standard_name": "latitude"},
     "lon": {"standard_name": "longitude"}
  },

You also have to change the dimensions for tos from geolat to lat and from geolon to lon:
"dimensions": ["time", "lat", "lon"]

I was able to run the test POD until it hit errors in the environment variable definitions in the driver script (looks like "tas_var" needs to be changed to "tos_var" in a few places) and in the paths defined in the html template, which need to be updated to the output file paths defined in the driver script.

Thanks @wrongkindofdoctor

In the dataset @emaroon provided, the latitude and longitude variables needed for computation and plotting are TLAT and TLONG.

@jkrasting @emaroon Okay, I modified the NCAR fieldlist to include TLAT and TLONG coordinates, and the framework to be less stringent when it queries the settings and fieldlists for standard_names (or long_names) to associate with X, Y, and T coordinates (i.e., they only have to contain "latitude", "longitude", and "time", respectively, not be an exact match).

The dimensions and tos variable dimension entries in the settings.jsonc look like:

 "dimensions": {
     "time": {"standard_name": "time"},
     "TLAT": {"standard_name": "array of t-grid latitudes"},
     "TLONG": {"standard_name": "array of t-grid longitudes"}
  },
...
 "dimensions": ["time", "TLAT", "TLONG"]

This does not accommodate all possible coordinates, but should be serviceable for integrating the natl_ocean POD. Pull in the latest main branch updates when you are ready, and follow up if (when) you encounter other problems.
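As a rough illustration of the relaxed matching described in this comment (a name only has to contain "latitude", "longitude", or "time"), here is a minimal sketch. The names `AXIS_KEYWORDS` and `guess_axis` are hypothetical; this is not the framework's actual code, just the "contains" idea in isolation.

```python
# Keyword that a standard_name (or long_name) must contain for each axis.
# The keywords come from this thread; the mapping itself is illustrative.
AXIS_KEYWORDS = {"X": "longitude", "Y": "latitude", "T": "time"}


def guess_axis(name):
    """Return the axis whose keyword appears in `name`, or None."""
    lowered = name.lower()
    for axis, keyword in AXIS_KEYWORDS.items():
        if keyword in lowered:
            return axis
    return None


for candidate in (
    "array of t-grid latitudes",
    "array of t-grid longitudes",
    "time",
    "depth from surface to midpoint of layer",
):
    print(candidate, "->", guess_axis(candidate))
```

One trade-off of substring matching is that an unrelated name containing one of the keywords (for example, a name with "lifetime" in it) would also match, which fits the caveat that this does not accommodate all possible coordinates.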

Hi @wrongkindofdoctor, I've bashed my head on this some more after integrating your modifications to the NCAR fieldlist and trying a few more things, but am now again at an impasse. Any help that you can provide would be appreciated. Here's what I tried:

Using your fieldlist modifications with TLAT and TLONG right out of the box yields an interesting warning:
WARNING: Variable SST has unexpected dimensionality: expected axes ['T', 'X', 'Y'], got ['T', 'Z'].
This is interesting because the dimensions of the dataset I'm using are T, Z, Y, X, where the Z dimension is a singleton. Here's more of the error text from that:
Screenshot 2023-08-02 at 3 53 19 PM

So, based on that warning, I made a new branch where I added z_t to the NCAR field list using the standard_name and units exactly from standard CESM output:
"z_t": {"axis": "Z", "standard_name": "depth from surface to midpoint of layer", "positive": "down", "units": "centimeters"},

This produced new errors where it appears that the issue is that the dimension z_t needs units included in the settings.jsonc:
Screenshot 2023-08-02 at 3 59 46 PM

So, I tried adding the appropriate units to z_t in my settings.jsonc:
"TLONG": {"standard_name": "longitude"}, "TLAT": {"standard_name": "latitude"}, "z_t": {"standard_name": "depth", "units": "centimeters"}

Which produces a new error, "Axis OTHER not defined in convention 'NCAR'.":

Screenshot 2023-08-02 at 4 02 02 PM

I don't know what to make of Axis OTHER. I've tried a bunch of other little modifications like swapping the dimensions nlon and nlat for TLONG and TLAT in both my settings.jsonc and the fieldlist_NCAR, but neither helps resolve the errors, so I'm now stuck.

One possibility to resolve this issue would be to manually preprocess the CESM output to get rid of the singleton z_t dimension. However, all CESM default SST output has this 1-length z_t dimension, so this work-around would be needed every time, which isn't ideal. I'm also not convinced that z_t is the real issue here, though honestly, I don't really know what the issue is here.
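For clarity, this is what I mean by removing the singleton dimension, sketched on dimension names and sizes only. The sizes below are illustrative and are not read from my file, and `squeeze_singletons` is a hypothetical helper. On an actual dataset, xarray's `squeeze` method performs the equivalent operation.

```python
def squeeze_singletons(dims, shape):
    """Return (dims, shape) with every length-1 dimension removed."""
    kept = [(dim, size) for dim, size in zip(dims, shape) if size != 1]
    new_dims = tuple(dim for dim, _ in kept)
    new_shape = tuple(size for _, size in kept)
    return new_dims, new_shape


# SST as laid out in the CESM output discussed here: T, Z, Y, X with a
# length-1 Z dimension. The sizes are assumptions for illustration.
sst_dims = ("time", "z_t", "nlat", "nlon")
sst_shape = (12, 1, 384, 320)

new_dims, new_shape = squeeze_singletons(sst_dims, sst_shape)
print(new_dims, new_shape)
```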

Any ideas that you or @jkrasting have would be greatly appreciated. I've pushed my latest modifications to a new branch on my fork called dummypod1. Thanks again for your help!

@emaroon Fortunately, I've been deep-diving into the coordinate configuration trying to figure out how to modify it for the next MDTF-diagnostics iteration, and have an idea where this is coming from. I have your remote branch ready to test. Please send a copy of the dummy dataset, or an ftp/globus link I can download it from, and I will start debugging.

@wrongkindofdoctor Fantastic, thanks! I've emailed you the example file that I'm working with.

@emaroon I have added 'z_t' to the NCAR fieldlist in the main branch, so you can update your main and POD development branches in your local and remote forks. You need to modify your POD settings file dimensions so that the standard names for TLAT and TLONG match the corresponding entries in the NCAR fieldlist table. Otherwise, the framework will try to match them to the "lat" and "lon" entries in the fieldlist and fail. You also need to define axis attributes for TLAT and TLONG so that the framework does not insert default names of "lat" and "lon" during the translation:

 "TLONG": {
             "standard_name": "array of t-grid longitudes",
             "axis": "X"
           },
  "TLAT": {
             "standard_name": "array of t-grid latitudes",
             "axis": "Y"
          },

Note that I am doing away with default dimension names and attributes in the framework update to accommodate multiple possibilities for horizontal coordinates, so the natl_ocean settings file will be ahead of the game.
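If it helps when editing the settings file, the requirement above (standard names and axes must agree between the POD settings and the fieldlist) can be checked by hand with something like the sketch below. The dictionaries are hand-written excerpts using the values quoted in this thread; the real fieldlist file layout may differ, and `find_mismatches` is a hypothetical helper, not a framework function.

```python
# Hand-written excerpt standing in for the fieldlist coordinate entries.
# Values are taken from this thread; the structure is an assumption.
fieldlist_coords = {
    "TLONG": {"standard_name": "array of t-grid longitudes", "axis": "X"},
    "TLAT": {"standard_name": "array of t-grid latitudes", "axis": "Y"},
}

# Dimension entries as recommended for the POD settings file.
settings_dims = {
    "TLONG": {"standard_name": "array of t-grid longitudes", "axis": "X"},
    "TLAT": {"standard_name": "array of t-grid latitudes", "axis": "Y"},
}


def find_mismatches(settings, fieldlist, attrs=("standard_name", "axis")):
    """List (name, attribute, settings value, fieldlist value) differences."""
    problems = []
    for name, entry in settings.items():
        reference = fieldlist.get(name)
        if reference is None:
            problems.append((name, "missing", None, None))
            continue
        for attr in attrs:
            if entry.get(attr) != reference.get(attr):
                problems.append((name, attr, entry.get(attr), reference.get(attr)))
    return problems


print(find_mismatches(settings_dims, fieldlist_coords))

# An entry with a generic standard name and no axis is flagged twice.
earlier_attempt = {"TLONG": {"standard_name": "longitude"}}
print(find_mismatches(earlier_attempt, fieldlist_coords))
```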

Hallelujah! With those modifications and the information that the framework needs the axes to match between the fieldlists and settings.jsonc (somehow I missed that...), I was able to read in CESM output, do a simple operation, and write out a plot and a netCDF file. I should be able to move forward now, pull in the 4 variables needed for this POD, and actually make a POD. HOORAY!!!

Thank you!!!