Unidata/MetPy

Change xarray coordinate identification to enforce dimensionality and allow both lat/lon and x/y identification

Closed this issue · 13 comments

As mentioned in #1089, it would be helpful to systematically identify both lat/lon and x/y coordinates on a DataArray/Dataset. This could be useful both as a sanity check (as in #1089) and in kinematics calculations (as in #893). There have also been issues with the current implementation of da.metpy.x and da.metpy.y when latitude_longitude grid mapping is assumed but the data grid is not latitude_longitude (such as #1004).

I would propose that in place of the current behavior of da.metpy.x and da.metpy.y, we have the following:

  • da.metpy.x and da.metpy.y bind only to the orthogonal x and y coordinates of the DataArray's data grid.
    • Should be 1D
    • While most of the time should also be dimension coordinates, I don't think we should restrict them to only dimension coordinates, since there are use cases where having them be auxiliary coordinates would be useful (such as cross sections, where both the projection x and y coordinates are in one-to-one relation with the index dimension)
    • If such coordinates do not exist, raise an AttributeError (which may suggest adding them to the DataArray/Dataset once sufficient helpers are implemented)
  • da.metpy.longitude and da.metpy.latitude bind to geographic longitude and latitude coordinates (according to CF conventions and heuristics as already implemented)
    • Can be any dimensionality (e.g., 1D for plate carrée, 2D for grid with most other projections, 3D for moving nests)
    • Will be identical to da.metpy.x and da.metpy.y, respectively, for plate carrée
    • Question: if not already present, should they be computed and added on-demand from da.metpy.x and da.metpy.y using da.metpy.cartopy_crs when accessed? Or should this be left to a separate helper function?

For "nice" datasets, where the data projection, dimensions, and coordinates correspond as expected, these changes should not affect the behavior of da.metpy.x and da.metpy.y, and would simply add the longitude and latitude attributes. However, for "mystery projection" datasets where the latitude_longitude grid mapping is assumed (ref #1039/#1065) even though the data grid is not latitude_longitude, this would be an API break (in that case, users would have to change instances of da.metpy.x and da.metpy.y to da.metpy.longitude and da.metpy.latitude, respectively).

Before digging into implementing this, what are everyone's thoughts on it?

Also, as I was reading over related discussion in geoxarray (ref geoxarray/geoxarray#10), I thought it best to mention that I don't believe this change would change the behavior of MetPy's wrapped .sel and .loc...since that selection requires dimension coordinates, and the API-breaking case of 2D lon/lat being identified as x and y would already not work (the exception raised would just change from one from xarray to the AttributeError from MetPy).

I think it would be greatly beneficial to always have both the x, y and longitude, latitude references. There are some operations where it makes it very handy to have the lat/lon (e.g., plotting scalars) and better to have x/y (e.g., plotting vectors). I don't have a good answer for "mystery projection" datasets, that might be where we just have to say the user has to know something about their dataset!?!

(Based on conversations) I think the way forward for this will be to implement this as described, but leave the current behavior of 2D lon/lat as x/y behavior in as warned deprecated behavior until 1.0.

Saw @jthielen talk at AMS yesterday- didn't know this was a part of MetPy! My question is whether or not this will compute "2D" lat/lon. It seems like if I use assign_crs(), and then assign_latitude_longitude(), it uses the 1D x/y and the CRS to compute 1D lat/lon. Is it as simple as reshaping the 1D lat/lon to fit the nx/ny dimensions of the data if I wish to assign 2D coordinates to the DataArray? Do you have a good reference for a "CF projection attributes" example to pass to assign_crs()? Thanks!

@DanielAdriaansen The assign_latitude_longitude() helper calculates new 2D latitude/longitude auxiliary coordinates from existing 1D y/x dimension coordinates. In retrospect, I should probably go back and clarify that in the documentation.

For the CF projection attributes, the current best reference is the CF conventions document itself. Here is the section listing the grid mappings and their associated parameters. A summary of the most commonly used ones might be useful to add to MetPy's documentation too.

@jthielen Thanks! I've played around with this a bit, and have been unsuccessful getting assign_crs() to work. I did some testing, and I have some confusion on line 889 of xarray.py. If I insert this statement prior to line 889 inside xarray.py:

print(xarray_object.assign_coords(crs=CFProjection(attrs)))

MetPy prints my dataset object and correctly shows a "crs" coordinate. However, in my script where I am calling assign_crs(), if I print the dataset object before and after calling assign_crs(), there is no "crs" coordinate after calling assign_crs(). It seems like the "return" statement is having trouble returning a new xarray object with the results of the call to xr.assign_coords()?

I am calling assign_crs() like this:

# Open netcdf file
ds = xr.open_dataset()
# Create dictionary of CF projection params (not shown here for brevity)
cfparams = {}
# Print the dataset object to view a snippet
print(ds) --> no "crs" coordinate here
# Assign the "crs" coordinate using MetPy
ds.metpy.assign_crs(cf_attributes=cfparams)
# Print the dataset object to confirm "crs" coordinate added
print(ds) --> still no "crs" coordinate but expecting one

Some environment details:

Python 3.8.1 | packaged by conda-forge | (default, Jan  5 2020, 20:58:18)
# Name                    Version                   Build  Channel
xarray                    0.14.1                     py_1    conda-forge
# Name                    Version                   Build  Channel
metpy                     1.0.0rc1                   py_0    conda-forge/label/metpy_rc

Any suggestions?

@DanielAdriaansen Note that assign_crs returns a new dataset, and does not modify the existing dataset in-place, which matches the behavior of assign_coords in xarray. So, your line

ds.metpy.assign_crs(cf_attributes=cfparams)

should be modified to

ds = ds.metpy.assign_crs(cf_attributes=cfparams)

Hopefully this takes care of it (and if it doesn't, then this is indeed a bug that will require further investigation).

@jthielen Thank you! I was looking here and also at the assign_coords documentation but the code snippets they show are in interactive mode and I rarely use it so was tricked into believing it was modified in place (despite the documentation telling me otherwise...).

I have the "crs" coordinate now. I think I may have found a bug here: https://unidata.github.io/MetPy/latest/_modules/metpy/xarray.html#MetPyDatasetAccessor.assign_latitude_longitude

I think the line:
return self.assign_coords(latitude=latitude, longitude=longitude)

Should be:
return self._dataset.assign_coords(latitude=latitude, longitude=longitude)

The error I was receiving was:
AttributeError: 'MetPyDatasetAccessor' object has no attribute 'assign_coords'

Let me know the best way to handle this. I've never been a contributor or made a pull request anywhere, and I'm not even sure that's the correct path for this so if it is in fact a bug suggest the best action for me, if any. Thanks again for all your help!

I think I may have found a bug here: https://unidata.github.io/MetPy/latest/_modules/metpy/xarray.html#MetPyDatasetAccessor.assign_latitude_longitude

I think the line:
return self.assign_coords(latitude=latitude, longitude=longitude)

Should be:
return self._dataset.assign_coords(latitude=latitude, longitude=longitude)

The error I was receiving was:
AttributeError: 'MetPyDatasetAccessor' object has no attribute 'assign_coords'

Let me know the best way to handle this. I've never been a contributor or made a pull request anywhere, and I'm not even sure that's the correct path for this so if it is in fact a bug suggest the best action for me, if any. Thanks again for all your help!

@DanielAdriaansen Good catch on this, and yes, that looks like exactly the right fix for that bug! This case should have been covered by tests in #1303, but alas, I missed it. If you'd want to give a pull request a try, take a look at the contribution guide here for general steps. I'd be glad to try helping, and I'm sure @dopplershift would as well. But, if you don't feel up to putting together a PR at this point, that's totally fine too...just let me know and I can put together a quick PR instead based on what you've found here.

@jthielen If it's OK to be slow and probably stumble through it, I'll take a crack at it after reviewing the guide you linked. If not, let me know.

One other question I had was with regard to the generation of the actual latitude/longitude values- in particular with respect to the definition of the spheroid/ellipsoid used. I traced this back to here:

def cartopy_globe(self):

If I do not supply an 'earth_radius' kwarg, it looks like cartopy_globe() will set the ellipse to a sphere but not set any of the semimajor_axis, etc... kwargs if I do not provide them. I am unsure what it uses then, as the radius of the Earth. Cartopy documentation notes semimajor_axis is set to None by default, but perhaps it uses the semimajor_axis from WGS84 in this case?

Either way, my question is why I do not see changes in the actual latitude/longitude values when I change the kwargs provided to cartopy_globe(). I ran three tests:

  1. Provide "earth_radius" in the cf proj params
  2. Provide "semimajor_axis" in the cf proj params
  3. Provide neither of the above in the cf proj params

I chose a value of 6371229.0m for tests 1 and 2 (which is different than what WGS84 uses), and received the same latitude/longitude values for all three tests. What this tells me is that tests 1 and 2 are having no effect on the definition of the globe and/or the coordinate transformation occurring, because the values are identical to test 3. The caveat here is that I was only able to understand everything up until this line:

lonlats = ccrs.Geodetic(globe=da.metpy.cartopy_globe).transform_points(
at which point I was unable to follow what was going on under the hood any further.

I added some code to prove to myself that the semimajor_axis attribute of the Globe object was equal to 6371229.0m each time cartopy_globe() was called for tests 1 and 2. For test 3 it looks like it is set to "None" if I do not provide anything explicit. Again, I am unsure what is used when "None" is returned but I have to imagine something is assumed.

Given I proved the globe has the correct semimajor_axis when I provide it via either "semimajor_axis" or "earth_radius", I am leaning towards something weird happening in the coordinate transformation rather than the globe generation.

Stumbling through is how we all operate. 😉

It sounds like you’ve found a real issue with the globe handling. I just moved your comment to a new issue #1308 to make sure we don’t lose track of it.

After a fresh set of eyes this morning, I think issue #1308 is user error and not actually a problem. It all stems from my inability to understand, even after @jthielen told me, that assign_crs() and assign_latitude_longitude() return a NEW dataset object, and do not modify in place. I think I have it now.

After assigning the result of assign_latitude_longitude() to a new variable (e.g. ds = ds.metpy.assign_latitude_longitude(force=True)), I was able to get the 2D latitude/longitude coordinates and the values do change when I change "earth_radius" and "semimajor_axis" as I expect them to now. I will comment over on #1308 with my findings but I think that can be closed.

The reason I was confused before, is my dataset already had 2D lat/lon variables in it, so when I was printing I was getting the original 2D lat/lon values unmodified by MetPy because I hadn't set the result of assign_latitude_longitude() to a new variable. I assumed "force=True" was overwriting the existing 2D lat/lon but after discovering how to actually use assign_latitude_longitude() I see it adds new coordinates and does not use the existing name.

I know the documentation says this, but is there a best practice in Python to let the user know that if they don't assign to a new variable or reassign to the same variable the result of something like assign_latitude_longitude() just disappears into the ether?

I will work on my PR for the bugfix in assign_latitude_longitude for datasets soon. Thanks for allowing me to give it a shot and for all the support during my tire kicking of these new features! It is great to have this capability!

I know the documentation says this, but is there a best practice in Python to let the user know that if they don't assign to a new variable or reassign to the same variable the result of something like assign_latitude_longitude() just disappears into the ether?

I think the documentation is the only way to do this, however, the documentation could definitely emphasize this more. Instead of just a single reference to the "return new" behavior in the Returns section, we could try calling it out in the summary line of the docstring, and a discussion of this behavior (following xarray) could be added to the xarray tutorial.