E3SM-Project/e3sm_diags

CDAT Migration Phase 2: Refactor `lat_lon` set

Overview
The components to refactor include the driver, the plotter, and the viewer. Each file has a set of functions to refactor and module references that might need to be refactored as well.

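We will use the "bubble context" refactoring method for this task: new code that operates on xarray objects is developed alongside the existing cdms2-based code, and references are switched from the old code to the new code only after the new code has been tested.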

File 1 - lat_lon_driver.py

Order of function calls: run_diag() -> create_metrics() -> create_and_save_data_and_metrics()

  • Refactor utils.dataset.Dataset class methods to operate on xr.Dataset/xr.DataArray objects
    a. Create skeleton methods for the methods that need to be refactored (append _new to the name)
    b. Add failing unit tests for these methods
    c. Implement new methods
    d. Update references from old methods to new methods
  • Refactor general utilities that are called by the driver to operate on xarray objects (e.g., climo.py)
    a. Create a utils_new.py
    b. Add skeleton function definitions for new general utilities
    c. Add failing unit tests for these functions
    d. Implement new general utilities
    e. Update references from old utilities to new utilities
  • Refactor create_and_save_data_and_metrics() and create_metrics()
    a. These functions call e3sm_diags.metrics functions
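Steps a-d for a single Dataset method can be sketched as follows. This is a minimal illustration under assumptions: the `Dataset` class is a hypothetical stand-in that reads from an in-memory dict instead of files, and the hypothetical `get_static_variable_new()` returns plain lists rather than xr.DataArray objects so the example runs with the standard library alone. The parity test reflects the bubble-context idea of checking the new method against the legacy one before switching references.

```python
import unittest


class Dataset:
    """Hypothetical stand-in for e3sm_diags' Dataset class (not the real API)."""

    def __init__(self, variables):
        # Mapping of variable key -> values; stands in for an opened file.
        self._variables = variables

    def get_static_variable(self, var_key):
        # Legacy method: left untouched while the new method is developed.
        return list(self._variables[var_key])

    def get_static_variable_new(self, var_key):
        # Step a: this began as a skeleton that raised NotImplementedError.
        # Step c: the implementation written to make the tests below pass.
        if var_key not in self._variables:
            raise KeyError(f"Variable '{var_key}' not found.")
        return list(self._variables[var_key])


class TestGetStaticVariableNew(unittest.TestCase):
    # Step b: these tests were written first and failed against the skeleton.
    def setUp(self):
        self.ds = Dataset({"LANDFRAC": [0.0, 0.5, 1.0]})

    def test_matches_legacy_method(self):
        self.assertEqual(
            self.ds.get_static_variable_new("LANDFRAC"),
            self.ds.get_static_variable("LANDFRAC"),
        )

    def test_raises_for_missing_variable(self):
        with self.assertRaises(KeyError):
            self.ds.get_static_variable_new("OCNFRAC")
```

Step d then amounts to replacing calls to `get_static_variable()` with `get_static_variable_new()` once the tests pass, e.g. by running them with `python -m unittest`.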

Module References - tree diagram

These are the other modules referenced in lat_lon_driver.py that need to be refactored to operate on xr.DataArray/xr.Dataset objects.

  • e3sm_diags.utils
    • dataset_new.py (class Dataset) - Refactor methods that call cdms2 (e.g., cdms2.open())
      • get_static_variable()
      • get_attr_from_climo()
      • get_climatology_variable()
        • _get_climo_var()
        • get_timeseries_var()
        • _get_var_from_timeseries_file()
        • e3sm_diags/driver/utils/climo.py -> create climo_xr.py
    • io.py (replaces I/O functions in general.py)
      • get_output_dir()
      • save_ncfiles()
      • get_name_and_yrs() -- added as a method to Dataset class
    • regrid.py (replaces regridding functions in general.py)
      • (NEW) has_z_axis() (replaces cdms2.axis.getLevel())
      • regrid_to_lower_res() - 100% identical results
      • convert_to_pressure_levels() -- Really close (1e-6 max abs and 1e-8 max rel diffs)
      • select_region() -- Really close (1e-7 max abs and rel diffs without land/sea region masking lower limit)
        • Opened an xCDAT GitHub discussion post to figure out why the xESMF and cdms2 ESMF regridders produce different regridded land/sea masks. The cause was cdms2 not importing ESMF properly, which has since been patched.
        • Replaced by _apply_land_sea_mask() and _subset_on_region()
  • e3sm_diags.metrics_xr.py (replaces metrics/__init__.py)
    • corr -- virtually identical (1e-14 max abs and rel diffs)
    • mean - virtually identical (1e-12 max abs and 1e-14 max rel diffs)
    • rmse - virtually identical (1e-13 max abs and 1e-14 max rel diffs)
    • std - virtually identical (1e-14 max abs and 1e-15 max rel diffs)
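The four functions in metrics_xr.py are spatially weighted statistics. Below is a minimal NumPy sketch of corr, mean, rmse, and std, assuming a regular (lat, lon) grid, cos(latitude) area weights, population (ddof=0) statistics, and no missing values. The signatures and the `area_weights()` helper are hypothetical; the actual module may derive weights from latitude/longitude bounds and operate on xr.DataArray objects, so small differences from this sketch are expected.

```python
import numpy as np


def area_weights(lat, nlon):
    """Hypothetical helper: (nlat, nlon) cos(latitude) weights for a regular grid."""
    w = np.cos(np.deg2rad(np.asarray(lat, dtype=float)))
    return np.broadcast_to(w[:, np.newaxis], (w.size, nlon))


def mean(a, weights):
    """Area-weighted spatial mean."""
    return float(np.sum(a * weights) / np.sum(weights))


def std(a, weights):
    """Area-weighted population standard deviation (ddof=0 assumed)."""
    m = mean(a, weights)
    return float(np.sqrt(mean((a - m) ** 2, weights)))


def rmse(a, b, weights):
    """Area-weighted root-mean-square difference between two fields."""
    return float(np.sqrt(mean((a - b) ** 2, weights)))


def corr(a, b, weights):
    """Area-weighted Pearson correlation between two fields."""
    cov = mean((a - mean(a, weights)) * (b - mean(b, weights)), weights)
    return float(cov / (std(a, weights) * std(b, weights)))
```

For example, `rmse(test, ref, area_weights(lat, test.shape[1]))` with hypothetical `test`, `ref`, and `lat` arrays. Differences in how weights are generated are one plausible source of the small (1e-12 to 1e-15) diffs reported above.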

File 2 - lat_lon_plot.py

Functions to Refactor

  • e3sm_diags.plot.plot()
    • _get_plot_fnc()
      • lat_lon_plot.plot() -- moved sub-functions below to plot/utils.py
        • plot_panel()
        • determine_tick_step()
        • get_ax_size()
        • add_cyclic()
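A helper like add_cyclic() closes the seam at the dateline/prime meridian by repeating the first longitude column at lon[0] + 360. A minimal NumPy sketch, assuming longitude is the last axis, values are in degrees and increasing, and there is no duplicate end point; the signature is hypothetical and the version in plot/utils.py may operate on xr.DataArray objects instead:

```python
import numpy as np


def add_cyclic(data, lon):
    """Append a wrap-around longitude column (hypothetical signature).

    data: array with longitude as the last axis.
    lon: 1-D array of increasing longitudes in degrees.
    """
    data = np.asarray(data)
    lon = np.asarray(lon, dtype=float)
    if data.shape[-1] != lon.size:
        raise ValueError("The last axis of data must match the size of lon.")
    # Repeat the first longitude column at the end of the array.
    cyclic_data = np.concatenate([data, data[..., :1]], axis=-1)
    # Its coordinate is the first longitude shifted by one full revolution.
    cyclic_lon = np.append(lon, lon[0] + 360.0)
    return cyclic_data, cyclic_lon
```

Cartopy provides a comparable general-purpose utility, `cartopy.util.add_cyclic_point()`.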

Module References

  • e3sm_diags.derivations.default_regions - region_specs
  • e3sm_diags.driver.utils.general - get_output_dir()
  • e3sm_diags.plot - get_color_map()

File 3 - default_viewer.py (backlogged for a later time, once all sets have been refactored).

Functions to Refactor

  • create_viewer() -> seasons_used(), _get_description(), _add_to_lat_lon_metrics_table(), create_metadata(), and _add_information_to_viewer()

Module References

  • e3sm_diags.parser - SET_TO_PARSER
  • e3sm_diags.viewer.utils - add_header(), h1_to_h3()
  • e3sm_diags.viewer.lat_lon_viewer - generate_lat_lon_metrics_table(), generate_lat_lon_taylor_diag(), generate_lat_lon_cmip6_comparison()

Notes from 4/11/23 Meeting:

Validating the ocean fraction climatology calculations of the old and new climo functions. Next steps:

  1. Try to figure out why the climatology outputs have larger-than-expected diffs (floating-point error? use of np.sum vs. np.einsum?)
  2. Try to get results as close as possible
  3. Check if the slice_flag implementation is needed to add an extra coordinate point in the new Dataset method for subsetting time series variables (does it help improve results significantly?)
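For next step 1, one way to isolate summation-order effects is to compute the same weighted seasonal average with np.sum and with np.einsum and compare the max absolute and relative diffs. The sketch below assumes a 365-day calendar, a DJF season, monthly input shaped (12, lat, lon), and hypothetical function names; it is not the actual climo.py or climo_xr.py code. Relative diffs skip zero reference values, which matters for ocean fraction (0 over land).

```python
import numpy as np

# Days per month for an assumed 365-day (noleap) calendar, Jan..Dec.
DAYS_IN_MONTH = np.array([31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31], dtype=float)
DJF = [11, 0, 1]  # Dec, Jan, Feb (zero-based month indices)


def djf_climo_sum(monthly):
    """Days-weighted DJF mean of a (12, nlat, nlon) array using np.sum."""
    w = DAYS_IN_MONTH[DJF] / DAYS_IN_MONTH[DJF].sum()
    return np.sum(monthly[DJF] * w[:, None, None], axis=0)


def djf_climo_einsum(monthly):
    """Same weighted mean, contracting the time axis with np.einsum."""
    w = DAYS_IN_MONTH[DJF] / DAYS_IN_MONTH[DJF].sum()
    return np.einsum("t,tij->ij", w, monthly[DJF])


def max_diffs(ref, new):
    """Return (max absolute diff, max relative diff) of new with respect to ref."""
    abs_diff = np.abs(ref - new)
    # Relative diffs are only computed where the reference value is nonzero.
    rel_diff = np.divide(
        abs_diff, np.abs(ref), out=np.zeros_like(abs_diff), where=ref != 0
    )
    return float(abs_diff.max()), float(rel_diff.max())
```

If the two approaches differ only at the level of the data type's rounding error (roughly 1e-16 relative for float64, 1e-7 for float32), larger diffs in the real outputs probably come from elsewhere, such as different weights or different time subsetting (see step 3).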