CDAT Migration Phase 2: Refactor `lat_lon` set
tomvothecoder commented
## Overview
The components to refactor include the driver, the plotter, and the viewer. Each file has a set of functions to refactor and module references that might need to be refactored as well.
We will use the "bubble context" refactoring method for this task.
### File 1 - `lat_lon_driver.py`
Order of function calls: `run_diag()` -> `create_metrics()` -> `create_and_save_data_and_metrics()`
1. Refactor `utils.dataset.Dataset` class methods to operate on `xr.Dataset`/`xr.DataArray` objects
   a. Create skeleton methods for the methods that need to be refactored (append `_new` to the name)
   b. Add failing unit tests for these methods
   c. Implement the new methods
   d. Update references from the old methods to the new methods
2. Refactor general utilities that are called by the driver to operate on xarray objects (e.g., `climo.py`)
   a. Create a `utils_new.py`
   b. Add skeleton function definitions for the new general utilities
   c. Add failing unit tests for these functions
   d. Implement the new general utilities
   e. Update references from the old utilities to the new utilities
3. Refactor `create_and_save_data_and_metrics()` and `create_metrics()`
   a. These functions call `e3sm_diags.metrics` functions
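Steps a-c of item 1 can be sketched for a single method. This is a minimal sketch, assuming a trimmed-down, hypothetical `Dataset` that already holds an in-memory `xr.Dataset`; the real class opens files through `cdms2`, and its constructor and signatures differ. `get_static_variable_new()` and the test name are illustrative only. In practice the new method body starts as `raise NotImplementedError`, so the test fails until step c is done.

```python
import numpy as np
import xarray as xr


class Dataset:
    """Hypothetical stand-in for e3sm_diags' Dataset class (illustration only)."""

    def __init__(self, ds: xr.Dataset):
        # Assumption: data is already loaded; the real class reads files from disk.
        self._ds = ds

    def get_static_variable(self, var_key):
        """Old cdms2-based method, kept until references are updated (step d)."""
        raise NotImplementedError("cdms2 implementation omitted from this sketch")

    def get_static_variable_new(self, var_key: str) -> xr.DataArray:
        """Step a: skeleton with `_new` appended; step c: implemented body.

        Before step c, this body was only `raise NotImplementedError`, which
        made the unit test below fail (step b).
        """
        if var_key not in self._ds.data_vars:
            raise KeyError(f"Variable '{var_key}' not found in the dataset.")
        return self._ds[var_key]


def test_get_static_variable_new_returns_xr_dataarray():
    # Step b: written before the implementation, so it fails at first.
    ds = xr.Dataset(
        {"LANDFRAC": (("lat", "lon"), np.full((2, 3), 0.5))},
        coords={"lat": [-45.0, 45.0], "lon": [0.0, 120.0, 240.0]},
    )
    result = Dataset(ds).get_static_variable_new("LANDFRAC")

    assert isinstance(result, xr.DataArray)
    xr.testing.assert_identical(result, ds["LANDFRAC"])
```

Keeping the old and `_new` methods side by side is what lets references be switched over one at a time (step d) without breaking the rest of the driver.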
#### Module References - tree diagram
These are the other modules referenced in `lat_lon_driver.py` that need to be refactored to operate on `xr.DataArray`/`xr.Dataset` objects.
- `e3sm_diags.utils`
  - `dataset_new.py` (class `Dataset`) - Refactor methods that call `cdms2` (e.g., `cdms2.open()`)
    - `get_static_variable()`
    - `get_attr_from_climo()`
    - `get_climatology_variable()`
    - `_get_climo_var()`
    - `get_timeseries_var()`
    - `_get_var_from_timeseries_file()`
  - `e3sm_diags/driver/utils/climo.py` -> create `climo_xr.py`
  - `io.py` (replaces I/O functions in `general.py`)
    - `get_output_dir()`
    - `save_ncfiles()`
    - `get_name_and_yrs()` - added as a method to the `Dataset` class
  - `regrid.py` (replaces regridding functions in `general.py`)
    - `has_z_axis()` (NEW; replaces `cdms2.axis.getLevel()`)
    - `regrid_to_lower_res()` - 100% identical results
    - `convert_to_pressure_levels()` - really close (1e-6 max abs and 1e-8 max rel diffs)
    - `select_region()` - really close (1e-7 max abs and rel diffs without the land/sea region masking lower limit)
      - Opened an xcdat GitHub discussion post to figure out why the xesmf and cdms2 ESMF regridders produce different regridded land/sea masks. The cause was cdms2 not importing ESMF properly, which has since been patched.
      - Replaced by `_apply_land_sea_mask()` and `_subset_on_region()` (NEW)
- `e3sm_diags.metrics_xr.py` (replaces `metrics/__init__.py`)
  - `corr` - virtually identical (1e-14 max abs and rel diffs)
  - `mean` - virtually identical (1e-12 max abs and 1e-14 max rel diffs)
  - `rmse` - virtually identical (1e-13 max abs and 1e-14 max rel diffs)
  - `std` - virtually identical (1e-14 max abs and 1e-15 max rel diffs)
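For reference, xarray versions of the four metrics and a helper for the max abs/rel diffs reported above might look like the following. This is a sketch under stated assumptions, not the actual `metrics_xr.py`: it assumes `lat`/`lon` dimensions and cosine-latitude weights (the real functions may use grid-cell areas or other weights), and `max_diffs()` is a hypothetical helper.

```python
import numpy as np
import xarray as xr

SPATIAL_DIMS = ["lat", "lon"]


def _lat_weights(da: xr.DataArray) -> xr.DataArray:
    # Assumption: cos(latitude) weights; the real functions may use grid-cell areas.
    return np.cos(np.deg2rad(da["lat"]))


def mean(da: xr.DataArray) -> xr.DataArray:
    """Area-weighted spatial mean."""
    return da.weighted(_lat_weights(da)).mean(dim=SPATIAL_DIMS)


def std(da: xr.DataArray) -> xr.DataArray:
    """Area-weighted spatial standard deviation (population, ddof=0)."""
    anom = da - mean(da)
    return np.sqrt((anom**2).weighted(_lat_weights(da)).mean(dim=SPATIAL_DIMS))


def rmse(a: xr.DataArray, b: xr.DataArray) -> xr.DataArray:
    """Area-weighted root-mean-square error between two fields on the same grid."""
    sq_err = (a - b) ** 2
    return np.sqrt(sq_err.weighted(_lat_weights(a)).mean(dim=SPATIAL_DIMS))


def corr(a: xr.DataArray, b: xr.DataArray) -> xr.DataArray:
    """Area-weighted Pearson correlation between two fields on the same grid."""
    cov = ((a - mean(a)) * (b - mean(b))).weighted(_lat_weights(a)).mean(dim=SPATIAL_DIMS)
    return cov / (std(a) * std(b))


def max_diffs(new: xr.DataArray, old: xr.DataArray) -> tuple[float, float]:
    """Hypothetical helper: max absolute and max relative differences.

    Zeros in `old` are masked to NaN before dividing, and `.max()` skips NaNs.
    """
    abs_diff = np.abs(new - old)
    rel_diff = abs_diff / np.abs(old).where(old != 0)
    return float(abs_diff.max()), float(rel_diff.max())
```

Running `max_diffs(new_result, old_result)` on the outputs of the old and new implementations gives numbers in the same form as the "max abs and rel diffs" noted for each function.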
### File 2 - `lat_lon_plot.py`
#### Functions to Refactor
- `e3sm_diags.plot.plot()`
  - `_get_plot_fnc()`
  - `lat_lon_plot.plot()` - moved the sub-functions below to `plot/utils.py`
    - `plot_panel()`
    - `determine_tick_step()`
    - `get_ax_size()`
    - `add_cyclic()`
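`add_cyclic()`, listed above, is the kind of helper that ports cleanly to xarray. A minimal sketch, assuming a 1-D, monotonically increasing longitude coordinate in degrees named `lon`; it is not the existing e3sm_diags implementation. It appends the first longitude column again at `lon + 360` so filled contours wrap around the globe without a seam.

```python
import numpy as np
import xarray as xr


def add_cyclic(var: xr.DataArray, lon_dim: str = "lon") -> xr.DataArray:
    """Append the first longitude column at lon + 360 (sketch, not the real helper)."""
    # Select the first longitude column, keeping the lon dimension (length 1).
    first_col = var.isel({lon_dim: slice(0, 1)})
    # Shift its longitude by 360 degrees so it lands after the last column.
    first_col = first_col.assign_coords(
        {lon_dim: (lon_dim, first_col[lon_dim].values + 360.0)}
    )
    # Concatenate along longitude; other dimensions (e.g., lat) are untouched.
    return xr.concat([var, first_col], dim=lon_dim)
```

For plain NumPy arrays, `cartopy.util.add_cyclic_point()` does the same job; an xarray version keeps coordinates and attributes attached to the data.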
#### Module References
- `e3sm_diags.derivations.default_regions`
  - `region_specs`
- `e3sm_diags.driver.utils.general`
  - `get_output_dir()`
- `e3sm_diags.plot`
  - `get_color_map()`
### File 3 - `default_viewer.py`
Backlogged for a later time, once all sets have been refactored.
#### Functions to Refactor
- `create_viewer()` -> `seasons_used()`, `_get_description()`, `_add_to_lat_lon_metrics_table()`, `create_metadata()`, and `_add_information_to_viewer()`
#### Module References
- `e3sm_diags.parser`
  - `SET_TO_PARSER`
- `e3sm_diags.viewer.utils`
  - `add_header()`, `h1_to_h3()`
- `e3sm_diags.viewer.lat_lon_viewer`
  - `generate_lat_lon_metrics_table()`, `generate_lat_lon_taylor_diag()`, `generate_lat_lon_cmip6_comparison()`
tomvothecoder commented
Notes from 4/11/23 Meeting:
Performing validation on the ocean fraction climatology calculations between the old and new climo functions. Next steps:
- Figure out why the climatology outputs have larger-than-expected diffs (floating point error? use of `np.sum` vs. `np.einsum`?)
- Get the results as close as possible
- Check whether the `slice_flag` implementation is needed to add an extra coordinate point in the new `Dataset` method for subsetting time series variables (does it improve results significantly?)
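One way to probe the `np.sum` vs. `np.einsum` question is to compute the same weighted time average both ways and compare each against a float64 reference. The shapes, dtype, and random data below are hypothetical stand-ins, not the actual ocean fraction climatology inputs; this is a diagnostic sketch, not the climo implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical monthly data shaped (time, lat, lon), stored as float32.
data = rng.random((120, 48, 96), dtype=np.float32)
weights = rng.random(120, dtype=np.float32)
weights /= weights.sum()  # normalized time weights (e.g., days per month)

# Two ways to compute the same weighted time average.
via_sum = np.sum(data * weights[:, None, None], axis=0)
via_einsum = np.einsum("t,tij->ij", weights, data)

# Reference computed in float64 from the same float32 inputs.
reference = np.einsum(
    "t,tij->ij", weights.astype(np.float64), data.astype(np.float64)
)

print("sum vs einsum max abs diff:", np.abs(via_sum - via_einsum).max())
print("sum vs float64 max abs diff:", np.abs(via_sum - reference).max())
print("einsum vs float64 max abs diff:", np.abs(via_einsum - reference).max())
```

If both float32 results sit at a similar distance from the float64 reference, the choice of reduction function is probably not the main source of the diffs, and casting to float64 before reducing is one option worth testing.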