Adding capability to compare geovals
asewnath opened this issue · 3 comments
We want to add the capability to compare geovals from different systems (JEDI, GSI, GEOS, etc.). This involves adding a new dataset reader and potentially a transform. The reader would require an obs file along with the geoval file in order to retrieve lat/lon information. It would also accept templated filenames so that it can read more than one instrument file at a time.
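A minimal sketch of how the templated filenames might be expanded per instrument. It assumes the `{instrument}` placeholder is substituted literally and that `${...}` path prefixes come from environment variables; `expand_instrument_files` is a hypothetical helper, not existing eva code:

```python
import os


def expand_instrument_files(templates, instruments):
    """Map each instrument name to its list of expanded file paths.

    Hypothetical helper (assumption): '{instrument}' is replaced literally and
    '${...}' variables are expanded from the environment.
    """
    files = {}
    for name in instruments:
        paths = []
        for template in templates:
            # str.replace instead of str.format: the template also contains
            # '${...}' braces that str.format would try (and fail) to fill.
            path = template.replace("{instrument}", name)
            paths.append(os.path.expandvars(path))
        files[name] = paths
    return files


if __name__ == "__main__":
    os.environ["data_experiment_path"] = "/data/exp"
    templates = ["${data_experiment_path}/{instrument}_experiment_geovals.nc4"]
    for inst, paths in expand_instrument_files(templates, ["amsua_n19", "avhrr3_metop-b"]).items():
        print(inst, paths)
```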
The new transform takes the lat/lon information from `experiment` and `control`, finds a list of indices from `control` that are the closest match to `experiment`, and then updates the `experiment` dataset with variables from the `control` dataset that are index-matched to it. The new fields in the `experiment` dataset would look something like this: `experiment_geovals::amsua_n19_from_control_geovals::vegetation_area_fraction`
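A minimal, brute-force sketch of the proposed index matching, assuming lat/lon in degrees and plain Python lists/dicts standing in for eva's dataset objects. `match_indices` and `add_matched_fields` are hypothetical names, and choosing the nearest neighbour by great-circle (haversine) distance is one reasonable metric, not something decided in this issue:

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    r_earth = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlam = math.radians(lon2 - lon1)  # sin^2 below makes this dateline-safe
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlam / 2) ** 2
    return 2 * r_earth * math.asin(math.sqrt(min(1.0, a)))


def match_indices(exp_lat, exp_lon, ctl_lat, ctl_lon):
    """For each experiment location, return the index of the closest control location."""
    indices = []
    for la, lo in zip(exp_lat, exp_lon):
        dists = [haversine_km(la, lo, cla, clo) for cla, clo in zip(ctl_lat, ctl_lon)]
        indices.append(min(range(len(dists)), key=dists.__getitem__))
    return indices


def add_matched_fields(experiment, control, indices, instrument, exp_name, ctl_name):
    """Copy control variables into the experiment dict, reordered to match it.

    Keys follow the naming pattern proposed above, e.g.
    experiment_geovals::amsua_n19_from_control_geovals::vegetation_area_fraction
    """
    for var, values in control.items():
        key = f"{exp_name}::{instrument}_from_{ctl_name}::{var}"
        # values[i] may be a scalar or a per-location profile (levels)
        experiment[key] = [values[i] for i in indices]
    return experiment
```

The double loop is O(N·M); for full-size geoval files, a KD-tree such as `scipy.spatial.cKDTree` built on 3-D unit vectors would scale much better while giving the same nearest neighbours.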
Potential eva configs for geoval space:
```yaml
datasets:
  - name: experiment_geovals
    type: GeovalSpace
    obs_file:
      - ${data_experiment_path}/{instrument}_experiment.nc4
    geovals_file:
      - ${data_experiment_path}/{instrument}_experiment_geovals.nc4
    levels: &levels 33
    instruments:
      - name: amsua_n19
        geoval_variables: &geoval_variables ['vegetation_area_fraction', 'leaf_area_index']
      - name: avhrr3_metop-b
  - name: control_geovals
    type: GeovalSpace
    obs_file:
      - ${data_control_path}/{instrument}_control.nc4
    geovals_file:
      - ${data_control_path}/{instrument}_control_geovals.nc4
    levels: &levels 33
    instruments:
      - name: amsua_n19
        geoval_variables: &geoval_variables ['vegetation_area_fraction', 'leaf_area_index']
      - name: avhrr3_metop-b
transforms:
  - transform: index_match
    starting_dataset: control_geovals
    match_index_to_this_dataset: experiment_geovals
```
@CoryMartin-NOAA Please let me know if you have any thoughts or suggestions for this new reader/transform. I had also considered combining `control` and `experiment` into a single dataset read and performing the index matching there, so that no new transform would be needed.
@asewnath I think the transform is a necessary thing. I know @weihuang-jedi was looking for something like this.
Beyond geovals, I think the new transform could be useful for two IODA obs spaces. Say you have two experiments run with different PE counts: the distributions may be different, but it's the same data, so we could re-index it to plot. This would also be good for independent GSI vs. JEDI h(x) comparisons.
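When both obs spaces hold the same observations in a different order (as with different PE counts), an exact key lookup can replace the nearest-neighbour search and avoids distance ties. A sketch, assuming each observation can be identified by a tuple such as (lat, lon) or (lat, lon, time); `reorder_permutation` is a hypothetical helper:

```python
from collections import defaultdict


def reorder_permutation(keys_a, keys_b, ndigits=4):
    """Return perm such that keys_b[perm[i]] matches keys_a[i].

    Assumption: each key is a tuple like (lat, lon[, time]). Floats are rounded
    to `ndigits` so tiny round-off differences do not break the match.
    Raises KeyError if an observation in A has no counterpart in B.
    """
    def norm(key):
        return tuple(round(k, ndigits) if isinstance(k, float) else k for k in key)

    # Collect every index of B per key so duplicate locations (e.g. repeated
    # lat/lon) are paired in order of appearance instead of collapsing.
    lookup = defaultdict(list)
    for j, key in enumerate(keys_b):
        lookup[norm(key)].append(j)

    perm = []
    for key in keys_a:
        candidates = lookup[norm(key)]
        if not candidates:
            raise KeyError(f"no match for observation {key}")
        perm.append(candidates.pop(0))
    return perm
```

Any variable from obs space B can then be re-indexed to A's order with `[values_b[p] for p in perm]`.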
For the new dataset reader, can we make it more generic than geovals? Something like a 'data file' and a 'coordinate file'? This is analogous to the FV3 RESTART files, which keep the data in one file and the lat/lon info in another.
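A sketch of the generic 'data file' plus 'coordinate file' idea, with plain dicts and lists standing in for arrays read from the two files (a real reader would presumably use a NetCDF library for the I/O). `DataWithCoordinates` is a hypothetical name; what it illustrates is the invariant such a reader would need to check, namely that every data variable's location dimension lines up with the coordinates from the other file:

```python
from dataclasses import dataclass


@dataclass
class DataWithCoordinates:
    """Hypothetical container pairing variables from a data file with
    lat/lon read from a separate coordinate file."""
    data: dict  # variable name -> per-location values (or profiles)
    lat: list
    lon: list

    def __post_init__(self):
        # The two files are only usable together if they describe the
        # same locations in the same order.
        if len(self.lat) != len(self.lon):
            raise ValueError("lat and lon must have the same length")
        for name, values in self.data.items():
            if len(values) != len(self.lat):
                raise ValueError(
                    f"{name} has {len(values)} locations, "
                    f"coordinates have {len(self.lat)}"
                )
```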
Thanks for the guidance @CoryMartin-NOAA. Based on your suggestions, I've modified the proposed config file; here is an example that reads two sources of geoval files and applies the new transform:
```yaml
datasets:
  - name: experiment_geovals
    type: DataFile
    data_file:
      - ${data_experiment_path}/{instrument}_experiment_geovals.nc4
    levels: &levels 33
    instruments:
      - name: amsua_n19
        geoval_variables: &geoval_variables ['vegetation_area_fraction', 'leaf_area_index']
      - name: avhrr3_metop-b
  - name: control_geovals
    type: DataFile
    data_file:
      - ${data_control_path}/{instrument}_control_geovals.nc4
    levels: &levels 33
    instruments:
      - name: amsua_n19
        geoval_variables: &geoval_variables ['vegetation_area_fraction', 'leaf_area_index']
      - name: avhrr3_metop-b
  - name: experiment_lat_lon
    group: state
    type: LatLon
    filename: ${data_input_path}/{instrument}_experiment.nc4
    variables: [lat, lon]
  - name: control_lat_lon
    group: state
    type: LatLon
    filename: ${data_input_path}/{instrument}_control.nc4
    variables: [lat, lon]
transforms:
  - transform: index_match
    dataset_1: control_geovals
    lat_lon_1: control_lat_lon
    dataset_2: experiment_geovals
    lat_lon_2: experiment_lat_lon
```
I'll iterate on what makes the most sense for the transform config. Also, `lat_lon_1` and `lat_lon_2` would be optional arguments for the transform (e.g., when IodaObsSpace datasets are used).
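A sketch of how the optional `lat_lon_1`/`lat_lon_2` keys could be resolved. It assumes datasets are held in a dict keyed by their `name`, and that a dataset with no separate coordinate source carries its own `lat`/`lon` variables (presumably the IodaObsSpace case); `resolve_lat_lon` is a hypothetical helper:

```python
def resolve_lat_lon(transform_cfg, datasets, which):
    """Return (lat, lon) for dataset_1 or dataset_2 of an index_match transform.

    Assumption: `datasets` maps dataset names to dicts of variables. If
    lat_lon_<which> is present, coordinates come from that dataset; otherwise
    they are read from the data dataset itself.
    """
    data_name = transform_cfg[f"dataset_{which}"]
    # Fall back to the data dataset when no separate coordinate source is given
    coord_name = transform_cfg.get(f"lat_lon_{which}", data_name)
    coords = datasets[coord_name]
    return coords["lat"], coords["lon"]
```

Falling back to the data dataset keeps a single code path for both cases, so the matching step never needs to know where the coordinates came from.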
Looks good, thanks @asewnath