JCSDA-internal/eva

Modify mon_obs_space.py to make more generic

Closed this issue · 8 comments

Description

data/mon_obs_space.py was recently added to make legacy RadMon time-series (binary) data files available as a data source for the EVA package. The code from this initial effort explicitly names the data file components in this way:

timestep_ds = Dataset(
    {
        "count": (("channels", "regions"), count_tmp),
        "penalty": (("channels", "regions"), penalty_tmp),
        "omgnbc": (("channels", "regions"), omgnbc_sum_tmp),
        "total": (("channels", "regions"), total_sum_tmp),
        "omgbc": (("channels", "regions"), omgbc_sum_tmp),
        "omgnbc2": (("channels", "regions"), omgnbc_sum2_tmp),
        "total2": (("channels", "regions"), total_sum2_tmp),
        "omgbc2": (("channels", "regions"), omgbc_sum2_tmp),
        "cycle": (("channels", "regions"), cycle_tmp),
    },

That works, but it is specifically tailored to the RadMon time series data. There are additional types of legacy DA binary files which will need to be supported as part of the Mon 2.0 effort with different variables and dimension names.

Accordingly, make data/mon_obs_space.py more generic. Start with get_ctl_stats(), which parses a control file and returns information from it. Instead of returning explicitly named variables, return dicts for the dimensions, variables, and attributes. Then use that information to load the file contents in a generalized fashion. Once this works for the RadMon time series data, it should be simple to add additional monitor and data file types, starting with the RadMon scan-angle data files.
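As a rough illustration of the dict-based approach, here is a minimal sketch. It assumes the control file is GrADS-style (xdef/ydef/zdef/tdef lines and a vars ... endvars block) and that the data are flat little-endian 32-bit floats with no record markers. The names parse_ctl, load_fields, and SAMPLE_CTL are hypothetical, not the actual get_ctl_stats() implementation, and the sample control text is made up for the example.

```python
import struct

# Hypothetical, shortened control file text used only for illustration.
SAMPLE_CTL = """\
* sample control file
title sample time series
undef -999.0
xdef 3 linear 1.0 1.0
ydef 2 linear 1.0 1.0
vars 2
count 1 0 number of observations
penalty 1 0 penalty contribution
endvars
"""


def parse_ctl(text):
    """Parse GrADS-style control text into dims, variables, and attrs dicts."""
    dims, variables, attrs = {}, {}, {}
    in_vars = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("*"):
            continue  # skip blank lines and comment lines
        tokens = line.split()
        key = tokens[0].lower()
        if key == "endvars":
            in_vars = False
        elif in_vars:
            # Variable line: name, level count, units code, description.
            variables[tokens[0]] = {
                "levels": int(tokens[1]),
                "description": " ".join(tokens[3:]),
            }
        elif key == "vars":
            in_vars = True
        elif key in ("xdef", "ydef", "zdef", "tdef"):
            dims[key] = int(tokens[1])  # keep only the dimension size
        else:
            attrs[key] = " ".join(tokens[1:])
    return dims, variables, attrs


def load_fields(buf, dims, variables):
    """Read one (ydef x xdef) grid per variable, in control-file order.

    Assumes flat little-endian float32 data with no record markers.
    """
    nx, ny = dims["xdef"], dims["ydef"]
    n = nx * ny
    fields, offset = {}, 0
    for name in variables:
        values = struct.unpack_from(f"<{n}f", buf, offset)
        fields[name] = [list(values[r * nx:(r + 1) * nx]) for r in range(ny)]
        offset += 4 * n
    return fields
```

Because the variable names and dimension sizes come from the control file rather than from the code, a different monitor type would only need a different control file, provided its binary layout matches the assumptions above.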

Please assign this to @EdwardSafford-NOAA

Requirements

Load data specified in the corresponding yaml file. There should be no difference in plotted results from running eva testMonDataSpaceHirs4Metop-A.yaml with the new (generic) version of data/mon_obs_space.py and the results from running the same with the current develop version.

Additionally, new test data for RadMon scan angle data files will be included in the release, along with a test yaml file to generate image plots. These should run to completion.

Dependencies

None

I've run into some complications with the RadMon angle binary files. They are 3-D not 2-D like the time binary files, and they have some eccentricities. It's a solvable problem but it adds enough complexity that doing this in 2 pieces will make code reviews easier.

What I've got now reads the RadMon time series binary data files and produces the prototype time series and summary plots. So I'm going to wrap this up and issue a PR, then tackle the 3-D work in a new issue.

Progress update. This is an angle plot for hirs4_n19, channel 1, obs count over 4 cycles. Angle data files are (mostly) 3-D not 2-D, so this is a bit more complex than the summary plots. Note that the RadMon angle plots don't plot 0 values and I didn't do that yet. This is a significant step nonetheless.

hirs4_n19 0 count

Same plot as above with an accept_where transform applied to remove zero values from the Dataset. This includes both a single cycle and 1 day (4 cycle) average plot of obs counts by scan angle.

hirs4_n19 0 count(3)
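The effect of filtering out zeros before averaging can be sketched in plain Python. This is only a stand-in for the idea, not the accept_where transform itself; the function names and the sample counts are hypothetical.

```python
import math


def accept_where_positive(values):
    """Replace zero (or negative) counts with NaN so they are not plotted."""
    return [v if v > 0 else math.nan for v in values]


def nanmean_by_angle(cycles):
    """Average each scan position over all cycles, ignoring NaN entries."""
    means = []
    for column in zip(*cycles):  # one column per scan position
        valid = [v for v in column if not math.isnan(v)]
        means.append(sum(valid) / len(valid) if valid else math.nan)
    return means


# Four cycles of made-up obs counts at three scan positions.
raw_cycles = [
    [0, 10, 20],
    [0, 30, 0],
    [0, 50, 40],
    [0, 70, 0],
]
masked = [accept_where_positive(row) for row in raw_cycles]
daily_mean = nanmean_by_angle(masked)
```

With the zeros masked, a position that reported data in only two of the four cycles is averaged over those two cycles, and a position with no data at all stays NaN instead of being drawn at zero.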

In doing some stress tests with large sets of data, I've identified a problem with mon_data_space.py. Missing data files in a large range of data are fairly common and are not a fatal error. I need to add a mechanism to load 0/missing data values and a cycle time for missing files.
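One way such a mechanism could look is sketched below, assuming 6-hourly cycles. The helper names (cycle_times, load_or_zero) and the path_for/read_file callables are hypothetical placeholders, not existing eva functions.

```python
import os
from datetime import datetime, timedelta


def cycle_times(start, count, step_hours=6):
    """Return `count` cycle times starting at `start`, spaced `step_hours` apart."""
    return [start + timedelta(hours=step_hours * i) for i in range(count)]


def load_or_zero(cycles, path_for, read_file, shape, exists=os.path.exists):
    """Load one grid per cycle, substituting zeros when the file is missing.

    path_for(cycle) -> file path for that cycle (caller supplied)
    read_file(path) -> nested list of values (caller supplied)
    shape           -> (rows, cols) of the zero grid used for missing files
    """
    rows, cols = shape
    loaded = []
    for cycle in cycles:
        path = path_for(cycle)
        if exists(path):
            data = read_file(path)
        else:
            # A missing file is not fatal: keep the cycle time, zero the data.
            data = [[0.0] * cols for _ in range(rows)]
        loaded.append((cycle, data))
    return loaded
```

Keeping an entry for every cycle, including the missing ones, means the time axis stays continuous and gaps show up as zeros rather than shifting later cycles to the left.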

Here's a time series plot comparing ozn obs counts for the same instrument in April of 2022 and 2023. I found I had to make some modifications to mon_data_space.py to zero out missing data files in order to properly display the time series. That change will be included in the next PR for this issue.

time ompsnp_npp

@CoryMartin-NOAA @ADCollard I had a quick look into the x axis question, and I'm not sure it should have worked. I checked the yaml file and the x axis is the 2022 cycle times for both datasets. I'm not sure how/why that works for the 2023 data. I'll take a deeper dive into that.

Example MinMon gnorm plot. This required adding a log function to the arithmetic.py transform, which was surprisingly straightforward; a testament to good design.

gnorm 2023061400

With PR #108 merged I'm going to close this issue.