Create new Pair-Stat tool to compute statistics for already paired forecast and observation data

Question

Create new Pair-Stat tool to compute statistics for already paired forecast and observation data

Opened this issue a month ago · 7 comments

Answer 1 · 2024-11-04T19:25:57.000Z

Funding source added and deadline added.

Answer 2 · 2024-11-04T19:26:33.000Z

Work in #3007 will support IODA files with Pair-Stat.

Answer 3 · 2024-11-20T18:18:36.000Z

@willmayfield I'm wondering about the use of a grid within the Pair-Stat tool.

One of the first things done in the other MET statistics tools (e.g. Point-Stat, Grid-Stat, Series-Analysis, MODE, ...) is deciding on a common grid to be used for the verification. That can be defined as the "forecast" grid, the "observation" grid, or some other grid, defined by its name, grid specification string, or the path to a gridded data file. All gridded data is regridded to the common vx grid prior to being used, and that includes:

gridded forecast data
gridded observation data, when applicable
gridded climo data
land/sea mask data
topography data
gridded masking regions created by Gen-Vx-Mask

Since Pair-Stat won't use gridded forecast/observation data, defining a verification grid is NOT REQUIRED. Instead, when extracting data from climo, land/sea mask, topography, and gridded mask files, we could just use whatever grid that data happens to be defined on and interpolate to the (lat, lon) location of the pair.

The advantage is that avoiding those regridding steps will be a little faster and will introduce less "interpolation error".
The disadvantage is that it'll be less consistent with the logic of the other MET statistics tools.

Shall I proceed WITHOUT defining a common "verification grid"?
Or should I use one to maintain more consistency with the logic of other tools?
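To make the first option concrete, here is a minimal Python sketch of interpolating a gridded field directly from its native grid to the (lat, lon) location of a pair, with no intermediate regridding. It assumes a regular lat/lon grid with ascending coordinates and uses bilinear interpolation purely for illustration; the function name, grid, and values are hypothetical and this is not how MET itself is implemented.

```python
from bisect import bisect_right


def bilinear(lats, lons, field, lat, lon):
    """Interpolate field[i][j], defined on ascending lats[i] and lons[j],
    to a single (lat, lon) point. Hypothetical helper, not a MET function."""
    if not (lats[0] <= lat <= lats[-1] and lons[0] <= lon <= lons[-1]):
        raise ValueError("point lies outside the grid")
    # Lower-left corner of the enclosing cell, clamped so i+1 and j+1 exist.
    i = min(bisect_right(lats, lat) - 1, len(lats) - 2)
    j = min(bisect_right(lons, lon) - 1, len(lons) - 2)
    # Fractional position of the point inside the cell.
    ty = (lat - lats[i]) / (lats[i + 1] - lats[i])
    tx = (lon - lons[j]) / (lons[j + 1] - lons[j])
    south = field[i][j] * (1 - tx) + field[i][j + 1] * tx
    north = field[i + 1][j] * (1 - tx) + field[i + 1][j + 1] * tx
    return south * (1 - ty) + north * ty


# Toy topography field on its own native grid (made-up values).
topo_lats = [30.0, 40.0, 50.0]
topo_lons = [-110.0, -100.0, -90.0]
topo = [
    [100.0, 200.0, 300.0],
    [400.0, 500.0, 600.0],
    [700.0, 800.0, 900.0],
]

# Elevation at the location of one forecast/observation pair.
elev = bilinear(topo_lats, topo_lons, topo, 35.0, -105.0)
```

Each gridded input (climo, land/sea mask, topography, masks) would be sampled this way on whatever grid it arrives on, so the data is interpolated once rather than regridded and then interpolated again.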

Answer 4 · 2024-11-22T17:29:12.000Z

As discussed on Nov 22, 2024 with @DanielAdriaansen and @willmayfield, recommend NOT using a common verification grid since not doing so seems to be the simpler approach. If this functionality is requested in the future, it can be added at that time.

Answer 5 · 2024-12-04T18:07:35.000Z

As discussed on Dec 4, 2024 with @georgemccabe, for setting up config options to filter input paired data, recommend:

Reusing the existing mpr_column and mpr_thresh config options from Point-Stat and Grid-Stat to filter numeric columns (or differences or abs value of differences) from MPR data.
Adding new mpr_str_inc and mpr_str_exc config options to filter input paired data by string matching inclusion and exclusion. These are arrays of dictionaries with name and value entries:

mpr_str_inc = [ { name = "DESC"; value = "NA"; } ];
mpr_str_exc = [ { name = "VX_MASK"; value = "CONUS"; } ];

Note that this introduces some inconsistency, since mpr_str_inc/exc are arrays of dictionaries while mpr_column/thresh are arrays of strings and thresholds. However, we agree that this is a preferable design, and users will set these via METplus Wrappers anyway.
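As a rough illustration of the intended behavior, the Python sketch below applies the two example settings above to a few made-up paired-data rows. It assumes a row is kept only if it matches every mpr_str_inc entry and no mpr_str_exc entry; that combination rule, the function name keep_pair, and the row values are assumptions for illustration, not confirmed Pair-Stat behavior.

```python
def keep_pair(row, str_inc, str_exc):
    """Return True if a paired-data row passes both string filters.

    Assumption: all inclusion entries must match and no exclusion
    entry may match. The real tool may combine entries differently.
    """
    if any(row.get(f["name"]) != f["value"] for f in str_inc):
        return False
    if any(row.get(f["name"]) == f["value"] for f in str_exc):
        return False
    return True


# Python equivalents of the example config entries.
mpr_str_inc = [{"name": "DESC", "value": "NA"}]
mpr_str_exc = [{"name": "VX_MASK", "value": "CONUS"}]

# Made-up rows using MPR-style column names.
rows = [
    {"DESC": "NA", "VX_MASK": "FULL", "FCST": 281.2, "OBS": 280.9},
    {"DESC": "NA", "VX_MASK": "CONUS", "FCST": 275.0, "OBS": 276.1},
    {"DESC": "TEST", "VX_MASK": "FULL", "FCST": 290.4, "OBS": 289.7},
]

kept = [r for r in rows if keep_pair(r, mpr_str_inc, mpr_str_exc)]
```

Only the first row survives here: the second is dropped by the VX_MASK exclusion and the third fails the DESC inclusion.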

Answer 6 · 2024-12-06T20:34:55.000Z

As discussed on Dec 6, 2024 (see meeting notes), add a new group_name config option to specify the group name from which the variable name should be extracted.

Answer 7 · 2024-12-09T21:43:30.000Z

@JohnHalleyGotway After our discussion on Friday, I dug into some of the files in https://github.com/JCSDA-internal/ufo-data/tree/develop/testinput_tier_1.

An instructive file might be amsua_n19_hofxnm_2018041500_m_rttovcpp.nc4.

This file has one variable, brightness_temperature, with observation group ObsValue and possible "forecast" groups HofX and MPASJEDIHofX. Its dimensions are "Location" (size 100) and "Channel" (size 15); users may want to specify the channel for the verification task. Channel takes values in the MetaData group, along with the coordinates height, latitude, longitude, and datetime.
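To show how a forecast group, an observation group, and a channel selection could together define the pairs, here is a small Python sketch. It uses a nested dict as a toy stand-in for the file (two locations and three channels instead of 100 and 15), and it assumes the data is laid out as (Location, Channel). The values, the layout, and the extract_pairs helper are all hypothetical; reading the actual file would need a NetCDF library.

```python
# Toy stand-in for the file layout: group -> variable -> values.
# All numbers are made up for illustration.
ioda = {
    "MetaData": {
        "latitude": [10.0, 20.0],
        "longitude": [100.0, 110.0],
        "Channel": [1, 2, 3],
    },
    "ObsValue": {
        "brightness_temperature": [[250.0, 251.0, 252.0],
                                   [260.0, 261.0, 262.0]],
    },
    "HofX": {
        "brightness_temperature": [[249.5, 250.5, 251.5],
                                   [260.5, 261.5, 262.5]],
    },
}


def extract_pairs(data, var, fcst_group, obs_group, channel):
    """Return (lat, lon, fcst, obs) tuples for one variable and channel."""
    meta = data["MetaData"]
    # Position of the requested channel along the Channel dimension.
    k = meta["Channel"].index(channel)
    fcst = data[fcst_group][var]
    obs = data[obs_group][var]
    return [
        (meta["latitude"][n], meta["longitude"][n], fcst[n][k], obs[n][k])
        for n in range(len(obs))
    ]


pairs = extract_pairs(ioda, "brightness_temperature", "HofX", "ObsValue", 2)
```

Swapping "HofX" for another forecast group name is the only change needed to verify a different "forecast" against the same observations.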

Several other MetaData variables are available, such as sensorZenithAngle(Location) and sensorPolarizationDirection(Channel). I am not sure whether it would be desirable to use them in, for example, a filter job. That may need to be left to the user to perform independently.

For a very simple file with a more traditional variable, you could look at sondes_q_obs_2020121500_singular.nc4.

This file has the variable specificHumidity, with groups ObsValue, hofx, GsiHofx, etc. Within MetaData there are the variables datetime, latitude, and longitude, plus the possible vertical coordinates height, pressure, and stationElevation. MetaData also holds information such as stationIdentification, which again might be useful in a filter job, but I'm not sure if that's within our immediate scope of capabilities.

Please let me know if you have any questions or would like to discuss (I'll find a meeting time in the next few days either way).
