JCSDA-internal/eva

Add support for binary station data files

Closed this issue · 8 comments

Description

Extend src/eva/data/mon_data_space.py to support binary files in GrADS station format.

Support for legacy Radiance DA monitor files was recently added in src/eva/data/mon_data_space.py. These are simple binary files. The exact file format (variable names and dimensions) is described by the accompanying control (*.ctl) file.

The legacy Ozone DA monitor (OznMon) data files are in GrADS station format. This is a binary file with a header on each obs record. The header contains the station data (station id, lat, lon, etc.). The accompanying control file only describes the variables and includes the indicator dtype station. The station header format is fixed, so the control file does not specify it.
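A minimal sketch of how the reader could tell the two file types apart from the control file alone. The function name is hypothetical and is not part of eva; the only thing taken from the description above is that station-format control files contain a dtype station entry. Keywords are matched case-insensitively as a precaution.

```python
def is_station_format(ctl_lines):
    """Return True if any control-file line reads 'dtype station'.

    Hypothetical helper: ctl_lines is an iterable of text lines already
    read from the *.ctl file.
    """
    for line in ctl_lines:
        tokens = line.split()
        if len(tokens) >= 2 and tokens[0].lower() == "dtype" \
                and tokens[1].lower() == "station":
            return True
    return False
```

A caller could use the result to choose between the existing simple-binary path and a new station-data path.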

Requirements

Given a yaml file identifying an OznMon control file and 4 data files, load the data and generate an image with 4 map-scatter plots each containing the data from one specific cycle.

Acceptance Criteria (Definition of Done)

A test yaml file correctly loads 4 OznMon data files and generates the 4-pane map-scatter of the data. The specific variable doesn't matter, but I'll use o-f for consistency with the existing OznMon horizontal plots.

Dependencies

None

Any of those able to do so -- please assign me to this issue. Thx.

After some additional thought I'm not sure this is a high priority. Maybe we can talk about this at the next Mon 2.0 meeting. I'd like to keep this issue open, but don't expect to work on it in the near term. It can still be assigned to me though.

Update -- I need to take this on now. I'm ready to add Ozn horizontal plots, and it's likely that some horizontal plots now associated with the ConMon will be desirable as well.

I've figured out the structure of the OznMon horiz files. Each record is a 28-byte header (station id, lat, lon, time index, nlev, surf flag) followed by 88 floats: 4 variables at each of 22 pressure levels. That record is repeated n times, and a final header with nlev=0 signals EOF. Next step is to map that data to the structure that emcpy needs to make a horizontal plot.
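The record layout described above can be sketched with the standard struct module. Several things here are assumptions rather than facts from this issue: little-endian byte order, 4-byte floats and ints, an 8-character station id, no Fortran record markers around each record, and the function name itself. The ordering of the 88 values (by variable or by level) isn't stated, so the sketch leaves them as a flat tuple.

```python
import struct

# Assumed header layout: 8-char station id, lat, lon, time index (floats),
# then nlev and surface flag (ints). "<" means little-endian, no padding.
HEADER_FMT = "<8sfffii"
HEADER_SIZE = struct.calcsize(HEADER_FMT)  # 28 bytes
N_VALUES = 88                              # 4 variables x 22 levels


def read_station_records(stream, n_values=N_VALUES):
    """Read records from a binary stream until a header with nlev == 0."""
    records = []
    while True:
        raw = stream.read(HEADER_SIZE)
        if len(raw) < HEADER_SIZE:
            break  # ran out of bytes before a terminating header
        stid, lat, lon, tidx, nlev, flag = struct.unpack(HEADER_FMT, raw)
        if nlev == 0:
            break  # terminating header
        data = stream.read(4 * n_values)
        if len(data) < 4 * n_values:
            raise ValueError("truncated data block after station header")
        records.append({
            "stid": stid.decode("ascii", errors="replace").strip(),
            "lat": lat,
            "lon": lon,
            "time_index": tidx,
            "nlev": nlev,
            "flag": flag,
            "values": struct.unpack(f"<{n_values}f", data),
        })
    return records
```

Because the number of obs is only known once the terminating header is reached, the sketch accumulates records in a list instead of preallocating an array.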

One complication is that the OznMon and ConMon control files are significantly different and mon_data_space.py will have to handle both. Neither actually specifies the number of levels directly. The OznMon uses this specification:

*XDEF is pressure level number
*  x=    1, level=      0.101 , iuse=  1 , error=    0.020
*  x=    2, level=      0.160 , iuse=  1 , error=    0.020
*  x=    3, level=      0.254 , iuse=  1 , error=    0.025
*  x=    4, level=      0.403 , iuse=  1 , error=    0.080
*  x=    5, level=      0.639 , iuse=  1 , error=    0.150
*  x=    6, level=      1.013 , iuse=  1 , error=    0.056
*  x=    7, level=      1.601 , iuse=  1 , error=    0.125
*  x=    8, level=      2.543 , iuse=  1 , error=    0.200
*  x=    9, level=      4.033 , iuse=  1 , error=    0.299
*  x=   10, level=      6.394 , iuse=  1 , error=    0.587
*  x=   11, level=     10.132 , iuse=  1 , error=    0.864
*  x=   12, level=     16.009 , iuse=  1 , error=    1.547
*  x=   13, level=     25.433 , iuse=  1 , error=    2.718
*  x=   14, level=     40.327 , iuse=  1 , error=    3.893
*  x=   15, level=     63.936 , iuse=  1 , error=    4.353
*  x=   16, level=    101.325 , iuse=  1 , error=    3.971
*  x=   17, level=    160.094 , iuse=  1 , error=    4.407
*  x=   18, level=    254.326 , iuse=  1 , error=    4.428
*  x=   19, level=    403.273 , iuse=  1 , error=    3.312
*  x=   20, level=    639.361 , iuse=  1 , error=    2.198
*  x=   21, level=   1013.250 , iuse=  1 , error=    2.285
*  x=   22, level=      0.000 , iuse= -1 , error=    7.236

While the ConMon uses this specification:

* ZDEF mandatary level 1000,925,850,700,500,400,300,250,200,150,100,70,50

mon_data_space.py will need to load the level values in both cases; plotting needs to be done by level and the level value must be specified in the plots.
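As a rough sketch, both comment styles shown above could be handled by one parser. The function name and regular expressions are hypothetical, not what mon_data_space.py does; they are written only against the two control-file excerpts quoted in this issue, and real control files may need looser patterns.

```python
import re

# OznMon style:  "*  x=    1, level=      0.101 , iuse=  1 , error=    0.020"
XDEF_LEVEL = re.compile(r"^\*\s*x=\s*(\d+)\s*,\s*level=\s*([-+0-9.eE]+)")
# ConMon style:  "* ZDEF mandatary level 1000,925,850,..."
ZDEF_LEVELS = re.compile(r"^\*\s*ZDEF\b.*?level\s+([0-9.,\s]+)$")


def parse_levels(ctl_lines):
    """Collect level values from either comment style, in file order."""
    levels = []
    for line in ctl_lines:
        line = line.strip()
        match = XDEF_LEVEL.match(line)
        if match:
            levels.append(float(match.group(2)))
            continue
        match = ZDEF_LEVELS.match(line)
        if match:
            levels.extend(
                float(v) for v in match.group(1).split(",") if v.strip()
            )
    return levels
```

The number of levels then falls out as len(levels), which covers the fact that neither control file states it directly.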

As part of this work I'm going to have to update plotting/batch/emcpy/diagnostics/map_scatter.py to enable processing by level. I thought about making that change separately from this issue, but the modifications in this issue are required to test the necessary changes to map_scatter.py.

It's been a slog but it's working now.

The major challenge has been that the number of obs varies with every cycle. Merging those different dimensions into a dataset provided a great opportunity for learning. Further complicating things, the number of obs isn't known until the data file is read. With the binary ieee_d files the dimensions can be learned simply by reading the control file. With station data the control file is less helpful.
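One common way to reconcile a varying obs count is to pad every cycle to the longest one with a fill value so the data form a rectangular (cycle, obs) grid. The sketch below is only an illustration of that idea in plain Python, with a hypothetical function name; it is not necessarily how the merge was done for this issue.

```python
import math


def pad_cycles(cycles, fill=math.nan):
    """Pad each cycle's obs list to the length of the longest cycle.

    cycles: list of per-cycle value lists, lengths may differ.
    Returns a new list of lists, all of equal length.
    """
    n_max = max((len(c) for c in cycles), default=0)
    return [list(c) + [fill] * (n_max - len(c)) for c in cycles]
```

The padded entries would need to be masked out again at plot time, for example by skipping NaN values.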

Closed by #160.