xarray-contrib/xwrf

[FEATURE]: Essential diagnostic variables


Description

A key feature, formerly part of #14 but stripped out while we were still working out #20, is calculating the diagnostic variables needed to use WRF output in most analysis workflows (summarized in the sketch after this list):

  • 'T' converts to potential temperature by adding a magic-number offset of 300 K
  • 'P' and 'PB' combine to form the full pressure
  • 'PH' and 'PHB' combine to form the geopotential
  • Converting geopotential (see previous) to geopotential height depends on a particular value of g (9.81 m s**-2) that may not match the value used elsewhere
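
In code form, the bullets above amount to the following (a minimal sketch; ds stands for an opened wrfout dataset):

# Sketch of the conversions listed above; ds is the opened wrfout dataset.
theta = ds['T'] + 300.0        # potential temperature [K]; 300 K is the magic-number offset
p = ds['P'] + ds['PB']         # full pressure = perturbation + base state [Pa]
phi = ds['PH'] + ds['PHB']     # geopotential = perturbation + base state [m**2 s**-2]
z = phi / 9.81                 # geopotential height [m]; g value may not match elsewhere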

Implementation

This was implemented previously in #14 as

def calc_base_diagnostics(dataset, drop=True):
    """Calculate the four basic fields that WRF does not have in physically meaningful form.
    Parameters
    ----------
    dataset : xarray.Dataset
        Dataset representing WRF data opened via normal backend, with chunking.
    drop : bool
        Decide whether to drop the components of origin after creating the diagnostic fields from
        them.
    Notes
    -----
    This operation should be called before destaggering.
    """
    # Potential temperature
    dataset['air_potential_temperature'] = dataset['T'] + 300
    dataset['air_potential_temperature'].attrs = {
        'units': 'K',
        'standard_name': 'air_potential_temperature'
    }
    if drop:
        del dataset['T']

    # Pressure
    dataset['air_pressure'] = dataset['P'] + dataset['PB']
    dataset['air_pressure'].attrs = {
        'units': dataset['P'].attrs.get('units', 'Pa'),
        'standard_name': 'air_pressure'
    }
    if drop:
        del dataset['P'], dataset['PB']

    # Geopotential and geopotential height
    dataset['geopotential'] = dataset['PH'] + dataset['PHB']
    dataset['geopotential'].attrs = {
        'units': 'm**2 s**-2',
        'standard_name': 'geopotential'
    }
    dataset['geopotential_height'] = dataset['geopotential'] / 9.81
    dataset['geopotential_height'].attrs = {
        'units': 'm',
        'standard_name': 'geopotential_height'
    }
    if drop:
        del dataset['PH'], dataset['PHB']

    return dataset
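
For context, a hedged usage sketch (the file path here is a placeholder, and chunks='auto' just asks xarray for a Dask-backed dataset):

import xarray as xr

# Placeholder path; any raw wrfout file opened via the normal backend works.
ds = xr.open_dataset('wrfout_d01_placeholder.nc', chunks='auto')
ds = calc_base_diagnostics(ds)

# With chunked input the new variables are lazy Dask arrays; nothing is
# evaluated until explicitly requested.
ds['air_pressure'].compute()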

Tests

These are pretty straightforward calculations, so writing tests against our existing "raw wrfout" data (i.e., not the "dummy" or "geo_em" files) should also be straightforward; a sketch follows below.
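
As a sketch of what such a test could look like (using a tiny synthetic stand-in rather than the actual raw wrfout fixture, whose contents aren't assumed here):

import numpy as np
import xarray as xr


def test_calc_base_diagnostics():
    # Tiny synthetic stand-in for a raw wrfout dataset.
    ds = xr.Dataset({
        'T': ('z', np.array([10.0, 20.0])),
        'P': ('z', np.array([500.0, 600.0]), {'units': 'Pa'}),
        'PB': ('z', np.array([90000.0, 80000.0])),
        'PH': ('z', np.array([0.0, 0.0])),
        'PHB': ('z', np.array([9810.0, 19620.0])),
    })
    out = calc_base_diagnostics(ds, drop=True)
    np.testing.assert_allclose(out['air_potential_temperature'], [310.0, 320.0])
    np.testing.assert_allclose(out['air_pressure'], [90500.0, 80600.0])
    np.testing.assert_allclose(out['geopotential_height'], [1000.0, 2000.0])
    assert 'T' not in out and 'P' not in out and 'PH' not in out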

Questions

  • How should we handle lazy loading? If we go with the straight implementation like the one included above, it should "just work" with Dask for delayed loading, but will otherwise evaluate eagerly...is this okay?

  • Will this calculation be part of xwrf.postprocess(), or will it be a standalone function to be invoked by users explicitly?

> How should we handle lazy loading? If we go with the straight implementation like the one included above, it should "just work" with Dask for delayed loading, but will otherwise evaluate eagerly...is this okay?

I think that would be expected behavior for something like this. I like the idea of the diagnostic variables "being ready for access" (i.e., already set with a Python ds['variable'] = ... statement), and initializing the dataset with Dask makes it lazy-loading by default. I think that's the right approach.

> Will this calculation be part of xwrf.postprocess(), or will it be a standalone function to be invoked by users explicitly?

Yes; I'd view it as needed for most workflows (and if it isn't applied, users could be misled by what is there). Still, it'd be easy to make it conditional, like decode_times.
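
To illustrate the decode_times-style conditional (the flag name here is hypothetical, not a settled API):

def postprocess(dataset, calculate_diagnostics=True):
    # Hypothetical opt-out flag, analogous to xarray's decode_times.
    if calculate_diagnostics:
        dataset = calc_base_diagnostics(dataset)
    return dataset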

> I think that would be expected behavior for something like this. I like the idea of the diagnostic variables "being ready for access" (i.e., already set with a Python ds['variable'] = ... statement), and initializing the dataset with Dask makes it lazy-loading by default. I think that's the right approach.

Sounds good, that's enough confirmation for me to make a PR based on this approach!