metno/pyaerocom

use_obs_clim is broken and has always been: Turns all obsdata into nan


Describe the bug

  • Pyaerocom version: 0.17.1
  • Computing platform: PPI
  • Configuration file (if applicable):
  • Error message (if applicable):

To Reproduce
My config.py file:

output_dir = "/lustre/storeB/users/oveh/DURF/aeroval/data"
coldata_dir = "/lustre/storeB/users/oveh/DURF/aeroval/coldala"

exp_pi = "Ove Haugvaldstad"
experiment_id="test simulations DURF"
proj_id = "AeroCom"



ALTITUDE_FILTER = {
    'altitude': [0, 1000]
} 


""" Ground based Aeronet observations """

OBS_GROUNDBASED = {

    'AeronetSDAV3L2': dict(obs_id='AeronetSDAV3Lev2.daily',
                           # obs_vars=['od550aer', 'ang4487aer'],
                           obs_vars=['od550gt1aer','ang4487aer'],
                           obs_vert_type='Column',
                           obs_filters={**ALTITUDE_FILTER,
                                         **dict(station_name='DRAGON*', negate='station_name')},
                           min_num_obs={'monthly': {'daily': 7}},
                           obs_use_climatology=True,
                           obs_outlier_ranges={'od550aer'    : [0.01, 10],
                                                'od550lt1aer' : [0.01, 10],
                                                'od550gt1aer' : [0.01, 10]},

                           ),
                           
}

MODELS = {
    "NorESM2.1F-LM histSST" : dict(
        model_id="NorESM2-LM-histSST_DURF",
        model_data_dir="/lustre/storeB/project/aerocom/aerocom-users-database/DURF/histSST/NorESM2-LM-histSST_DURF",
        model_use_vars={'od550gt1aer':'od550dust'},
        model_ts_type_read = 'monthly',
    ),

}


CFG = dict(
    # Output directories
    json_basedir=output_dir,
    coldata_basedir=coldata_dir,
    # Run options
    reanalyse_existing=True,  # if True, existing colocated data files will be deleted
    raise_exceptions=True,  # if True, the analysis will stop whenever an error occurs
    clear_existing_json=False,  # if True, deletes previous output before running
    # Map Options
    add_model_maps=False,  # Adds a plot of the whole map. Very slow!!!
    only_model_maps=False,  # Adds only plot above, without any other evaluation
    filter_name="ALL-noMOUNTAINS",  # Regional filter for analysis
    map_zoom="World",  # Zoom level. For EMEP, Europe is typically used
    ts_type="monthly",  # Colocation frequency (no statistics in higher resolution can be computed)
    freqs=["monthly", "yearly"],  # Frequencies that are evaluated
    main_freq="monthly",  # Frequency that is displayed when opening webpage
    periods=[
        "1995-2000"
    ],  # List of years or periods of years that are evaluated. E.g. "2005" or "2001-2020"
    obs_remove_outliers=False,
    model_remove_outliers=False,
    colocate_time=False,
    zeros_to_nan=False,
    weighted_stats=True,
    annual_stats_constrained=True,
    # Experiment Metadata
    exp_pi=exp_pi,
    proj_id=proj_id,
    exp_id=experiment_id,
    exp_name="DURF test evaluation",
    exp_descr=("Evaluation test DURF 10 year test simulations"),
    public=True,
)

# CFG['obs_cfg'] = {**OBS_SAT, **OBS_GROUNDBASED}

CFG['obs_cfg'] = {**OBS_GROUNDBASED}

CFG["model_cfg"] = MODELS

if __name__ == "__main__":


    from pyaerocom.aeroval import EvalSetup, ExperimentProcessor
    from pyaerocom import const

    print(
        const.CACHEDIR
    )  # Prints where to find the caching folder. Not needed but this folder should be emptied now and then, so I like to see where it is

    stp = EvalSetup(**CFG)  # Makes a setup object from the dict, that PyAeroval can use
    ana = ExperimentProcessor(stp)  # Makes an experiment object
    res = ana.run()  # Runs the experiment

Expected behavior
pyaerocom should read the obs data for the specified period, calculate the climatology at the specified frequency (either monthly or yearly), and assign it the same time axis as the model data.
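As a rough illustration of that expected behaviour, here is a minimal standard-library sketch. It is not pyaerocom's implementation: the function name `monthly_climatology`, its parameters, the mid-month timestamp convention, and the sample values are all assumptions made for this example.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def monthly_climatology(obs, start_year, stop_year, set_year):
    """Average obs per calendar month and stamp the result on set_year.

    obs maps datetime.date -> float. Values outside
    [start_year, stop_year] are ignored. Months without data are omitted.
    """
    by_month = defaultdict(list)
    for day, value in obs.items():
        if start_year <= day.year <= stop_year:
            by_month[day.month].append(value)
    # Hypothetical convention: each monthly mean is placed on the 15th.
    return {
        date(set_year, month, 15): mean(values)
        for month, values in sorted(by_month.items())
    }


# Hypothetical daily values spread over several years.
obs = {
    date(1995, 1, 10): 0.25,
    date(1997, 1, 20): 0.75,
    date(1996, 7, 5): 0.5,
    date(2003, 7, 5): 8.0,  # outside the period, so it is ignored
}
clim = monthly_climatology(obs, 1995, 2000, set_year=1995)
```

The key point is the explicit `set_year`: the climatology only becomes comparable to the model once it is placed on the model's own time axis.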

Issues to start fixing:

  • helpers.calc_clim is always called with set_year = None. This worked for Jonas's experiment only because he used the year 2010, and the "climatology" is defined as 2005 to 2015.
  • This test does not test for anything useful.
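The first point can be illustrated with a toy alignment. This is only a sketch of the suspected failure mode, not pyaerocom's colocation code: the `align` helper is hypothetical, and it assumes that without `set_year` the climatology ends up stamped on a year (2010 here) other than the model's.

```python
from datetime import date
from math import isnan, nan


def align(model_times, obs_clim):
    """Look up obs for each model timestamp; missing timestamps become NaN."""
    return [obs_clim.get(t, nan) for t in model_times]


# Model months for 1995, the first year of the "1995-2000" period above.
model_times = [date(1995, m, 15) for m in range(1, 13)]

# Same hypothetical climatology values, stamped on two different years.
clim_2010 = {date(2010, m, 15): 0.2 for m in range(1, 13)}  # assumed fallback year
clim_1995 = {date(1995, m, 15): 0.2 for m in range(1, 13)}  # model year

mismatched = align(model_times, clim_2010)
matched = align(model_times, clim_1995)

print(all(isnan(v) for v in mismatched))  # every obs value is lost
print(matched[:3])
```

A useful test would assert on exactly this: after colocation against a model year outside the climatology's own year, the obs values must not all be NaN.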

Actually, this has never been properly implemented, going all the way back to issue #51.

Just from reading the old discussion: I do not think that we should have a "fixed" period for the climatology, but rather a default one, especially since 2005-2015 is almost 10 years ago and we now have new and different observations that did not exist back then.
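One way the "default rather than fixed" idea could look is sketched below. The names `DEFAULT_CLIM_PERIOD` and `resolve_clim_period` are hypothetical and not part of pyaerocom. The default simply reuses the 2005-2015 period mentioned above.

```python
# Hypothetical default, taken from the period named in this issue.
DEFAULT_CLIM_PERIOD = (2005, 2015)


def resolve_clim_period(user_period=None):
    """Return (start_year, stop_year), falling back to the default period."""
    start, stop = DEFAULT_CLIM_PERIOD if user_period is None else user_period
    if start > stop:
        raise ValueError(f"invalid climatology period: {start}-{stop}")
    return start, stop


print(resolve_clim_period())              # falls back to the default
print(resolve_clim_period((1995, 2000)))  # user-specified period wins
```

With something like this, existing setups keep their behaviour while experiments such as the 1995-2000 one above can pick a period that matches their observations.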