pik-primap/primap2

Re-using columns in `read_wide_csv_file_if` while reading data


Describe the bug
Following up on the discussion in #82: `read_wide_csv_file_if` doesn't support re-using a source column, as noted there:

> Unfortunately, `read_wide_csv_file_if` doesn't support re-using a source column multiple times in `coords_cols` because behind the scenes it just does a renaming. Probably something it should support, so maybe worth opening a bug report - but I can't commit to when I will have time to fix it.

Example code:

import primap2 as pm2

file = "rcmip-emissions-annual-means-v5-1-0.csv"
coords_cols = {
    "unit": "Unit",
    "area": "Region",
    "model": "Model",
    "scenario": "Scenario",
    "entity": "Variable",
    "category": "Variable"
}
coords_defaults = {
    "source": "RCMIP",
}
coords_terminologies = {
    "area": "RCMIP",
    "category": "RCMIP",
}
# map_variables is defined elsewhere (not shown in this report)
coords_value_mapping = {
    "entity": map_variables
}
meta_data = {
    "rights": "CC BY 4.0 International",
}
data_if = pm2.pm2io.read_wide_csv_file_if(
    file,
    coords_cols=coords_cols,
    coords_defaults=coords_defaults,
    coords_terminologies=coords_terminologies,
    coords_value_mapping=coords_value_mapping,
    meta_data=meta_data,
    filter_keep={"f1": {
        "Model": "CEDS/UVA/GCP/PRIMAP",
    }}
)
data_if

This fails with a `KeyError`.
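The failure is consistent with the renaming behaviour mentioned above. The following is a minimal sketch in plain Python, not primap2's actual code, of why a pure rename cannot fill two target coordinates from one source column. The names `rename_map`, `renamed`, and `missing` are illustrative assumptions, and the real implementation may differ in detail.

```python
# Simplified version of the coords_cols from the example above:
# target coordinate -> source column in the CSV file.
coords_cols = {
    "entity": "Variable",
    "category": "Variable",
    "area": "Region",
}

# A rename needs the inverse mapping (source column -> target name).
# A source column can only map to ONE target, so the second use of
# "Variable" silently overwrites the first.
rename_map = {source: target for target, source in coords_cols.items()}

# Columns as they might appear in a wide CSV file (assumed for illustration).
columns = ["Region", "Variable", "2000"]
renamed = [rename_map.get(col, col) for col in columns]

# Any target coordinate that did not end up as a column would raise a
# KeyError as soon as it is looked up.
missing = [target for target in coords_cols if target not in renamed]

print(rename_map)  # {'Variable': 'category', 'Region': 'area'}
print(renamed)     # ['area', 'category', '2000']
print(missing)     # ['entity']
```

Under this assumption, supporting re-use would mean copying the source column once per target instead of renaming it in place.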

Expected behavior

Allow re-using a column when reading the data.

A potential workaround is described in #82.

Yes, that would indeed make sense. I currently copy the column before reading the data. So far I have only needed this for less important columns, like the category name in the original data (before mapping to e.g. IPCC2006 terminologies). But I realize that it's necessary for all IIASA-database-type data, so we should add it soon. I'll assign the issue to myself, but I can't promise I'll implement it in the next few weeks.
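For anyone hitting this in the meantime, copying the column before reading could look roughly like the sketch below. It uses only the standard-library `csv` module, so it makes no assumptions about primap2 internals. The helper `duplicate_column`, the new column name `Category`, and the file names in the demo are hypothetical and chosen only for illustration.

```python
import csv
import tempfile
from pathlib import Path


def duplicate_column(src_path, dst_path, source_col, new_col):
    """Write a copy of the CSV at src_path in which source_col is
    duplicated under the name new_col (appended as the last column)."""
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        reader = csv.DictReader(src)
        fieldnames = list(reader.fieldnames or [])
        if source_col not in fieldnames:
            raise KeyError(f"column {source_col!r} not found in {src_path}")
        if new_col in fieldnames:
            raise ValueError(f"column {new_col!r} already exists in {src_path}")
        writer = csv.DictWriter(dst, fieldnames=[*fieldnames, new_col])
        writer.writeheader()
        for row in reader:
            row[new_col] = row[source_col]  # copy the value unchanged
            writer.writerow(row)


# Small self-contained demo on a made-up one-row wide CSV file.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "wide.csv"
    dst = Path(tmp) / "wide_with_category.csv"
    src.write_text(
        "Model,Region,Variable,Unit,2000\n"
        "m1,World,Emissions|CO2,Mt CO2/yr,1.0\n"
    )
    duplicate_column(src, dst, "Variable", "Category")
    with open(dst, newline="") as f:
        rows = list(csv.DictReader(f))
```

The copied file can then be passed to `read_wide_csv_file_if` with each coordinate pointing at its own column, for example `"entity": "Variable"` and `"category": "Category"` in `coords_cols`.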