Re-using columns in `read_wide_csv_file_if` while reading data
Describe the bug
Following up on the discussion in #82: `read_wide_csv_file_if` doesn't support re-using a source column for more than one coordinate. Quoting from that discussion:
Unfortunately, read_wide_csv_file_if doesn't support re-using a source column multiple times in coords_cols because behind the scenes it just does a renaming. Probably something it should support, so maybe worth opening a bug report - but I can't commit to when I will have time to fix it.
Example code:
import primap2 as pm2

file = "rcmip-emissions-annual-means-v5-1-0.csv"
coords_cols = {
    "unit": "Unit",
    "area": "Region",
    "model": "Model",
    "scenario": "Scenario",
    # "Variable" is used twice: this is what triggers the error
    "entity": "Variable",
    "category": "Variable",
}
coords_defaults = {
    "source": "RCMIP",
}
coords_terminologies = {
    "area": "RCMIP",
    "category": "RCMIP",
}
# map_variables is defined elsewhere and not shown in this issue
coords_value_mapping = {
    "entity": map_variables,
}
meta_data = {
    "rights": "CC BY 4.0 International",
}
data_if = pm2.pm2io.read_wide_csv_file_if(
    file,
    coords_cols=coords_cols,
    coords_defaults=coords_defaults,
    coords_terminologies=coords_terminologies,
    coords_value_mapping=coords_value_mapping,
    meta_data=meta_data,
    filter_keep={
        "f1": {
            "Model": "CEDS/UVA/GCP/PRIMAP",
        }
    },
)
data_if
Fails with a `KeyError`.
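For illustration, here is a minimal sketch of why a rename-based approach could lose a coordinate when one source column is used twice. This is an assumption about the mechanism based on the "it just does a renaming" remark, not primap2's actual code; `rename_map`, `columns`, and `missing` are hypothetical names.

```python
# Hypothetical illustration only; this is NOT primap2's actual implementation.
coords_cols = {
    "entity": "Variable",
    "category": "Variable",  # same source column used a second time
}

# A rename-based reader needs a {source column: dimension} mapping.
# Inverting the dict keeps only one dimension per source column.
rename_map = {source: dim for dim, source in coords_cols.items()}

# Made-up column names standing in for the CSV header.
columns = ["Model", "Variable", "2015"]
renamed = [rename_map.get(col, col) for col in columns]

# Dimensions that no longer correspond to any column after renaming.
missing = [dim for dim in coords_cols if dim not in renamed]
print(rename_map, missing)
```

In this sketch only one of the two dimensions survives the renaming, so a later lookup of the other one would plausibly raise a `KeyError`.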
Expected behavior
Allow re-using a column when reading the data.
A potential workaround is described in #82:
Yes, that would indeed make sense. I currently copy the column before reading the data. I only needed it for less important columns like the category name in the original data (before mapping to e.g. IPCC2006 terminologies). But I realize that it's necessary for all the IIASA database type data and thus we should add it soon. I'll assign the issue to me, but can't promise I'll implement it in the next weeks.
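A rough sketch of the "copy the column before reading" workaround, using only the standard library. The helper `duplicate_column`, the column name `Variable_copy`, and the sample row are made up for illustration; the real file has many more columns.

```python
import csv
import io


def duplicate_column(csv_text, source, copy_name):
    """Return CSV text in which column `source` also appears as `copy_name`."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = list(reader.fieldnames) + [copy_name]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        row[copy_name] = row[source]  # copy the value into the new column
        writer.writerow(row)
    return out.getvalue()


# Made-up sample data, not taken from the RCMIP file.
raw = "Model,Variable,Unit,2015\nCEDS/UVA/GCP/PRIMAP,Emissions|CO2,Mt CO2/yr,1.0\n"
patched = duplicate_column(raw, "Variable", "Variable_copy")

# Each dimension now points at its own source column.
coords_cols = {"entity": "Variable", "category": "Variable_copy"}
```

The patched CSV would then be written to disk and passed to `read_wide_csv_file_if` with the adjusted `coords_cols`, so that no source column is referenced twice.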