Bug in `add_regions_to_table` when using `countries_that_must_have_data` (in an unusual situation)
pabloarosado commented
Problem
In an unusual situation for aggregation, `World` can have a value even though `Asia` has no value (because `China` has no value).
Specific example
I noticed this error while working on minerals, because some aggregates (e.g. High-income countries) had larger values than the World.
In the following situation:
REGIONS = {**geo.REGIONS, **{"World": {}}}
tb = geo.add_regions_to_table(
    tb=tb,
    regions=REGIONS,
    ds_regions=ds_regions,
    ds_income_groups=ds_income_groups,
    countries_that_must_have_data={
        "Asia": ["China"],
        "World": ["Asia"],
    },
)
- `China` does not have data, so `Asia` does not have data
- `World` does have data, even though `Asia` does not have data
Expected behaviour
If `Asia` does not have data, then `World` should not have data.
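To make the expected behaviour concrete, here is a minimal, self-contained sketch of the rule in plain Python. It is a toy model, not the actual logic of `geo.add_regions_to_table`: the function `regions_with_data` and its arguments are hypothetical, and it assumes that any entity not listed as a key in the mapping is a plain country.

```python
from typing import Dict, List, Set


def regions_with_data(
    countries_with_data: Set[str],
    must_have_data: Dict[str, List[str]],
) -> Set[str]:
    """Return the regions whose required members all have data.

    Hypothetical helper: a required member that is itself a region in
    `must_have_data` is resolved recursively, so a missing country
    propagates up through nested regions.
    """
    resolved: Dict[str, bool] = {}

    def has_data(entity: str, seen: frozenset = frozenset()) -> bool:
        if entity not in must_have_data:
            # Assumed to be a plain country: it has data only if it is in the input.
            return entity in countries_with_data
        if entity in seen:
            raise ValueError(f"Circular requirement involving {entity!r}")
        if entity not in resolved:
            resolved[entity] = all(
                has_data(member, seen | {entity}) for member in must_have_data[entity]
            )
        return resolved[entity]

    return {region for region in must_have_data if has_data(region)}


requirements = {"Asia": ["China"], "World": ["Asia"]}

# China is missing, so neither Asia nor World should be reported as having data.
print(regions_with_data({"India", "France"}, requirements))
```

With `China` present in the input, both `Asia` and `World` would be returned, which is the behaviour this issue asks for.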
Technical notes
- This issue may be tricky to fix. At least, we could raise a warning.
- We should write a unit test for this, and then ideally fix it
- ...but fixing it could potentially mean changes for a large number of datasets, so we would need to increment the EPOCH and check the diffs of the output
- ...ideally we would only change behaviour for steps that use `countries_that_must_have_data`
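As a starting point for the warning mentioned in the notes above, the check could look roughly like the sketch below. It only inspects the mapping and does not touch any data. The function name `find_nested_requirements` and its signature are hypothetical (nothing like it is confirmed to exist in `geo`), and it assumes that the region names are available as an iterable of strings, for example the keys of `REGIONS`.

```python
import warnings
from typing import Dict, Iterable, List, Tuple


def find_nested_requirements(
    must_have_data: Dict[str, List[str]],
    region_names: Iterable[str],
) -> List[Tuple[str, str]]:
    """Return (region, member) pairs where a required member is itself a region.

    Hypothetical helper: these are the cases in which a required member's
    missing data may fail to propagate to the parent region.
    """
    known_regions = set(region_names)
    nested = [
        (region, member)
        for region, members in must_have_data.items()
        for member in members
        if member in known_regions
    ]
    for region, member in nested:
        warnings.warn(
            f"{region!r} requires {member!r}, which is itself a region; "
            f"missing data for {member!r} may not propagate to {region!r}."
        )
    return nested


# Stand-in region names for illustration only.
pairs = find_nested_requirements(
    {"Asia": ["China"], "World": ["Asia"]},
    region_names=["Asia", "Europe", "World"],
)
print(pairs)
```

A check like this only reads the arguments, so it would not change the output of any existing dataset and would not by itself require incrementing the EPOCH.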