owid/etl

Bug in `add_regions_to_table` when using `countries_that_must_have_data` (in an unusual situation)


Problem

In an unusual aggregation setup, where a region is listed as a required member of another region, World can end up with a value even though Asia has no value (because China has no value).

Specific example

I noticed this error while working on minerals, because some aggregates (e.g. High-income countries) had larger values than World.

In the following situation:

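from etl.data_helpers import geo

# tb, ds_regions and ds_income_groups are assumed to be already loaded in the step.
# Add an (empty) "World" entry to the default regions, so that World is also aggregated.
REGIONS = {**geo.REGIONS, "World": {}}
tb = geo.add_regions_to_table(
    tb=tb,
    regions=REGIONS,
    ds_regions=ds_regions,
    ds_income_groups=ds_income_groups,
    countries_that_must_have_data={
        # Asia should only have data if China has data.
        "Asia": ["China"],
        # World should only have data if Asia (itself a region) has data.
        "World": ["Asia"],
    },
)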

Result:

  • China does not have data, so Asia does not have data (as expected).
  • World does have data, even though Asia does not have data (this is the bug).

Expected behaviour

If Asia does not have data, then World should not have data.
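
To make the expected behaviour concrete, here is a minimal, self-contained sketch in plain Python. It is not the implementation of `add_regions_to_table`; the function `aggregate` and the toy data are hypothetical, and it assumes that required members can themselves be regions and that there are no circular requirements.

```python
from typing import Dict, List, Optional


def aggregate(
    region: str,
    values: Dict[str, Optional[float]],
    members: Dict[str, List[str]],
    must_have_data: Dict[str, List[str]],
) -> Optional[float]:
    """Sum the values of a region's members (hypothetical helper).

    Returns None if any required member lacks data. A required member that is
    itself a region is resolved recursively (assumes no circular requirements).
    """
    for required in must_have_data.get(region, []):
        if required in members:
            # The required member is a region: it has data only if its own aggregate exists.
            required_value = aggregate(required, values, members, must_have_data)
        else:
            # The required member is a country: look up its value directly.
            required_value = values.get(required)
        if required_value is None:
            return None
    available = [values[c] for c in members[region] if values.get(c) is not None]
    return sum(available) if available else None


# Toy data: China has no data.
values = {"China": None, "India": 10.0, "France": 5.0}
members = {
    "Asia": ["China", "India"],
    "Europe": ["France"],
    "World": ["China", "India", "France"],
}
must_have_data = {"Asia": ["China"], "World": ["Asia"]}

asia = aggregate("Asia", values, members, must_have_data)
world = aggregate("World", values, members, must_have_data)
europe = aggregate("Europe", values, members, must_have_data)
```

In this sketch both `asia` and `world` are `None`, while `europe` still gets a value, which is the behaviour described above.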

Technical notes

  • This issue may be tricky to fix. As a minimum, we could raise a warning when it happens.
  • We should write a unit test that reproduces it, and then ideally fix it.
    • ...but fixing it could change the output of a large number of datasets, so we would need to increment the EPOCH and check the diffs of the output.
    • ...ideally we would only change the behaviour of steps that use `countries_that_must_have_data`.
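
As a possible starting point for the warning mentioned in the notes above, the sketch below flags any entry of `countries_that_must_have_data` whose required members include a region. The function name `warn_if_regions_required` and the `region_names` argument are hypothetical, not part of `geo`; in practice the set of region names would presumably come from the `regions` argument.

```python
import warnings
from typing import Dict, Iterable, List


def warn_if_regions_required(
    countries_that_must_have_data: Dict[str, List[str]],
    region_names: Iterable[str],
) -> Dict[str, List[str]]:
    """Warn when a region is listed as a required member of another region.

    Hypothetical helper: returns a dictionary {region: [required members that are regions]}.
    """
    region_names = set(region_names)
    flagged: Dict[str, List[str]] = {}
    for region, required in countries_that_must_have_data.items():
        nested = [member for member in required if member in region_names]
        if nested:
            flagged[region] = nested
            warnings.warn(
                f"Region '{region}' requires data for other regions {nested}; "
                "the aggregate may not respect this requirement."
            )
    return flagged


flagged = warn_if_regions_required(
    countries_that_must_have_data={"Asia": ["China"], "World": ["Asia"]},
    region_names=["Asia", "Europe", "World"],
)
```

Here only the `"World": ["Asia"]` entry is flagged, since `"China"` is a country rather than a region. A unit test could then assert on the warning (e.g. with `pytest.warns`) before any change in behaviour is attempted.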