osm-without-borders/cosmogony

Need help with mapping country regions for COVID tracking

Opened this issue · 8 comments

Hi, I'm a contributor to coronadatascraper, an open source project aiming to scrape official websites all around the world for COVID numbers.

My problem is that I'm trying to find a system we can use to define the administrative hierarchy within each country. I created the country-levels project from the Natural Earth dataset, but I'm not happy with it: for example, it contains bad Admin 1 divisions in Spain.

Can you help us with your experience? My aim is to build a system of short codes, like hasc:ES.CL or something similar, that we can use to refer to a region. We'd need a GeoJSON for each region plus a Wikidata link for fetching the population. Is this possible somehow with your project?

Here is a relevant issue I've opened; it'd be great if you could contribute to the discussion!
https://github.com/lazd/coronadatascraper/issues/286

Hi,

nice project!

Cosmogony might indeed help you, as it creates hierarchies (country, country region, state, state district, ...) using OSM data.

I'm not sure I understand what you are trying to do, but maybe you can either select a hierarchy level (maybe country region?) or select the first level below country (since sometimes a country has no country regions, only states or even cities).

Since the Wikidata id is often filled in OSM, you can easily get the population and other metadata (I don't know the proportion of zones with a Wikidata id though).
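As a rough illustration of the population lookup, here is a minimal sketch. It assumes you have already fetched the entity JSON for a zone's Wikidata id (for example from https://www.wikidata.org/wiki/Special:EntityData/<id>.json, taking the object under "entities") and that the population is stored under property P1082. The function name is made up for this example; it is not part of Cosmogony.

```python
def population_from_entity(entity):
    """Return the population (Wikidata property P1082) of an entity dict, or None.

    `entity` is assumed to be one object from the "entities" map of a
    Wikidata entity JSON document.
    """
    claims = entity.get("claims", {}).get("P1082", [])
    # Prefer statements ranked "preferred", then "normal"; deprecated ones are ignored.
    for rank in ("preferred", "normal"):
        for claim in claims:
            if claim.get("rank") != rank:
                continue
            snak = claim.get("mainsnak", {})
            if snak.get("snaktype") != "value":
                continue  # "novalue" / "somevalue" statements carry no amount
            # Quantities are stored as signed strings such as "+46733038".
            return int(float(snak["datavalue"]["value"]["amount"]))
    return None
```

Zones without a wikidata tag, or entities without a P1082 statement, simply yield no population and would need another source.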

For your id, maybe you can use ISO 3166-2?

It would be easier for you to use an already generated cosmogony file, but I don't have an up-to-date one readily available. Maybe @amatissart or @prhod can help you get a cosmogony file?

I have just uploaded an extract from the cosmogony dataset generated from the planet-200302.osm.pbf file:
https://github.com/osm-without-borders/cosmogony/releases/download/v0.7.3/cosmogony-2020-03-02-regions.jsonl.gz

This file contains all extracted regions with type "country", "country_region", "state" or "state_district". This classification is mostly built for geocoding purposes, and it may or may not fit your needs.

The file is a .jsonl file (with one JSON object per line, representing a zone).
For each zone you'll notably find:

  • id (integer)
  • osm_id
  • zone_type (among "country", "country_region", "state", "state_district")
  • geometry (GeoJSON, directly extracted from OSM, with NO simplification)
  • tags (key-value from OSM, including wikidata and ISO3166-2 if present)
  • parent (the id of the parent zone in the hierarchy)
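For what it's worth, a minimal sketch of reading such a file in Python could look like the following. It assumes the field names listed above and that tags is serialized as a plain JSON object; the helper names (read_zones, children_of_countries) are made up for this example and are not part of Cosmogony.

```python
import gzip
import json
from collections import defaultdict


def read_zones(path):
    """Yield one lightweight dict per zone, leaving out the (large) geometry."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            zone = json.loads(line)
            tags = zone.get("tags") or {}  # assumption: tags is a JSON object
            yield {
                "id": zone.get("id"),
                "osm_id": zone.get("osm_id"),
                "zone_type": zone.get("zone_type"),
                "parent": zone.get("parent"),
                "wikidata": tags.get("wikidata"),
                "iso3166_2": tags.get("ISO3166-2"),
            }


def children_of_countries(zones):
    """Map each country id to the zones whose direct parent is that country."""
    zones = list(zones)
    country_ids = {z["id"] for z in zones if z["zone_type"] == "country"}
    children = defaultdict(list)
    for z in zones:
        if z["parent"] in country_ids:
            children[z["parent"]].append(z)
    return dict(children)
```

Calling children_of_countries(read_zones("cosmogony-2020-03-02-regions.jsonl.gz")) would then give the first level below each country, whatever its zone_type, along with the OSM ids and Wikidata ids but without the polygons.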

Thanks so much for the help! I was able to process and simplify the dataset, so I can view it properly.

My biggest questions:

  • How is it possible to make the country borders not include sea? Like on wambachers, there is the land/sea switch.
  • Is there any way to substitute in missing countries? Like in Africa for example?

How is it possible to make the country borders not include sea? Like on wambachers, there is the land/sea switch.

Country relations in OSM typically include maritime boundaries. As far as I know, there is no simple way to extract country land boundaries directly from OSM. A solution would be to clip the polygons at the end of the process, for example by using global land or global water polygons available on https://osmdata.openstreetmap.de/data/

Is there any way to substitute in missing countries? Like in Africa for example?

Some countries may be missing in the dataset if their polygon was broken in OSM at the time of the extract. Unfortunately it happens from time to time, and that's indeed the case with the planet file we used (dated 2020-03-02). An updated dataset processed with more recent OSM data would hopefully solve that problem.

I see. Is there any way to get the OSM IDs, without the polygons? I found out I can download the polygons from Wambachers with the water cut out, I'd just need to have the IDs listed.

Do you mean the OSM IDs of all regions, including those with invalid polygons in OSM? I fear that is out of the scope of the current implementation: Cosmogony uses the exact geometry of each region to build the hierarchy of zones, and then determines a zone_type from this hierarchy.

I see. Maybe I can use the valid polygons from Cosmogony and fill in the rest from Wambachers. I'll try.

FYI I have added updated datasets to the latest release (generated with planet-200316.osm.pbf)

All zones:
https://github.com/osm-without-borders/cosmogony/releases/download/v0.7.3/cosmogony-2020-03-16.jsonl.gz

Only "country", "country_region", "state", "state_district":
https://github.com/osm-without-borders/cosmogony/releases/download/v0.7.3/cosmogony-2020-03-02-regions.jsonl.gz

(The "international_labels" field includes only the English version, hence the slightly smaller file size).