OvertureMaps/data

Issues extracting complete level 6 administrative boundaries globally

Opened this issue · 6 comments

Issue

I have been trying to use a number of different data sources to extract a clean and consistent layer of administrative boundaries for counties (not just countries) globally. Other sources typically call this level 2, while OSM and Overture use levels 4-6 or higher. My task is to overlay Point geometries for airports, ocean ports, roads, and other critical infrastructure with the administrative boundaries using spatial contains/within predicates. The Overture Maps dataset is the most appealing because its boundaries extend into national territorial waters instead of ending at the edge of the land mass. That makes spatial joins between administrative units and ports or other points of interest that are not on land significantly easier and more reproducible.
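For reference, the contains/within assignment described above boils down to a point-in-polygon test. The sketch below is a plain-Python ray-casting version (in practice geopandas' `sjoin` with `predicate='within'` does this for you). The coordinates and the `county_a` unit are made up; they only illustrate why a boundary that ends at the coastline leaves an offshore port unassigned, while one extended into territorial waters does not.

```python
# Dependency-free sketch of a point-in-polygon ("within") test, used to assign
# infrastructure points to administrative units. Rings are lists of (lon, lat)
# vertices; holes and multipolygons are not handled.

def point_in_polygon(x, y, ring):
    """Ray casting: True if (x, y) is inside the ring.

    Points lying exactly on an edge may be reported either way.
    """
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge meets that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:  # crossing lies to the right of the point
                inside = not inside
    return inside


def assign_points(points, units):
    """Map each point id to the first unit whose ring contains it, or None."""
    return {
        pid: next((uid for uid, ring in units.items() if point_in_polygon(x, y, ring)), None)
        for pid, (x, y) in points.items()
    }


# Hypothetical data: the same county drawn to the coastline (x = 10)
# and extended 2 units offshore into territorial waters.
land_only = {"county_a": [(0, 0), (10, 0), (10, 10), (0, 10)]}
with_waters = {"county_a": [(0, 0), (12, 0), (12, 10), (0, 10)]}
points = {"airport": (5, 5), "port": (11, 5)}

print(assign_points(points, land_only))    # {'airport': 'county_a', 'port': None}
print(assign_points(points, with_waters))  # {'airport': 'county_a', 'port': 'county_a'}
```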

First, I used GADM, but ran into cases where whole boundaries are missing, such as Dammam (SAU) or Sakaiminato (JPN). This was consistent across both the single global GPKG file and the individual country GeoJSON files. Then I switched to OSM, and by extension Overture, but converting .osm.pbf to SQLite is proving difficult, and, as documented below, Overture is missing many of the boundaries at admin level 6 or greater. The issues I've seen with other datasets are:

  • GADM: as mentioned above, sometimes missing whole geometries for administrative units
  • geoBoundaries: complete but has a much smaller vertex count than OSM
  • OSM-Boundaries: does not provide a user-friendly way to download the entire dataset (e.g. via an API or a single file)
  • MapBox: paid commercial solution; I have not purchased a license yet

Given these issues, I want to check with the Overture developers and community whether there's a better way to extract complete ADM2 boundaries (aka adminLevel 6+ in OSM/Overture) for the entire globe, or even more granular municipality boundaries if possible. I have attached examples of the issues I've encountered, using Japan as the example, and shared the code I'm using to extract administrative data from Overture below:

Admin Data Comparisons

Overture Maps adminLevel==4:
[image]

Overture Maps adminLevel==6:
[image]

Overture Maps adminLevel==10:
[image]

MapBox (Admin Level 2):
[image]

OSM (Admin Level 2, preprocessed by geoBoundaries and by OSM-Boundaries):
[image]

geoBoundaries (Admin Level 2):
[image]

GADM (Admin Level 2):
[image]
[image]

Overture Admin Boundary Extraction Code

import duckdb
import folium
import geopandas as gpd
import pandas as pd
from shapely import wkb

# Create an in-memory DuckDB connection
conn = duckdb.connect(database=':memory:', read_only=False)

# Add the spatial extension
conn.execute("""
INSTALL spatial;
LOAD spatial;
""")

# Add the httpfs extension
conn.execute("""
INSTALL httpfs;
LOAD httpfs;
""")

# Add the Azure extension and connection string
conn.execute("""
INSTALL azure;
LOAD azure;
SET azure_storage_connection_string = 'DefaultEndpointsProtocol=https;AccountName=overturemapswestus2;AccountKey=;EndpointSuffix=core.windows.net';
""")

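# SQL query adapted from https://github.com/OvertureMaps/data?tab=readme-ov-file#3-duckdb-sql
# NOTE: admins_view is not defined in this snippet; it must already exist as a view over the Overture admins theme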
admin_area_query = """
WITH admins AS(
    SELECT id AS adminId,
        isoCountryCodeAlpha2,
        isoSubCountryCode,
        subType,
        localityType,
        adminLevel,
        JSON(names) AS names
    FROM admins_view
    WHERE adminLevel >= 2 and adminLevel <= 10
),
areas AS (
    SELECT id as areaId,
        localityId,
        geometry
    FROM admins_view
)

SELECT admins.adminId,
    admins.adminLevel,
    admins.subType,
    admins.localityType,
    admins.isoCountryCodeAlpha2,
    admins.isoSubCountryCode,
    admins.names,
    areas.areaId,
    areas.geometry
FROM admins
INNER JOIN areas ON admins.adminId = areas.localityId;
"""

# Execute the query on Overture Maps on Azure via DuckDB
admin_areas = conn.execute(admin_area_query).fetchall()

# Store results as a GeoDataFrame after loading WKB geometry
admin_areas_df = pd.DataFrame(
    admin_areas,
    columns=[
        'adminId',
        'adminLevel',
        'subType',
        'localityType',
        'isoCountryCodeAlpha2',
        'isoSubCountryCode',
        'names',
        'areaId',
        'geometry'
    ]
)
admin_areas_df['geometry'] = admin_areas_df['geometry'].apply(wkb.loads)
# Overture geometries are WGS 84 longitude/latitude
admin_area_gdf = gpd.GeoDataFrame(admin_areas_df, geometry='geometry', crs='EPSG:4326')

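# Spatial join of the Japan country outline (adminLevel 2) with Japanese adminLevel 6 geometries.
# Restricting the right side to the country outline avoids one match per containing JP polygon.
admin_6 = admin_area_gdf[admin_area_gdf['adminLevel'] == 6]
jp_outline = admin_area_gdf[(admin_area_gdf['isoCountryCodeAlpha2'] == 'JP') & (admin_area_gdf['adminLevel'] == 2)]
jp_overture_admin_6_data = gpd.sjoin(admin_6, jp_outline, how='inner', predicate='within').to_json()

# Style for the adminLevel 6 polygons (illustrative values)
def admin_6_style(feature):
    return {'color': 'red', 'weight': 1, 'fillOpacity': 0.1}

# Plot the administrative boundaries with Folium
jp_m = folium.Map(location=[35.1470821, 136.8437395], zoom_start=5)
folium.GeoJson(
    jp_overture_admin_6_data,
    style_function=admin_6_style
).add_to(jp_m)
jp_m  # displays the map in a Jupyter notebook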
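One thing worth ruling out with this query: the INNER JOIN keeps only admins that have at least one area row whose localityId matches their id, so an admin without one silently disappears, which looks the same as the boundary being missing. In SQL terms, the check is a LEFT JOIN with `WHERE areas.areaId IS NULL`. A minimal, dependency-free sketch of the same idea, using made-up records rather than real Overture rows:

```python
# Sketch: find admins that an INNER JOIN on adminId = localityId would drop.
# Records are plain dicts with made-up ids, standing in for query results.
from collections import Counter


def split_by_area(admins, areas):
    """Split admins into (matched, unmatched).

    matched:   admins with at least one area whose localityId equals their id,
               i.e. the rows an INNER JOIN keeps
    unmatched: admins with no such area, i.e. the rows an INNER JOIN drops
    """
    linked_ids = {area["localityId"] for area in areas}
    matched = [a for a in admins if a["id"] in linked_ids]
    unmatched = [a for a in admins if a["id"] not in linked_ids]
    return matched, unmatched


def count_by_level(records):
    """Count records per adminLevel."""
    return Counter(r["adminLevel"] for r in records)


admins = [
    {"id": "a1", "adminLevel": 4},
    {"id": "a2", "adminLevel": 6},
    {"id": "a3", "adminLevel": 6},
]
areas = [
    {"id": "x1", "localityId": "a1"},
    {"id": "x2", "localityId": "a2"},
]

matched, unmatched = split_by_area(admins, areas)
print(count_by_level(matched))       # Counter({4: 1, 6: 1})
print([a["id"] for a in unmatched])  # ['a3']
```

If level 6 admins show up in `unmatched`, the boundaries exist but are lost in the join; if they are absent from the admin records altogether, they are missing from the data itself.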

Overture team (@ibnt1 @mojodna @jwass),

To expand on this issue with the example above and a few more countries below: it appears that a potentially large number of level 6 administrative boundaries are missing from the Overture dataset, or cannot be queried with the method above (joining admins.adminId to areas.localityId, as in the example in the Overture data repo).

USA is missing a borough in Alaska:
[image]

Great Britain is missing a few counties:
[image]

Japan's counties are half missing:
[image]

Thailand's counties are mostly missing:
[image]

Turkey's counties are completely missing:
[image]

Saudi Arabia's counties are completely missing:
[image]

I don't have the time to audit every country in the world, but clearly boundaries are missing in the translation from OSM.
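Auditing every country by eye isn't practical, but a rough automated pass can at least rank where to look: count units per country in each source and flag countries where one source has far fewer than a reference. A minimal sketch; the 0.9 threshold, the placeholder country codes, and the counts are made up, not real figures.

```python
# Sketch: rank countries by how many admin units a candidate source has
# relative to a reference source. Counts and country codes are placeholders.

def flag_coverage_gaps(reference_counts, candidate_counts, min_ratio=0.9):
    """Return {country: (candidate, reference)} where the candidate has fewer
    than min_ratio times the reference count, in country-code order."""
    gaps = {}
    for country, ref in sorted(reference_counts.items()):
        cand = candidate_counts.get(country, 0)
        if ref > 0 and cand < min_ratio * ref:
            gaps[country] = (cand, ref)
    return gaps


reference = {"AA": 10, "BB": 20, "CC": 5}  # e.g. ADM2 counts from a reference dataset
candidate = {"AA": 10, "BB": 9}            # e.g. adminLevel 6 counts from Overture
print(flag_coverage_gaps(reference, candidate))  # {'BB': (9, 20), 'CC': (0, 5)}
```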

Hi @ksmithNau - thanks for your thorough exploration and writeup.

I think your description of the friction across many data sources is spot on and something Overture wants to solve. We definitely want user-friendly ways to retrieve and query the dataset, including admin boundaries.

I haven't looked into specifics yet, but can you provide a few OSM admin areas (relation IDs) that exist in OSM but are missing from the latest Overture release? We'll take a look too, but it might be easier to start from a few specific ones to understand how or why they might be missing.
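Once a list of relation IDs is in hand, checking them against a release is a set difference. A minimal sketch under a loudly stated assumption: it supposes each Overture feature's source record id is a string like `r111@3`, with `r` marking an OSM relation. That format is hypothetical here; check the actual `sources` field of the release before relying on it. The IDs below are made up.

```python
# Sketch: which OSM relation ids are absent from a release?
# ASSUMPTION (hypothetical format): each feature's source record id is a string
# like "r111@3", where "r" marks an OSM relation. Verify against the actual
# sources field of the release before using this.
import re

OSM_RELATION = re.compile(r"^r(\d+)")


def relation_ids_in_release(source_record_ids):
    """Extract the numeric OSM relation ids from source record strings."""
    ids = set()
    for rec in source_record_ids:
        match = OSM_RELATION.match(rec)
        if match:
            ids.add(int(match.group(1)))
    return ids


def missing_relations(wanted, source_record_ids):
    """Return the wanted relation ids that no release record points to, sorted."""
    return sorted(set(wanted) - relation_ids_in_release(source_record_ids))


records = ["r111@3", "w222@1", "r333@7"]            # made-up record ids
print(missing_relations([111, 333, 444], records))  # [444]
```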

Hi @jwass thanks for your response,

Upon further digging, it appears that the missing admin level 6 polygons/multipolygons are missing from OSM as well. I had been looking at the admin comparison tool on geoBoundaries, and its OSM source was initially showing county boundaries in the countries that are missing them. I'm not sure how they were patching those in. I'm also not sure whether the linestrings or objects exist in OSM but just aren't tagged with admin level 6 properties, or whether they don't exist at all. Long story short, this does not appear to be an issue with Overture, but rather upstream in OSM.

OSM admin boundary extract at admin level 6:
[image]

geoBoundaries depiction of ADM2 (source: OSM-Boundaries):
[image]
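One way to answer the "untagged vs. nonexistent" question is to look at the tags directly. In OSM, administrative boundaries are normally mapped as relations tagged `boundary=administrative` with an `admin_level` value. A minimal sketch that separates relations tagged at a given level from ones with no `admin_level` at all; the element dicts mimic the `elements` array of Overpass API JSON output, and the ids and tags are made up.

```python
# Sketch: separate admin boundary relations that are tagged at a given level
# from ones with no admin_level tag at all. Elements mimic the "elements"
# array of Overpass API JSON output; ids and tags here are made up.

def classify_boundaries(elements, level=6):
    """Return (tagged, missing_level): ids of boundary=administrative relations
    with admin_level == level, and ids of those lacking an admin_level tag."""
    tagged, missing_level = [], []
    for el in elements:
        tags = el.get("tags", {})
        if el.get("type") != "relation" or tags.get("boundary") != "administrative":
            continue
        if tags.get("admin_level") == str(level):  # OSM tag values are strings
            tagged.append(el["id"])
        elif "admin_level" not in tags:
            missing_level.append(el["id"])
    return tagged, missing_level


elements = [
    {"type": "relation", "id": 1, "tags": {"boundary": "administrative", "admin_level": "6"}},
    {"type": "relation", "id": 2, "tags": {"boundary": "administrative"}},
    {"type": "way", "id": 3, "tags": {"boundary": "administrative", "admin_level": "6"}},
    {"type": "relation", "id": 4, "tags": {"boundary": "administrative", "admin_level": "4"}},
]
print(classify_boundaries(elements))  # ([1], [2])
```

Relations in the second list would point to incomplete tagging rather than a missing boundary; if neither list has anything for a country, the boundaries likely aren't mapped at all.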

We do plan to use geoBoundaries as a source in countries where OSM coverage is incomplete, or in some cases to fix OSM admins. We are planning to make a new release soon, which will hopefully fix most of the issues you reported. Thank you for testing our data!

@DavidKarlas I'm curious how you plan to merge geoBoundaries with OSM. The geoBoundaries polygons have significantly fewer vertices than OSM's, and the two don't seem to align well. I've also discovered that even ADM1 and ADM2 within geoBoundaries don't align. In other words, a state/province boundary does not perfectly align with its underlying counties/districts, so different spatial predicates give different results (e.g. within/contains vs. intersects). Thanks for the update though; I'm excited for when Overture can be the single consolidated source of truth for our workflow.

Nangarhar Province, Afghanistan: ADM1 vs. ADM2
geoBoundaries ADM1 & ADM2 in Afghanistan
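The misalignment described above is exactly what makes within/contains and intersects disagree. A minimal sketch using axis-aligned boxes instead of real polygons (a deliberate simplification): a county that slightly overshoots its province is within neither province and intersects both. Assigning it to the parent with the largest overlap area (one common workaround, not something proposed in this thread) recovers the intended parent. The province and county boxes are hypothetical.

```python
# Sketch: why within and intersects disagree when ADM1 and ADM2 don't align.
# Axis-aligned boxes (minx, miny, maxx, maxy) stand in for real polygons.

def within(a, b):
    """True if box a lies entirely inside box b (shared edges allowed)."""
    return b[0] <= a[0] and b[1] <= a[1] and a[2] <= b[2] and a[3] <= b[3]


def intersects(a, b):
    """True if boxes a and b share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def overlap_area(a, b):
    """Area of the intersection of boxes a and b (0 if disjoint)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(w, 0) * max(h, 0)


def parent_by_overlap(child, parents):
    """Assign child to the parent with the largest overlap, or None if none overlap."""
    best = max(parents, key=lambda pid: overlap_area(child, parents[pid]))
    return best if overlap_area(child, parents[best]) > 0 else None


# Hypothetical provinces sharing an edge at x = 10, and a county that
# belongs to P1 but overshoots the shared edge by 0.2 units.
provinces = {"P1": (0, 0, 10, 10), "P2": (10, 0, 20, 10)}
county = (1, 1, 10.2, 5)

print([p for p in provinces if within(county, provinces[p])])      # []
print([p for p in provinces if intersects(county, provinces[p])])  # ['P1', 'P2']
print(parent_by_overlap(county, provinces))                        # P1
```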

@DavidKarlas I wonder if I'm answering my own question: it appears the geoBoundaries ADM2 files for individual countries have higher vertex counts and aligned sub-boundaries, and don't match the global ADM2 download file (GPKG).

Adding an example over Tokyo...

The blue polygons and outlines are from the geoBoundaries individual country file for Japan, while the red outlines are from the geoBoundaries global ADM2 file:
[image]
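To put numbers on the vertex-count difference between the individual-country and global geoBoundaries files, one could count positions per feature in each file's GeoJSON geometry (for example, dicts loaded with `json.load`). A minimal sketch; the toy geometries are made up. GeoJSON rings repeat their first position as the last one (RFC 7946), so the count subtracts one position per ring.

```python
# Sketch: count vertices of a GeoJSON Polygon or MultiPolygon so that two
# files covering the same unit can be compared. Coordinates are made up.

def count_vertices(geometry):
    """Count ring positions in a GeoJSON Polygon/MultiPolygon.

    GeoJSON (RFC 7946) rings repeat their first position as the last one,
    so each ring's closing position is not counted.
    """
    if geometry["type"] == "Polygon":
        polygons = [geometry["coordinates"]]
    elif geometry["type"] == "MultiPolygon":
        polygons = geometry["coordinates"]
    else:
        raise ValueError(f"unsupported geometry type: {geometry['type']}")
    return sum(len(ring) - 1 for polygon in polygons for ring in polygon)


square = [[0, 0], [1, 0], [1, 1], [0, 1], [0, 0]]
detailed = [[0, 0], [0.5, 0], [1, 0], [1, 0.5], [1, 1], [0.5, 1], [0, 1], [0, 0.5], [0, 0]]
print(count_vertices({"type": "Polygon", "coordinates": [square]}))    # 4
print(count_vertices({"type": "Polygon", "coordinates": [detailed]}))  # 8
print(count_vertices({"type": "MultiPolygon", "coordinates": [[square], [detailed]]}))  # 12
```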