OvertureMaps/data

A small number of geometries are invalid


While loading Overture data into Google BigQuery, we identified some "invalid" geometries in the following tables (overture_maps_wkb_errors.zip):

  • admins.localityArea
  • buildings.building
  • buildings.part

These invalid geometries are "fixable", but by default they produce an "Invalid polygon loop" error. My recommendation is to make these geometries valid where possible and to remove them where it is not.
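The "make valid if possible, otherwise remove" policy can be sketched as a small filter. This is a minimal sketch, not Overture's actual pipeline: `repair_or_drop` is a hypothetical helper, and the validity test and the repair step are passed in as callables. With Shapely 2.x, for example, `shapely.is_valid` and `shapely.make_valid` could fill those two roles. Keep in mind that a repaired geometry may come back as a different type (e.g. a MultiPolygon or GeometryCollection instead of a Polygon), so downstream consumers would need to handle that.

```python
from typing import Callable, Iterable, List, Tuple, TypeVar

G = TypeVar("G")  # any geometry type, e.g. a Shapely geometry


def repair_or_drop(
    geometries: Iterable[G],
    is_valid: Callable[[G], bool],
    make_valid: Callable[[G], G],
) -> Tuple[List[G], int]:
    """Keep valid geometries, repair invalid ones where possible, drop the rest.

    Returns (kept_geometries, number_dropped).
    """
    kept: List[G] = []
    dropped = 0
    for geom in geometries:
        if is_valid(geom):
            kept.append(geom)
            continue
        try:
            repaired = make_valid(geom)
        except Exception:
            # A repair step that errors out is treated as "not fixable".
            dropped += 1
            continue
        # Re-check: only keep the repaired geometry if it now passes validation.
        if is_valid(repaired):
            kept.append(repaired)
        else:
            dropped += 1
    return kept, dropped
```

Returning the dropped count alongside the kept geometries makes it easy to log how many features each release loses to this step.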

Additionally, @jwass checked the same in Athena:

SELECT
    COUNT(1)
FROM overture_2024_01_17_alpha_0
WHERE
    NOT ST_ISVALID(ST_GEOMFROMBINARY(geometry))
-- 1247 invalid results

Thanks, @Jesus89. It's unfortunate that these crash BigQuery, but this is still an issue on our side.

I also wanted to count by theme. I used a query similar to the Athena one above. Results:

buildings: 157
base: 1060
admins: 30

@jenningsanderson @DavidKarlas

The main question is:
Should geometries that are invalid according to ST_IsValid() be release-blocking? (Note: the geometries themselves are well-formed but might have self-intersections, overlapping segments, etc.)

My initial thought is yes. @ibnt1, thoughts on including this in the theme promotion pipeline? That would put the responsibility on theme owners. Also, checking all geometries is not cheap and would probably have to be done in Spark; otherwise it'll take forever.
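To illustrate why checking every geometry is costly, here is a brute-force sketch of just one of the rules behind ST_IsValid(): a closed polygon ring must not self-intersect or have overlapping segments. It compares every pair of segments, so it is O(n²) in the number of vertices per ring. Real validators typically use spatial indexing to avoid that and also check many other rules (hole placement, shell/hole interaction, etc.). The function names are hypothetical, and the sketch assumes planar (x, y) coordinates with no repeated consecutive vertices. It is also only a planar check. BigQuery's GEOGRAPHY type treats edges as geodesics on a sphere, which may be one reason a ring that GEOS accepts can still be rejected there.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]


def _orient(a: Point, b: Point, c: Point) -> int:
    """Sign of the cross product (b - a) x (c - a): 1 left, -1 right, 0 collinear."""
    v = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return (v > 0) - (v < 0)


def _within_box(a: Point, b: Point, p: Point) -> bool:
    """For p collinear with a-b: True if p lies on the segment a-b."""
    return (min(a[0], b[0]) <= p[0] <= max(a[0], b[0])
            and min(a[1], b[1]) <= p[1] <= max(a[1], b[1]))


def segments_intersect(p1: Point, p2: Point, q1: Point, q2: Point) -> bool:
    """True if the closed segments p1-p2 and q1-q2 share at least one point."""
    o1, o2 = _orient(p1, p2, q1), _orient(p1, p2, q2)
    o3, o4 = _orient(q1, q2, p1), _orient(q1, q2, p2)
    if o1 != o2 and o3 != o4:
        return True  # proper crossing, or an endpoint touching the other segment
    # Collinear cases: touching or overlapping segments on the same line.
    return ((o1 == 0 and _within_box(p1, p2, q1))
            or (o2 == 0 and _within_box(p1, p2, q2))
            or (o3 == 0 and _within_box(q1, q2, p1))
            or (o4 == 0 and _within_box(q1, q2, p2)))


def ring_self_intersects(ring: Sequence[Point]) -> bool:
    """Brute-force O(n^2) check of a closed ring (first point == last point)."""
    if len(ring) < 4 or ring[0] != ring[-1]:
        raise ValueError("expected a closed ring with at least 4 points")
    m = len(ring) - 1  # number of segments; segment k runs ring[k] -> ring[k + 1]
    for i in range(m):
        for j in range(i + 1, m):
            if j == i + 1:
                a, b, c = ring[i], ring[j], ring[j + 1]  # consecutive segments
            elif i == 0 and j == m - 1:
                a, b, c = ring[m - 1], ring[0], ring[1]  # closing segment meets the first
            else:
                # Non-adjacent segments must not touch at all in a simple ring.
                if segments_intersect(ring[i], ring[i + 1], ring[j], ring[j + 1]):
                    return True
                continue
            # Adjacent segments always share vertex b; flag only a "spike"
            # where the ring doubles back along the same line (overlap).
            dot = (b[0] - a[0]) * (c[0] - b[0]) + (b[1] - a[1]) * (c[1] - b[1])
            if _orient(a, b, c) == 0 and dot < 0:
                return True
    return False
```

For example, the "bowtie" ring `[(0, 0), (2, 2), (2, 0), (0, 2), (0, 0)]` is flagged because its first and third segments cross at (1, 1), while a unit square is not.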

Curious to hear what some others think (@mojodna / @varapmsft).

@johnaddresscloud / @mtravis - This was the issue you were seeing as well?

@jwass I agree. I think we should add an ST_IsValid() check as a validation test at feed promotion and ask theme owners to fix or remove invalid geometries in order to pass promotion.
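A promotion-time gate along these lines could count invalid geometries per theme and refuse to promote if any are found, which also yields a per-theme breakdown like the one reported earlier in this thread. This is a hypothetical sketch: `PromotionBlocked`, `count_invalid_by_theme`, and `check_promotion` are illustrative names, features are assumed to arrive as `(theme, geometry)` pairs, and the validity test is passed in. At Overture's scale the same logic would presumably run as a distributed job (e.g. in Spark, as suggested earlier) rather than a single Python loop.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, Tuple, TypeVar

G = TypeVar("G")


class PromotionBlocked(Exception):
    """Raised when a release candidate fails the geometry-validity gate."""


def count_invalid_by_theme(
    features: Iterable[Tuple[str, G]], is_valid: Callable[[G], bool]
) -> Dict[str, int]:
    """Count features whose geometry fails `is_valid`, grouped by theme."""
    counts: Counter = Counter()
    for theme, geometry in features:
        if not is_valid(geometry):
            counts[theme] += 1
    return dict(counts)


def check_promotion(
    features: Iterable[Tuple[str, G]], is_valid: Callable[[G], bool]
) -> None:
    """Raise PromotionBlocked if any feature has an invalid geometry."""
    invalid = count_invalid_by_theme(features, is_valid)
    if invalid:
        summary = ", ".join(f"{theme}: {n}" for theme, n in sorted(invalid.items()))
        raise PromotionBlocked(f"invalid geometries found ({summary})")
```

Theme owners would then fix or remove the reported features and re-run the gate before promotion.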


@jwass John had the problem, but I'm pretty certain this is the same issue.

@jwass @mtravis Yes, this is the same issue I reported, although I only saw 7 invalid geometries in the December release of buildings. As far as GDAL/GEOS is concerned, and therefore anything using those libraries, these geometries are valid, so I don't think it's a release blocker.

I have already reported the issue in the BigQuery bug tracker: https://issuetracker.google.com/u/1/issues/316852027

Just an update here for @Jesus89. We made some changes to our internal process that should fix these soon for base and buildings (barring any disagreements about the definition of validity).

It's a bit more involved because we don't want to just filter out invalid features at the output; instead, we want to put a check at the beginning of the pipeline so that the other validation checks flag things appropriately. This likely won't be in the upcoming February release next week, but it should make it into the March release.