OvertureMaps/data

schema mismatch in glob: column `"class"`


This issue appears when I try to download buildings for an area in Cairo, Egypt, by bbox using DuckDB. It's not a big deal for me because most or all of the values are null, but I wanted to mention it for upcoming releases.

root@overturemaps:~# ./duckdb
v0.9.2 3c695d7ba9
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
D SET memory_limit = '32GB';
D SET threads TO 16;
D SET enable_progress_bar = true;
D SET enable_progress_bar_print = true;
D INSTALL httpfs;
D INSTALL spatial;
D LOAD httpfs;
D LOAD spatial;
D 
D COPY (
>     SELECT
>         type,
>         version,
>         CAST(updatetime as varchar) as updateTime,
>         height,
>         numfloors as numFloors,
>         level,
>         class,
>         JSON(names) as names,
>         JSON(sources) as sources,
>         ST_GeomFromWKB(geometry) as geometry
>     FROM read_parquet('s3://overturemaps-us-west-2/release/2023-12-14-alpha.0/theme=buildings/type=*/*', hive_partitioning=1)
>     WHERE
>         bbox.minx > 31.26500 
>     AND bbox.maxx < 31.29643 
>     AND bbox.miny > 30.07066 
>     AND bbox.maxy < 30.10207
> ) TO 'egypt_cairo_hadaiq_el_qubbah_yharby_buildings.gpkg'
> WITH (FORMAT GDAL, DRIVER 'GPKG', SRS 'EPSG:4326');
100% ▕████████████████████████████████████████████████████████████▏ 
Error: IO Error: Failed to read file "s3://overturemaps-us-west-2/release/2023-12-14-alpha.0/theme=buildings/type=part/part-00000-a0ead583-abfd-4f33-969d-124a48bc3031-c000.zstd.parquet": schema mismatch in glob: column "class" was read from the original file "s3://overturemaps-us-west-2/release/2023-12-14-alpha.0/theme=buildings/type=building/part-00000-431912fa-aa4a-434d-9706-e2c921dffc76-c000.zstd.parquet", but could not be found in file "s3://overturemaps-us-west-2/release/2023-12-14-alpha.0/theme=buildings/type=part/part-00000-a0ead583-abfd-4f33-969d-124a48bc3031-c000.zstd.parquet".
Candidate names: id, geometry, bbox, names, version, updateTime, sources, height, numFloors, minHeight, facadeColor, facadeMaterial, roofMaterial, roofShape, roofDirection, roofOrientation, roofColor, eaveHeight, level, buildingId
If you are trying to read files with different schemas, try setting union_by_name=True

There is now a part type partition (type=part) under buildings, and its schema differs from type=building: as the candidate names in the error show, it has no class column. So you probably want:

...
FROM read_parquet('s3://overturemaps-us-west-2/release/2023-12-14-alpha.0/theme=buildings/type=building/*', hive_partitioning=1)

Or, if you want both the parts and the footprints (type=building), try setting union_by_name=True in the read_parquet() call, as the error message suggests.
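For reference, here is a sketch of the second option: the query from the report with only union_by_name=True added to the read_parquet() call. It has not been re-run against this release, and it assumes the columns shared by both partitions have compatible types; if they don't, DuckDB may still raise an error.

```sql
-- Sketch only: the original query, unchanged except for union_by_name=True.
INSTALL httpfs;
INSTALL spatial;
LOAD httpfs;
LOAD spatial;

COPY (
    SELECT
        type,
        version,
        CAST(updatetime AS varchar) AS updateTime,
        height,
        numfloors AS numFloors,
        level,
        class,  -- NULL for rows read from type=part, which has no class column
        JSON(names) AS names,
        JSON(sources) AS sources,
        ST_GeomFromWKB(geometry) AS geometry
    FROM read_parquet(
        's3://overturemaps-us-west-2/release/2023-12-14-alpha.0/theme=buildings/type=*/*',
        hive_partitioning=1,
        union_by_name=True  -- match columns by name across files with different schemas
    )
    WHERE
        bbox.minx > 31.26500
    AND bbox.maxx < 31.29643
    AND bbox.miny > 30.07066
    AND bbox.maxy < 30.10207
) TO 'egypt_cairo_hadaiq_el_qubbah_yharby_buildings.gpkg'
WITH (FORMAT GDAL, DRIVER 'GPKG', SRS 'EPSG:4326');
```

Because hive_partitioning=1 exposes the type partition as a column, the type field in the output should let you tell building rows from part rows. Columns that exist in only one partition are filled with NULL for rows from the other.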

@Youssef-Harby Were you able to get past the error you were seeing?

Yes, please close the issue. Thank you, @jwass.