Refactor overture stuff to be more generic
cholmes opened this issue · 1 comments
The main Overture commands could likely be made generic enough to work with any large geospatial file. It'd be great to evolve them into at least standalone 'tools', and perhaps even into their own package that 'open_buildings' would call / depend on. The overall flow of how the data is formatted is:
- Add country_iso and quadkey columns to a directory of parquet files.
- Create a DuckDB database from all the files (this isn't actually a CLI / Python script yet, as it's super easy: just create a table by reading in the whole directory).
- Write out individual parquet files based on country_iso and quadkey, up to a maximum file size, with an appropriate row group size.
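To make the last step more concrete, here is a rough sketch of how the output files could be planned. It is not the existing script: `plan_partitions`, `_split`, and the `max_rows` parameter are hypothetical names, and it uses a row count as a stand-in for the real maximum file size. The idea is to start with one file per country and, when a country is too big, split it by progressively longer quadkey prefixes.

```python
from collections import defaultdict


def plan_partitions(rows, max_rows):
    """Plan output files from (country_iso, quadkey) pairs.

    Returns a dict mapping a partition name to its row count.
    max_rows is a hypothetical proxy for the maximum file size.
    """
    by_country = defaultdict(list)
    for country_iso, quadkey in rows:
        by_country[country_iso].append(quadkey)

    plan = {}
    for country, quadkeys in by_country.items():
        _split(country, "", quadkeys, max_rows, plan)
    return plan


def _split(country, prefix, quadkeys, max_rows, plan):
    """Recursively split one group by a quadkey prefix one digit longer."""
    longest = max(len(q) for q in quadkeys)
    # Stop when the group fits, or when the prefix cannot get any longer.
    if len(quadkeys) <= max_rows or len(prefix) >= longest:
        name = f"{country}_{prefix}" if prefix else country
        plan[name] = len(quadkeys)
        return

    children = defaultdict(list)
    for q in quadkeys:
        children[q[: len(prefix) + 1]].append(q)
    for child_prefix, child_keys in sorted(children.items()):
        _split(country, child_prefix, child_keys, max_rows, plan)


if __name__ == "__main__":
    sample = [("US", "0123")] * 3 + [("US", "0130")] * 2 + [("CA", "0212")] * 2
    print(plan_partitions(sample, max_rows=3))
```

Nothing here mentions 'buildings', so the same planning would apply to any table that has country_iso and quadkey columns. A group that still exceeds `max_rows` at the full quadkey length is written as a single oversized partition in this sketch.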
A more generic version of this would likely take input from more than parquet files (or at least have a command to convert to parquet files). And it would not be tied to the 'buildings' name.
Note that the Google versions of the Overture scripts are likely 95% of the way to being completely generic. They use the centroid of the geometry instead of relying on the bbox struct. The main remaining thing is probably to stop using 'buildings' as the table name.
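For reference, a minimal sketch of the centroid-based quadkey calculation, assuming the centroid is already available as a longitude/latitude pair in WGS84. The function names and the `zoom` parameter are illustrative and not taken from the existing scripts; the math follows the standard Bing Maps tile system.

```python
import math

# Web Mercator latitude limit used by the Bing Maps tile system.
MAX_LAT = 85.05112878


def tile_to_quadkey(tile_x, tile_y, zoom):
    """Interleave the bits of tile_x and tile_y into a base-4 quadkey string."""
    digits = []
    for i in range(zoom, 0, -1):
        mask = 1 << (i - 1)
        digit = 0
        if tile_x & mask:
            digit += 1
        if tile_y & mask:
            digit += 2
        digits.append(str(digit))
    return "".join(digits)


def lonlat_to_quadkey(lon, lat, zoom):
    """Quadkey of the tile containing a lon/lat point (e.g. a geometry centroid)."""
    lat = max(min(lat, MAX_LAT), -MAX_LAT)
    lon = max(min(lon, 180.0), -180.0)

    # Normalised Web Mercator coordinates in [0, 1], origin at the top left.
    x = (lon + 180.0) / 360.0
    sin_lat = math.sin(math.radians(lat))
    y = 0.5 - math.log((1 + sin_lat) / (1 - sin_lat)) / (4 * math.pi)

    n = 1 << zoom  # number of tiles per axis at this zoom level
    tile_x = min(max(int(x * n), 0), n - 1)
    tile_y = min(max(int(y * n), 0), n - 1)
    return tile_to_quadkey(tile_x, tile_y, zoom)


if __name__ == "__main__":
    print(lonlat_to_quadkey(-122.33, 47.61, zoom=12))
```

Because this only needs a single point per row, it works for any geometry type that has a centroid, which is what makes the approach independent of the bbox struct.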
Making this completely generic likely warrants putting it in a new repo.