Overhaul indexing, ideally switching to OSMExpress or something as an intermediate format
ellenhp opened this issue · 6 comments
The intermediate format I use right now thrashes SSDs and isn't even fast, so I'll probably either build something on top of redb or build bindings for OSMExpress.
It wouldn't be too bad to reimplement an OSMExpress-like storage engine on top of redb if there's an S2 library out there, and you could also design from the start for compression and parallel query execution, which OSMExpress lacks as-is.
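To make the spatial-key idea a bit more concrete, here's a rough sketch of a locality-preserving key that could sit in an ordered key-value store. Note that this is a plain Z-order (Morton) code, not S2: S2 cell IDs follow a Hilbert curve on a cube projection, so this is a simpler stand-in. `spread_bits`, `quantize` and `morton_key` are hypothetical names, not part of redb or OSMExpress.

```rust
/// Spread the 32 bits of `v` onto the even bit positions of a u64.
fn spread_bits(v: u32) -> u64 {
    let mut x = v as u64;
    x = (x | (x << 16)) & 0x0000_FFFF_0000_FFFF;
    x = (x | (x << 8)) & 0x00FF_00FF_00FF_00FF;
    x = (x | (x << 4)) & 0x0F0F_0F0F_0F0F_0F0F;
    x = (x | (x << 2)) & 0x3333_3333_3333_3333;
    x = (x | (x << 1)) & 0x5555_5555_5555_5555;
    x
}

/// Map `value` in [min, max] onto the full u32 range (out-of-range input is clamped).
fn quantize(value: f64, min: f64, max: f64) -> u32 {
    let t = ((value - min) / (max - min)).clamp(0.0, 1.0);
    (t * u32::MAX as f64) as u32
}

/// Hypothetical spatial key: longitude bits on even positions, latitude bits on odd ones.
/// Points that are close together usually share a long key prefix.
fn morton_key(lat: f64, lon: f64) -> u64 {
    let y = quantize(lat, -90.0, 90.0);
    let x = quantize(lon, -180.0, 180.0);
    spread_bits(x) | (spread_bits(y) << 1)
}
```

With keys like this, a bounding-box query turns into a handful of key-range scans. Real S2 cells would give better locality and a proper cell hierarchy, so treat this only as an illustration of the layout.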
For now I'm just looking to replace the intermediate format I use between .osm.pbf and the final tantivy index, so I think I'm fine with using local storage for that part. It's a one-time thing and doesn't scale horizontally. I just need something that supports fast random access to resolve way/relation dependencies during indexing.
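For reference, the access pattern I mean looks roughly like the sketch below: node locations kept sorted by ID so each way's node refs can be resolved with a binary search. It's in-memory and stdlib-only just to show the shape. `Location`, `NodeTable` and `resolve_way` are made-up names, not the actual airmail, osmflat or OSMExpress API, and a real version would sit on disk or in a memory-mapped file.

```rust
/// Fixed-point coordinates in units of 1e-7 degrees.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Location {
    lat_e7: i32,
    lon_e7: i32,
}

/// Hypothetical node table: IDs sorted ascending, locations stored in the same order.
struct NodeTable {
    ids: Vec<i64>,
    locations: Vec<Location>,
}

impl NodeTable {
    /// Build the table from nodes in arbitrary order (one entry is kept per duplicate ID).
    fn from_unsorted(mut nodes: Vec<(i64, Location)>) -> Self {
        nodes.sort_unstable_by_key(|n| n.0);
        nodes.dedup_by_key(|n| n.0);
        let (ids, locations): (Vec<i64>, Vec<Location>) = nodes.into_iter().unzip();
        NodeTable { ids, locations }
    }

    /// O(log n) random access by node ID.
    fn get(&self, id: i64) -> Option<Location> {
        self.ids.binary_search(&id).ok().map(|i| self.locations[i])
    }

    /// Resolve every node ref of a way; returns None if any ref is missing.
    fn resolve_way(&self, refs: &[i64]) -> Option<Vec<Location>> {
        refs.iter().map(|&id| self.get(id)).collect()
    }
}
```

Relations would get resolved the same way, just one level up (members pointing at ways, which point at nodes).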
osmflat works really well for this locally, but it does require about 160 GB of memory for expanding the planet. I'll probably try to get a cron job set up to upload those artifacts to R2 to make index generation a little more accessible. Leaving this open until I get it merged.
@jake-low just wrote this which might be interesting for airmail? https://lib.rs/crates/osmx