ellenhp/airmail

Overhaul indexing, ideally switching to OSMExpress or something as an intermediate format

ellenhp opened this issue · 6 comments

The intermediate format I use right now thrashes SSDs and isn't even fast, so I'll probably either build something on top of redb or build bindings for OSMExpress.

It wouldn't be too bad to reimplement an OSMExpress-like storage engine on top of redb if there's an S2 library out there, and you could also design from the start for compression and parallel query execution, both of which OSMExpress lacks as-is.
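To make the idea of a cell-keyed store a bit more concrete, here is a rough sketch, not a proposal for the actual design. It swaps in a plain Z-order (Morton) grid as a much simpler stand-in for S2 cell IDs, and a `BTreeMap` as a stand-in for an ordered redb table. The names `morton16`, `cell_for`, and `SpatialIndex` are made up for illustration.

```rust
use std::collections::BTreeMap;

/// Interleave the bits of x and y into a 32-bit Z-order (Morton) key.
/// This is a simplified stand-in for an S2 cell ID, not S2 itself.
pub fn morton16(x: u16, y: u16) -> u32 {
    // Spread the 16 bits of v so that they occupy the even bit positions.
    fn spread(v: u16) -> u32 {
        let mut v = v as u32;
        v = (v | (v << 8)) & 0x00FF_00FF;
        v = (v | (v << 4)) & 0x0F0F_0F0F;
        v = (v | (v << 2)) & 0x3333_3333;
        v = (v | (v << 1)) & 0x5555_5555;
        v
    }
    spread(x) | (spread(y) << 1)
}

/// Map a lat/lon pair onto a 65536 x 65536 grid and return its cell key.
pub fn cell_for(lat: f64, lon: f64) -> u32 {
    let scale = |v: f64, span: f64| -> u16 {
        ((v / span) * 65536.0).clamp(0.0, 65535.0) as u16
    };
    morton16(scale(lon + 180.0, 360.0), scale(lat + 90.0, 180.0))
}

/// Hypothetical spatial index: an ordered map keyed by (cell, element id).
/// An ordered key-value table in an embedded database could play this role.
pub struct SpatialIndex {
    entries: BTreeMap<(u32, i64), ()>,
}

impl SpatialIndex {
    pub fn new() -> Self {
        SpatialIndex { entries: BTreeMap::new() }
    }

    /// Record that the element with `id` lies in the cell containing (lat, lon).
    pub fn insert(&mut self, id: i64, lat: f64, lon: f64) {
        self.entries.insert((cell_for(lat, lon), id), ());
    }

    /// Return the ids of all elements in `cell`, in ascending id order,
    /// using a single range scan over the ordered keys.
    pub fn ids_in_cell(&self, cell: u32) -> Vec<i64> {
        self.entries
            .range((cell, i64::MIN)..=(cell, i64::MAX))
            .map(|(&(_, id), _)| id)
            .collect()
    }
}
```

Because all keys for one cell are adjacent, a lookup is one range scan, and disjoint cell ranges could in principle be scanned by separate readers in parallel.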

For now I'm just looking to replace the intermediate format I use between .osm.pbf and the final tantivy index, so I think I'm fine with using local storage for that part. It's a one-time thing and doesn't scale horizontally. I just need something that supports fast random access to resolve way/relation dependencies during indexing.
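For anyone following along, the dependency resolution step looks roughly like this. It's a minimal sketch that assumes an in-memory `HashMap` as a stand-in for whatever random-access store ends up being used; `NodeStore` and `resolve_way` are hypothetical names, not airmail's actual code.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for an on-disk node store keyed by OSM node ID.
pub struct NodeStore {
    locations: HashMap<i64, (f64, f64)>,
}

impl NodeStore {
    pub fn new() -> Self {
        NodeStore { locations: HashMap::new() }
    }

    /// Store the (lat, lon) of a node.
    pub fn insert(&mut self, id: i64, lat: f64, lon: f64) {
        self.locations.insert(id, (lat, lon));
    }

    /// Random-access lookup of a node's (lat, lon) by ID.
    pub fn get(&self, id: i64) -> Option<(f64, f64)> {
        self.locations.get(&id).copied()
    }
}

/// Resolve a way's node references to coordinates, preserving order.
/// Returns None if any referenced node is missing from the store.
pub fn resolve_way(store: &NodeStore, node_refs: &[i64]) -> Option<Vec<(f64, f64)>> {
    node_refs.iter().map(|&id| store.get(id)).collect()
}
```

Each way triggers one lookup per node reference, in whatever order the way lists them, which is why the access pattern is effectively random and why the store's lookup cost dominates indexing time. Relations would resolve their member ways the same way, one level up.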

osmflat works really well for this locally, but it does require about 160 GB of memory for the planet during expansion. I'll probably try to get a cron job set up to upload those artifacts to R2 to make index generation a little more accessible. Leaving this open until I get it merged.

Merged in #10

bdon commented

@jake-low just wrote this which might be interesting for airmail? https://lib.rs/crates/osmx

@bdon Thanks for the tip! Jake is an old coworker, so we met up at a coffee shop over the weekend and caught up. I ended up switching to osmx-rs for indexing and just pushed that change. :)