mapbox/geojson-vt

Import/export index via protobuf blob.

METACEO opened this issue · 5 comments

Story: I index a large collection of features and I want to later use this index without having to build it again.

I am indexing enough features that my process takes a few minutes to complete (and my process's thread is blocked). How possible would it be to export the index (using the pbf package) so that I could later import it without the expensive computation that the indexing requires?

@mourner - is this an enhancement another external developer (myself) could approach?

@METACEO Have you considered geojson2mvt? You can also do the same directly in your code with vt-pbf.

Both look like they render/output individual tiles (geojson2mvt to files in folders and vt-pbf to pbf files per tile). I'm hoping to pull the computed index (the entire projected pyramid) into a single blob that I can later read and inject into a new geojson-vt instance.

I imagine (limited to my observations) that a lot of the computing that occurs during indexing is the clipping, transformation, bounds checking, etc., and when I'm working with large data sets it can take up to 5 minutes to index (and a lot of memory). If I could do this indexing once (or whenever my data changes), I'd rather download the index and spend whatever extra time over the network than spend time and memory reprocessing the index. Processing the index also blocks the process's thread, so health checks fail and I have no way of knowing when an instance has completed or is ready to query.
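The export/import idea described above could be sketched roughly as follows. This is only an illustration under stated assumptions: it treats the tile pyramid as a plain object keyed by tile id (geojson-vt appears to keep its tiles in such an object, but that is an internal detail, not a documented API), and it uses length-prefixed JSON records as a stand-in for a protobuf encoding via the pbf package. The names `exportTiles`, `readTiles` and `importTiles` are hypothetical.

```javascript
// Assumed blob layout (not a geojson-vt or pbf format): for every tile,
// a 4-byte big-endian length followed by that many bytes of UTF-8 JSON.
function exportTiles(tiles) {
  const parts = [];
  for (const id of Object.keys(tiles)) {
    const body = Buffer.from(JSON.stringify({ id, tile: tiles[id] }), 'utf8');
    const header = Buffer.alloc(4);
    header.writeUInt32BE(body.length, 0);
    parts.push(header, body);
  }
  return Buffer.concat(parts);
}

// Generator that decodes one tile record per step, so a caller can
// interleave decoding with other work instead of parsing everything at once.
function* readTiles(blob) {
  let offset = 0;
  while (offset < blob.length) {
    const length = blob.readUInt32BE(offset);
    const body = blob.subarray(offset + 4, offset + 4 + length);
    yield JSON.parse(body.toString('utf8'));
    offset += 4 + length;
  }
}

// Rebuilds the id -> tile object from a blob produced by exportTiles.
function importTiles(blob) {
  const tiles = {};
  for (const { id, tile } of readTiles(blob)) {
    tiles[id] = tile;
  }
  return tiles;
}
```

Because each record carries its own length, an importer could pull a limited number of tiles from `readTiles` per event-loop turn (for example, scheduling the next slice with `setImmediate`) rather than decoding the whole blob in one blocking call. Whether a pyramid restored this way can be injected into a fresh geojson-vt instance depends on internals that this sketch does not verify.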

How possible would it be to export the index (using the pbf package) so that I could later import it without the expensive computation that the indexing requires?

I think this would be too much effort for little gain. The index computation is already cheap enough compared to the overhead of loading a corresponding amount of data in the browser in the first place. If it's done off the main thread (in a web worker), it's fast enough. And to mitigate the cost of transferring/parsing GeoJSON, you could try https://github.com/mapbox/geobuf.

If you're still up for the challenge, I'm sure this would look great as a separate module.

Ok, I should clarify:

  • This is server-side.
  • I am utilizing geobuf between services/clients.

Extra information: I have about a gigabyte of geobuf-encoded data, and when indexing these data sets my memory spikes to ~7-9 GB before settling down to a few (2-4 GB) after the indexing/computation has completed.

The computation and the spike are what is killing my back-end process. The back-end services get terminated because the indexing has locked the thread and they cannot respond to health checks (simple HTTP GET calls), and the memory spike gets expensive alongside other local processes.

I can work with and justify the normalized index and the memory it consumes, but the computation, its memory usage, and its thread locking are getting me in trouble.

I accept that downloading the index would be heavy too, but it should allow me a smaller memory footprint and a process thread that is not blocked while downloading chunks of the exported index.
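For the thread-blocking part of the problem, the server-side counterpart of the web worker suggestion is Node's `worker_threads` module. The sketch below is a minimal illustration, not a drop-in solution: `buildIndex` is a hypothetical stand-in for the expensive indexing call, and the worker source is inlined as a string only to keep the example in one file.

```javascript
const { Worker } = require('worker_threads');

// Worker source, inlined so the sketch is self-contained.
// buildIndex is a hypothetical placeholder for the expensive indexing step;
// here it only counts features so the example stays runnable.
const workerSource = `
  const { parentPort, workerData } = require('worker_threads');
  function buildIndex(geojson) {
    return { featureCount: geojson.features.length };
  }
  parentPort.postMessage(buildIndex(workerData));
`;

// Runs the indexing step on a separate thread and resolves with its result,
// leaving the main thread free to answer health checks in the meantime.
function buildIndexInWorker(geojson) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true, workerData: geojson });
    worker.once('message', resolve);
    worker.once('error', reject);
    worker.once('exit', (code) => {
      if (code !== 0) reject(new Error(`worker exited with code ${code}`));
    });
  });
}
```

Two caveats apply. Messages between threads are structured-cloned, so an index object would arrive without its methods; in practice the worker would probably keep the index and answer tile requests by message. A worker thread also runs inside the same process, so this addresses the failed health checks but not the memory spike.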

Wow, that's a lot of data. I hadn't anticipated geojson-vt being used in such a context. Very cool!

Anyway, I think the scope of a feature like this is not small, and given that this is a rare use case (with this much data, people usually prefer native solutions and full pre-tiling, such as Tippecanoe or a Mapnik + MVT stack), I'd prefer this to be an external module in a separate repo.