mapbox/geobuf

mapbox/pbf as an encoding/decoding alternative?

mourner opened this issue · 14 comments

As an alternative to the Protobuf.js and protocol-buffers libraries, could we use Konstantin's pbf? It's much lower level and doesn't read .proto files, but it's also simpler and gives us more control over encoding and decoding, potentially making it faster.
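For context, a minimal sketch of pbf's low-level style, based on its README (the message fields here are made up for illustration): instead of generating code from a .proto file, you hand-write a read function per field and write tagged values directly.

```js
var Pbf = require('pbf');

// Decoding: scan the raw message once, dispatching on field tag,
// building the result directly with no schema-generated objects.
var obj = new Pbf(data).readFields(readField, {});

function readField(tag, obj, pbf) {
    if (tag === 1) obj.name = pbf.readString();
    else if (tag === 2) obj.version = pbf.readVarint();
}

// Encoding: write tagged fields straight into the buffer.
var pbf = new Pbf();
pbf.writeStringField(1, 'Hello');
pbf.writeVarintField(2, 42);
var encoded = pbf.finish();
```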

The pbf package has received tons of improvements since this comment, and once I merge some recent performance work, the numbers will look like this:

decode vector tile with pbf x 473 ops/sec ±0.64% (94 runs sampled)
decode vector tile with protocol-buffers x 291 ops/sec ±1.08% (90 runs sampled)
native JSON.parse of the same data x 242 ops/sec ±0.70% (91 runs sampled)
encode vector tile with pbf x 175 ops/sec ±0.90% (84 runs sampled)
encode vector tile with protocol-buffers x 62.99 ops/sec ±0.84% (67 runs sampled)
native JSON.stringify x 211 ops/sec ±0.73% (92 runs sampled)
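For the record, these numbers are Benchmark.js output; a rough sketch of the kind of suite that produces them (decodeWithPbf and the sample data are placeholders, not the actual bench code):

```js
var Benchmark = require('benchmark');

new Benchmark.Suite()
    .add('decode vector tile with pbf', function () {
        decodeWithPbf(tileData); // placeholder for the pbf-based decoder
    })
    .add('native JSON.parse of the same data', function () {
        JSON.parse(tileJson);
    })
    .on('cycle', function (event) {
        // prints lines like "name x N ops/sec ±x.xx% (N runs sampled)"
        console.log(String(event.target));
    })
    .run();
```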

Given that pbf doesn't need an intermediate JSON representation of the protobuf data to read or write, geobuf performance using it will be stellar. cc @brendan-ward, ref #28
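To make the "no intermediate representation" point concrete, here's a hypothetical sketch of decoding a packed, delta-coded coordinate field straight into a GeoJSON coordinate array (the field layout and fixed precision are assumptions for illustration, not the actual geobuf encoding):

```js
// One pass over the wire bytes: zigzag varints come out of pbf already
// decoded, deltas are accumulated, and GeoJSON coordinates are emitted
// directly, without materializing a schema object in between.
function readLine(pbf) {
    var deltas = pbf.readPackedSVarint(),
        coords = [],
        x = 0, y = 0;
    for (var i = 0; i < deltas.length; i += 2) {
        x += deltas[i];
        y += deltas[i + 1];
        coords.push([x / 1e6, y / 1e6]); // assumes 6 digits of precision
    }
    return coords;
}
```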

Looks like a real performance win to me. 👍
Especially when combined with a tool like the one you've outlined here for generating the code from the proto, it should make keeping up to date with the latest proto easy.

I'm not at all wedded to the implementation in #17, I was just trying to help move this along.

@brendan-ward great! The perf improvements are in and the library should be stable now, so I'll take a stab at making a new geobuf implementation using pbf. Should be a very easy port from pygeobuf.

Decoding a sample 12MB zips geobuf (created from a 100MB GeoJSON) takes 1 minute with my pygeobuf Python decoder (based on the official Protobuf Python bindings).

Wrote an initial pbf-based decoder implementation in JS today. The same task takes 2 seconds, which is 24% faster than JSON.parse of the same GeoJSON data. Wow. cc @mick @tmcw
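For anyone who wants to reproduce the timing once this lands, the decoder is driven roughly like this (assuming the module shape that shipped later; the file path is a placeholder):

```js
var Pbf = require('pbf'),
    geobuf = require('geobuf'),
    fs = require('fs');

var data = fs.readFileSync('zips.pbf'); // placeholder path to the sample

console.time('decode');
var geojson = geobuf.decode(new Pbf(data)); // plain GeoJSON object out
console.timeEnd('decode');
```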

@mourner Awesome! We need to help you add Cython to pygeobuf to speed that up too.

@brendan-ward low priority, though: the Python repo was more of a playground for prototyping than an encoder/decoder I'd want anyone to use in production. It served its purpose; now we need something nice and fast in JS :)

For comparison: on an equivalent zips pbf encoded with the current geobuf (which, to be fair, is 6 times bigger), the decoder freezes for a couple of minutes and then errors out with this (seeing that error for the first time, btw):

FATAL ERROR: CALL_AND_RETRY_2 Allocation failed - process out of memory

On smaller files that don't error out, the difference is less dramatic but still nice: decoding the Idaho test case takes 620ms with the current geobuf and 180ms with the new one.

mick commented

@mourner that's awesome! The smaller-file case is great too. It will add up, because in cardboard we save features individually, so loading a set of indexed features involves a lot of small decodes.

Encoding is 3-4 times faster as well: it takes 730ms to encode the Idaho sample with the current geobuf and 200ms with the new one, including automatic precision and dimensions detection. For comparison, JSON.stringify of the same GeoJSON takes 625ms.

On the zips sample, it's 11.2s vs 2.4s to encode (excluding the JSON.parse step).
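The automatic precision detection mentioned above boils down to a pre-pass over the coordinates; a hypothetical sketch (the actual implementation may differ):

```js
// Find the smallest number of decimal digits (capped at maxDigits, e.g. 6)
// that losslessly round-trips every coordinate value; coordinates are then
// stored as delta-coded integers scaled by 10^digits. Dimensions are
// detected similarly, by checking whether any position has a 3rd component.
function detectPrecision(coords, maxDigits) {
    var digits = 0;
    for (var i = 0; i < coords.length; i++) {
        var val = coords[i];
        while (digits < maxDigits &&
               Math.round(val * Math.pow(10, digits)) / Math.pow(10, digits) !== val) {
            digits++;
        }
    }
    return digits;
}

detectPrecision([12.3456, -45.67], 6); // => 4
```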

Fixed by #29

@mick 1.0.0 released, so it's ready for cardboard migration. Not sure if we should start looking into this now or wait until we announce the new Geobuf publicly and gather some feedback.

mick commented

@mourner awesome! I think we can wait a bit for feedback before migrating, but I'll start playing with this.