mapbox/geobuf

Configurable precision

mourner opened this issue · 7 comments

The 1e6 rounding factor is hardcoded in geobuf, but some datasets don't lose much at a lower precision such as 1e4. Perhaps we could make this configurable and also encode it as a property in the format, to give more room for geometry compression.
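
To illustrate the idea, here is a minimal Python sketch of rounding with a configurable precision. It is not geobuf's code; the function name `quantize` and the `precision` parameter (number of decimal digits kept, so 6 corresponds to 1e6 and 4 to 1e4) are assumptions made for this example.

```python
def quantize(coords, precision=6):
    """Scale each (x, y) pair by 10**precision and round to integers.

    `precision` is a hypothetical parameter: 6 mirrors the hardcoded 1e6
    factor mentioned above, 4 corresponds to 1e4.
    """
    factor = 10 ** precision
    return [(round(x * factor), round(y * factor)) for x, y in coords]


coords = [(10.456312, -40.856349), (10.456398, -40.856301)]
print(quantize(coords, precision=6))  # full 1e6 precision
print(quantize(coords, precision=4))  # coarser 1e4 precision, smaller integers
```

The integers at 1e4 are two decimal digits shorter, which is where the extra room for compression would come from.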

This should become more relevant with delta encoding (#23), because lower-precision data will have much smaller deltas.
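
A rough Python sketch of why this helps, assuming deltas are stored as zigzag-encoded varints in the style of Protocol Buffers signed integers (the thread does not confirm the exact wire encoding). All helper names here are hypothetical.

```python
def zigzag(n):
    """Map a signed integer to an unsigned one so small magnitudes stay small."""
    return (n << 1) if n >= 0 else ((-n) << 1) - 1


def varint_len(n):
    """Number of bytes a non-negative integer occupies as a base-128 varint."""
    length = 1
    while n >= 0x80:
        n >>= 7
        length += 1
    return length


def delta_encoded_size(values):
    """Total varint bytes when each value is stored as a delta from the previous one."""
    prev = 0
    total = 0
    for v in values:
        total += varint_len(zigzag(v - prev))
        prev = v
    return total


def quantize_1d(xs, precision):
    """Scale by 10**precision and round (precision is an assumed parameter)."""
    factor = 10 ** precision
    return [round(x * factor) for x in xs]


xs = [10.456312, 10.458744, 10.461020, 10.463551]  # made-up sample longitudes
size_e6 = delta_encoded_size(quantize_1d(xs, 6))
size_e4 = delta_encoded_size(quantize_1d(xs, 4))
print(size_e6, size_e4)  # the 1e4 encoding needs fewer bytes for this sample
```

For this sample, the deltas at 1e4 fit in a single varint byte each, while the same deltas at 1e6 need two.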

We would like to see this as an option too, within bounds. Overall using less precision makes sense to us for some of the data we are considering using this for.

Coupled with this, would it make sense to merge vertices that are no longer distinct after rounding to this level of precision?

@brendan-ward probably not; we don't want to complicate the schema with different-precision vertices, and the use case for drastically different-precision data in the same protobuf is minimal. Or are you seeing a benefit in that?

@mourner Let me try again; I may have been confused by something you proposed above:

  1. a command-line option to choose one of a couple of precision options (e.g., 1e4 vs 1e6). That's what I meant by 'within bounds' (limited options)

  2. only one precision used in a given buffer, but variable between different buffers. One could be encoded with 1e4 and another with 1e6.

  3. a property in the buffer that indicates the precision used. It could even be a bool if there are only two options. I thought this is what you meant by "encoded as a property" above.
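
The three points above could be sketched roughly as follows. This is only an illustration in Python: the `Buffer` dataclass stands in for the protobuf message, and the `precision` field, the allowed values, and the `encode`/`decode` names are all assumptions, not the actual geobuf schema or API.

```python
from dataclasses import dataclass
from typing import List

ALLOWED_PRECISIONS = (4, 6)  # hypothetical "within bounds" options: 1e4 and 1e6


@dataclass
class Buffer:
    precision: int     # hypothetical property: decimal digits kept for the whole buffer
    coords: List[int]  # flat list of scaled integer coordinates


def encode(flat_coords, precision=6):
    """Quantize every coordinate with one precision and record it in the buffer."""
    if precision not in ALLOWED_PRECISIONS:
        raise ValueError(f"precision must be one of {ALLOWED_PRECISIONS}")
    factor = 10 ** precision
    return Buffer(precision, [round(c * factor) for c in flat_coords])


def decode(buf):
    """Recover floating-point coordinates using the precision stored in the buffer."""
    factor = 10 ** buf.precision
    return [c / factor for c in buf.coords]


buf = encode([10.456312, -40.856349], precision=4)
print(buf)          # the buffer carries its own precision
print(decode(buf))  # coordinates come back rounded to 4 decimal places
```

Because the precision travels with the buffer, a decoder needs no out-of-band knowledge of which option the encoder chose.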

@brendan-ward yes, that's correct. I was addressing your "aggregate vertices together" comment.

@mourner I meant that if coordinates that were unique at their original precision are no longer distinct once converted to this precision, we could gain efficiency by dropping the duplicate coordinates, e.g.,

[..., 104563, -408563, 104563, -408563, 104563, -408563, ...]
-->
[..., 104563, -408563, ...]

But perhaps that doesn't happen often enough in practice to be worth dealing with.

@brendan-ward yeah, we can dedupe in encoding easily.
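
For reference, deduplication during encoding could look something like the Python sketch below. It assumes a flat coordinate array with `dim` values per vertex and drops only consecutive repeats; the function name is hypothetical and this is not the geobuf implementation.

```python
def dedupe_consecutive(flat, dim=2):
    """Drop vertices that are identical to the vertex immediately before them.

    `flat` is a flat list of already-quantized integers, `dim` values per vertex.
    """
    out = []
    for i in range(0, len(flat), dim):
        vertex = flat[i:i + dim]
        # out[-dim:] is the last vertex kept so far (empty on the first pass)
        if out[-dim:] != vertex:
            out.extend(vertex)
    return out


flat = [104563, -408563, 104563, -408563, 104563, -408563]
print(dedupe_consecutive(flat))
```

Only adjacent duplicates are removed, so a closed ring whose first and last vertices match stays closed. A geometry that collapses to fewer vertices than its type requires would still need separate handling.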

Closing in favor of #27