apache/parquet-format

Clarify behavior of DELTA_BINARY_PACKED encoders/decoders

asfimport opened this issue · 1 comments

I brought this issue up some time ago on the mailing list [1]. In short, I would like to add some clarification to the DELTA_BINARY_PACKED section of Encodings.md. The issue is that while the specification does not limit the number of bits that can be used to encode deltas, some readers expect a maximum of 32 bits for INT32 data and 64 bits for INT64 data [2]. I propose adding wording to the specification to the effect that although using 33 bits to encode INT32 data (or 65 bits for INT64 data) is possible, it is not recommended, and that readers should be able to read such data but are not required to.
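
To illustrate how a 33-bit width can arise, here is a minimal sketch in Python. It is a simplified model and not taken from any real writer: it treats the whole input as a single block, ignores the header and miniblock layout, and the helper names `wrap_signed` and `max_packed_width` are hypothetical. It assumes the writer computes deltas, subtracts the minimum delta, and bit-packs the non-negative results, and it compares a writer that does this in unbounded arithmetic with one that wraps in two's complement at the width of the physical type.

```python
INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1


def wrap_signed(value, bits):
    """Reduce value to a signed two's complement integer of the given width."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def max_packed_width(values, bits, wrapping):
    """Bit width needed for the largest (delta - min_delta) in one block.

    Hypothetical helper: a simplified model of the delta step only.
    """
    deltas = [b - a for a, b in zip(values, values[1:])]
    if wrapping:
        # Assumption: the writer lets the subtraction overflow and wrap around.
        deltas = [wrap_signed(d, bits) for d in deltas]
    min_delta = min(deltas)
    # Each adjusted delta is non-negative, so bit_length() is its packed width.
    adjusted = [d - min_delta for d in deltas]
    return max(a.bit_length() for a in adjusted)


# Worst case for INT32: swing from the minimum to the maximum and back.
values = [INT32_MIN, INT32_MAX, INT32_MIN]
print(max_packed_width(values, bits=32, wrapping=False))  # 33
print(max_packed_width(values, bits=32, wrapping=True))   # 2
```

In this model the wrapping writer never needs more than 32 bits for INT32 data, and a reader that adds the deltas back with the same wrapping arithmetic recovers the original values. The non-wrapping writer is the case that produces the 33-bit (or, for INT64, 65-bit) widths that some readers reject.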

[1] https://lists.apache.org/thread/2wj88oghc0t6qqj8ojp5p5tf8wg11840

[2] apache/arrow#20374

Reporter: Edward Seidl / @etseidl
Assignee: Edward Seidl / @etseidl

PRs and other links:

Note: This issue was originally created as PARQUET-2435. Please see the migration documentation for further details.

Antoine Pitrou / @pitrou:
Resolved by linked PR.