data-apis/dataframe-api

[protocol] USE_BITMASK is ambiguous

pitrou opened this issue · 0 comments

pitrou commented

The description for USE_BITMASK does not specify in which order the bits of a byte are to be considered (from MSB to LSB or LSB to MSB).

FTR, Arrow goes from LSB to MSB, i.e. bit 0 represents the validity of the first array element, bit 1 the second element, and so on:

>>> import pyarrow as pa
>>> a = pa.array([1, None, 2, 3])
>>> a.buffers()
[<pyarrow.Buffer address=0x7f3dc8209000 size=1 is_cpu=True is_mutable=True>,
 <pyarrow.Buffer address=0x7f3dc8209040 size=32 is_cpu=True is_mutable=True>]
# buffer 0 is the validity bitmap
>>> bin(a.buffers()[0][0])
'0b1101'
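
To make the ambiguity concrete, here is a minimal pure-Python sketch that decodes the same validity byte under both bit orders. The helper `unpack_validity` and its `lsb_first` parameter are hypothetical names for illustration only; they are not part of the protocol or of pyarrow.

```python
def unpack_validity(bitmap: bytes, length: int, lsb_first: bool = True) -> list[bool]:
    """Decode `length` validity flags from a packed bitmap.

    Hypothetical helper: `lsb_first=True` follows the Arrow convention
    (bit 0 of each byte is the first element); `lsb_first=False` reads
    each byte from its most significant bit instead.
    """
    flags = []
    for i in range(length):
        byte = bitmap[i // 8]  # byte holding element i
        shift = i % 8 if lsb_first else 7 - (i % 8)  # bit position within that byte
        flags.append(bool((byte >> shift) & 1))
    return flags


bitmap = bytes([0b1101])  # the validity byte from the pyarrow session above

# Arrow's LSB-first reading recovers [1, None, 2, 3]
print(unpack_validity(bitmap, 4, lsb_first=True))   # [True, False, True, True]

# An MSB-first reading of the same byte marks every element as missing
print(unpack_validity(bitmap, 4, lsb_first=False))  # [False, False, False, False]
```

The two readings disagree on the same buffer, so a consumer cannot decode a USE_BITMASK mask correctly unless the description states the bit order.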