PABLO: UBSAN runtime_error Morton left shift by 16 places cannot be represented in type uint64_t

Question

PABLO: UBSAN runtime_error Morton left shift by 16 places cannot be represented in type uint64_t

Opened this issue 2 years ago · 2 comments

Getting different number of cells when running with march native in our code base so issue is a result from debugging that.

Running with:
-fsanitize=undefined, -fsanitize=bounds, -fsanitize=implicit-conversion, -fsanitize=integer, -fsanitize=nullability, -O3, -march=native

runtime error: left shift of 4222124650725375 by 16 places cannot be represented in type 'uint64_t' (aka 'unsigned long')

Answer 1 · 2022-06-10T07:06:00.000Z

Do you have a simple test case that I can run or the steps to reproduce the problem?

The only place we do a bitshift of 16 places is in the evaluation of the Morton number:

inline uint64_t splitBy3(uint32_t a)
{
    uint64_t x = a & 0x1fffff; // we only look at the first 21 bits
    x = (x | x << 32) & 0x001F00000000FFFF; // shift left 32 bits, OR with self, and 0000000000011111000000000000000000000000000000001111111111111111
    x = (x | x << 16) & 0x001F0000FF0000FF; // shift left 16 bits, OR with self, and 0000000000011111000000000000000011111111000000000000000011111111
    x = (x | x <<  8) & 0x100F00F00F00F00F; // shift left  8 bits, OR with self, and 0001000000001111000000001111000000001111000000001111000000000000
    x = (x | x <<  4) & 0x10C30C30C30C30C3; // shift left  4 bits, OR with self, and 0001000011000011000011000011000011000011000011000011000100000000
    x = (x | x <<  2) & 0x1249249249249249; // shift left  2 bits, OR with self, and 0001001001001001001001001001001001001001001001001001001001001001

    return x;
}

(See "https://www.forceflow.be/2013/10/07/morton-encodingdecoding-through-bit-interleaving-implementations/", section "Magic-bits").

Maybe applying a mask before doing the shift may fix the problem (I'm just guessing, I'm not good with bit operations):

inline uint64_t splitBy3(uint32_t a)
{
    uint64_t x = a & 0x1fffff; // we only look at the first 21 bits
    x = (x | (x & MASK_32) << 32) & 0x001F00000000FFFF; // shift left 32 bits, OR with self, and 0000000000011111000000000000000000000000000000001111111111111111
    x = (x | (x & MASK_16) << 16) & 0x001F0000FF0000FF; // shift left 16 bits, OR with self, and 0000000000011111000000000000000011111111000000000000000011111111
    x = (x | (x & MASK_8) <<  8) & 0x100F00F00F00F00F; // shift left  8 bits, OR with self, and 0001000000001111000000001111000000001111000000001111000000000000
    x = (x | (x & MASK_4) <<  4) & 0x10C30C30C30C30C3; // shift left  4 bits, OR with self, and 0001000011000011000011000011000011000011000011000011000100000000
    x = (x | (x & MASK_2) <<  2) & 0x1249249249249249; // shift left  2 bits, OR with self, and 0001001001001001001001001001001001001001001001001001001001001001

    return x;
}

where MASK_X are masks that zeros the last (on the left) X bits. @edoardolombardi Do you have any other idea?

Answer 2 · 2022-06-13T13:15:07.000Z

Hi everybody.
The algorithm in slitBy3 in not safe from the point of view of uint64_t overflow.
We could try to prevent the overflow of "1" digits by masking x right before the shift as Andrea proposed. I tested it with a unit32_t whose representation is 21 bits all set at "1", which is the max coordinate PABL0 allows. Trying with higher numbers would lead to this case because of the initialization of x (i.e. "a" masked to 21 digits) . The results with and without the new masking are in both cases the good interleaving.
In order to verify that the sanitaizer is good with it, I would ask @DMurataj01 to run the tool with this line
x = (x | (x & 0xFFFFFFFFFFFF) << 16) & 0x001F0000FF0000FF;
in place of this line
x = (x | x << 16) & 0x001F0000FF0000FF;
I would expect the tool to rise a runtime_error when it shifts by 8.
The algorithm is based on this overflow, but masking we "overflow" "0" in place of "1".
If the behaviour is what I depicted above, we could opportunely pre-mask x before shifting at every stage of the algorithm.
Thanks