ARM-software/CMSIS-NN

Possible undefined behavior for `arm_nn_requantize` when compiling with `CMSIS_NN_USE_SINGLE_ROUNDING`

lforget-ec opened this issue · 9 comments

In the CMSIS_NN_USE_SINGLE_ROUNDING ifdef branch, there is undefined behavior when shift >= 31, because total_shift - 1 becomes negative.
Such a shift value can make sense in some contexts, e.g. when a call to requantize is also used to left-align a quantized int8 value.

__STATIC_FORCEINLINE int32_t arm_nn_requantize(const int32_t val, const int32_t multiplier, const int32_t shift)
{
#ifdef CMSIS_NN_USE_SINGLE_ROUNDING
    const int64_t total_shift = 31 - shift;
    const int64_t new_val = val * (int64_t)multiplier;

    int32_t result = new_val >> (total_shift - 1); // <-- Here is the problematic line
    result = (result + 1) >> 1;

    return result;
#else
    return arm_nn_divide_by_power_of_two(arm_nn_doubling_high_mult_no_sat(val * (1 << LEFT_SHIFT(shift)), multiplier),
                                         RIGHT_SHIFT(shift));
#endif
}
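
For concreteness, a minimal reproducer sketch (my own example; the values are arbitrary, and CMSIS_NN_USE_SINGLE_ROUNDING is assumed to be defined):

#include <stdint.h>
#include "arm_nnsupportfunctions.h"

/* With shift = 31, total_shift is 0 and the marked line shifts by -1,
   which is undefined behaviour for the C right-shift operator. */
static int32_t trigger_ub(void)
{
    return arm_nn_requantize(100, 0x00800000, 31); /* 24-bit multiplier, shift = 31 */
}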

Thanks for your interest in CMSIS-NN. We shall update the header description for that as part of #108.
Are you suggesting a check similar to the one here: https://github.com/tensorflow/tensorflow/blob/451a94246d83d171b446141ec40569f6a571f7df/tensorflow/lite/kernels/internal/common.cc#L25 ?
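
Something along the lines of the following sketch (just to illustrate the kind of check, not a committed change):

#include <assert.h>
#include <stdint.h>
#include "arm_nnsupportfunctions.h"

/* Hypothetical guarded wrapper: with shift in [-31, 30] the single-rounding
   branch shifts by total_shift - 1 = 30 - shift, which stays in [0, 61]. */
static int32_t requantize_checked(int32_t val, int32_t multiplier, int32_t shift)
{
    assert(shift >= -31 && shift <= 30);
    return arm_nn_requantize(val, multiplier, shift);
}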

The use case I was thinking of is when I know the multiplier is 24 bits (e.g. it comes from a binary32 floating-point value).
Then the product fits in 32 bits and I don't want any rounding at all.

Conceptually it could be done using a shift value of 31: the product would be "left shifted by one" before adding the rounding bit and then right shifted by one.

I can also "pre shift" my multiplier to obtain the same result with a shift value of 30, so no problem of enforcing shift values lower than 30, but then it is not consistent with the #else branch which computes a sensible result for shift = 31.
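
For illustration, the pre-shifting workaround I have in mind looks roughly like this (hypothetical helper name, assuming the multiplier has at least one bit of headroom):

#include <stdint.h>
#include "arm_nnsupportfunctions.h"

/* Hypothetical illustration: a 24-bit multiplier has headroom to be left
   shifted by one, so round(val * mult / 2^(31 - 31)) can instead be obtained
   as round(val * (mult << 1) / 2^(31 - 30)) while keeping shift <= 30. */
static int32_t requantize_shift31_workaround(int32_t val, int32_t multiplier24)
{
    return arm_nn_requantize(val, multiplier24 << 1, 30);
}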

Actually I think the shift parameter does not have the same meaning depending on whether or not the function is compiled with CMSIS_NN_USE_SINGLE_ROUNDING.

In the "single rounding" case, the biased shift applies to the result which is computed exactly.
In the other case, the input is shifted before performing the product.

As a result it is, if I am not mistaken, actually impossible to access the low bits of the product in the second case.

I don't know if this is intended, but it is rather counterintuitive and not well documented.

You mentioned that in the other case (the double-rounding case) the input is shifted before performing the product.
That is only done if shift is positive. Likewise, for the same case, the divide by 2^abs(shift) is only done if shift is negative. Perhaps the documentation could be made clearer about that.
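
Roughly, the double-rounding branch expands to something like this (a paraphrase for illustration, not the exact header code):

#include <stdint.h>
#include "arm_nnsupportfunctions.h"

/* Paraphrase of the double-rounding branch: the input left shift and the
   final divide are mutually exclusive depending on the sign of shift. */
static int32_t requantize_double_rounding_sketch(int32_t val, int32_t multiplier, int32_t shift)
{
    const int32_t left_shift = (shift > 0) ? shift : 0;   /* LEFT_SHIFT(shift)  */
    const int32_t right_shift = (shift > 0) ? 0 : -shift; /* RIGHT_SHIFT(shift) */

    /* Left shift the 32-bit input only when shift is positive. */
    const int32_t shifted_val = val * (1 << left_shift);

    /* Rounded doubling high multiply: keeps the high bits of the product
       (first rounding). */
    const int32_t high = arm_nn_doubling_high_mult_no_sat(shifted_val, multiplier);

    /* Divide by 2^right_shift with rounding (second rounding), a no-op when
       shift is positive. */
    return arm_nn_divide_by_power_of_two(high, right_shift);
}
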
Also, to be clear, the output may differ slightly between the two branches, and that is intentional.
Furthermore, the intention is that both branches have the same numerical behavior as TensorFlow, as stated here.

I expected the result to vary slightly, but not by that much.

Let's imagine my multiplier value is 2.
The product bits of interest are bits [1-32].
To "cancel" the implicit 31-bit right shift I need to set shift to 30.
But because val * (1 << LEFT_SHIFT(shift)) results in an int32_t, in the double-rounding case I lose 30 bits of value before computing the product, while in the single-rounding case it is fine.

Of course I can change my multiplier to 1 << 30 and use a shift of only 1 to get the same result, and that works in both cases.
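
In code, the two parameterizations I am comparing look like this (my own example, assuming arm_nnsupportfunctions.h is available):

#include <stdint.h>
#include "arm_nnsupportfunctions.h"

/* Example of two mathematically equivalent parameterizations. */
static void requantize_example(void)
{
    const int32_t val = 100;

    /* multiplier = 2, shift = 30: the single-rounding branch keeps the full
       64-bit product and returns val, but the double-rounding branch first
       computes val * (1 << 30) in 32 bits and loses the high bits. */
    const int32_t a = arm_nn_requantize(val, 2, 30);

    /* multiplier = 1 << 30, shift = 1: the same scaling
       (2 * 2^30 / 2^31 == 2^30 * 2^1 / 2^31), but both branches agree. */
    const int32_t b = arm_nn_requantize(val, (int32_t)1 << 30, 1);

    (void)a;
    (void)b;
}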

If this is expected it should probably be documented (I don't see it in the TensorFlow int8 quantization specification either).
The USE_SINGLE_ROUNDING macro name is in this case a bit misleading.

Something like EXACT_PROD_BEFORE_REQUANTIZE or EXTENDED_PROD_BEFORE_REQUANTIZE would make the difference clearer in my opinion.

So now you want to change the name of the macro CMSIS_NN_USE_SINGLE_ROUNDING to something like EXACT_PROD_BEFORE_REQUANTIZE or EXTENDED_PROD_BEFORE_REQUANTIZE?
Would it not be better to keep a name similar to the one used in TensorFlow?

I think it depends on the goal of the library: if the end purpose is to be an implementation backend for TensorFlow, then sticking to their naming, even if it is misleading, makes sense.

If the goal is to provide a library that can be used directly, outside of TensorFlow, then I think it makes sense to detach from it.

Everyone is free to use CMSIS-NN in their project. However, it is currently only bit-exact with the TensorFlow reference kernels. When CMSIS_NN_USE_SINGLE_ROUNDING is defined, it is no longer bit-exact unless TFLITE_SINGLE_ROUNDING is also specified in TensorFlow.
The name of the flag does not prevent CMSIS-NN from being used by a project.

I am not saying that the name of the flag prevents anyone from using the library.
It is just rather misleading, as in practice (for the requantize case; I haven't checked other usages) the value difference can be much bigger than mere double-rounding differences.
As the documentation is not very clear about this, the user needs to know that the name does not actually refer to what the flag does, but is a reference to some TensorFlow internal define.

However it is currently only bit-exact to TensorFlow reference kernels.

Based on that, it seems that the library really is tied to TensorFlow.
A less tightly coupled library might rather aim at numerical accuracy, not specifically bitwise equality.

Thank you for the clarifications!