Computing convolution with asymmetric quantization

Question

Opened this issue 3 years ago · 0 comments

Apologies if this is not the right forum for this question, but I am having difficulty understanding how the convolution kernel works with asymmetric quantization. As an example, suppose I have a 1x1 input with a value of 3 and a 1x1 kernel with a value of 5. I take the quantization scale to be 1 and the zero point to be 0, so the quantized values should match the real values and the result should be 15.
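
For reference, here is a minimal sketch of the arithmetic I expect, written in plain C without the library. The helper names (dequantize_u8, conv1x1_accumulate) are hypothetical and only illustrate the usual asymmetric scheme, real = scale * (q - zero_point); they are not part of the library.

```c
#include <assert.h>
#include <stdint.h>

/* Real value represented by a quantized byte: real = scale * (q - zero_point). */
static float dequantize_u8(uint8_t q, float scale, int32_t zero_point) {
    assert(scale > 0.0f);
    return scale * (float)((int32_t)q - zero_point);
}

/* 32-bit accumulator for a 1x1, single-channel convolution.
 * The zero points are subtracted before the multiply, so the accumulator
 * is expressed in units of (input_scale * kernel_scale). */
static int32_t conv1x1_accumulate(uint8_t input_q, int32_t input_zero_point,
                                  uint8_t kernel_q, int32_t kernel_zero_point,
                                  int32_t bias) {
    int32_t x = (int32_t)input_q - input_zero_point;
    int32_t w = (int32_t)kernel_q - kernel_zero_point;
    return x * w + bias;
}
```

With an input of 3, a kernel of 5, zero points of 0, and a bias of 0, conv1x1_accumulate returns 15. The same real values stored with a zero point of 128 (bytes 131 and 133) also give 15, since the zero points cancel before the multiply.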

Here is some code to demonstrate how I have tried calling the xa_nn_conv2d_std_asym8xasym8 function:

uint8_t output[1] = {0};
uint8_t input[1] = {0};
uint8_t kernel[4] = {0};  // Pad the kernel buffer so that its length is a multiple of 4.
int bias[1] = {0};
uint8_t scratch[100];

input[0] = 3;
kernel[0] = 5;

xa_nn_conv2d_std_asym8xasym8(
    output,
    input,
    kernel,
    bias,
    1,  // input_height
    1,  // input_width
    1,  // input_channels
    1,  // kernel_height
    1,  // kernel_width
    1,  // out_channels
    1,  // x_stride
    1,  // y_stride
    0,  // x_padding
    0,  // y_padding
    1,  // out_height
    1,  // out_width
    0,  // input_zero_bias
    0,  // kernel_zero_bias
    1,  // out_multiplier
    0,  // out_shift
    0,  // out_zero_bias
    0,  // out_data_format
    (void *)scratch);
printf("Output: %d\n", output[0]);

This results in:

Output: 0

I notice that if I set out_shift to 27 the output becomes 1, but it is zero for every other valid choice of out_shift. I might expect to need to shift the output by 24 bits since (as I understand it) the accumulator has 32 bits and gets rounded back down to 8 bits, but this still results in an output of 0. How would I correctly set these parameters to see an output of 15?
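
One way to reason about these two parameters is the fixed-point convention used by the TensorFlow Lite reference kernels, where the accumulator is scaled by out_multiplier / 2^31 and out_shift is a power-of-two exponent (positive meaning a left shift applied before the multiply). Whether xa_nn_conv2d_std_asym8xasym8 follows exactly this convention is an assumption here, and requantize_q31 and requantize_to_u8 are hypothetical names for a plain C model of it. The model does not reproduce 32-bit overflow of the left-shifted accumulator; it asserts instead.

```c
#include <assert.h>
#include <stdint.h>

/* Scale acc by (multiplier / 2^31) * 2^shift, rounding to nearest.
 * Assumed convention: shift > 0 is a left shift done before the multiply,
 * shift < 0 is a rounding right shift done after it. */
static int32_t requantize_q31(int32_t acc, int32_t multiplier, int shift) {
    assert(shift >= -31 && shift <= 30);
    int left = shift > 0 ? shift : 0;
    int right = shift > 0 ? 0 : -shift;

    /* Left-shifted accumulator; a real 32-bit kernel could overflow here. */
    int64_t scaled = (int64_t)acc * ((int64_t)1 << left);
    assert(scaled >= INT32_MIN && scaled <= INT32_MAX);

    /* Rounded high multiply: round(scaled * multiplier / 2^31). */
    int64_t prod = scaled * (int64_t)multiplier;
    int64_t nudge = prod >= 0 ? ((int64_t)1 << 30) : 1 - ((int64_t)1 << 30);
    int64_t high = (prod + nudge) / ((int64_t)1 << 31);

    /* Rounding right shift, half away from zero. */
    if (right > 0) {
        int64_t half = (int64_t)1 << (right - 1);
        int64_t div = (int64_t)1 << right;
        high = high >= 0 ? (high + half) / div : -((-high + half) / div);
    }
    return (int32_t)high;
}

/* Add the output zero point and clamp to the uint8 range. */
static uint8_t requantize_to_u8(int32_t acc, int32_t multiplier, int shift,
                                int32_t out_zero_point) {
    int32_t v = requantize_q31(acc, multiplier, shift) + out_zero_point;
    if (v < 0) v = 0;
    if (v > 255) v = 255;
    return (uint8_t)v;
}
```

Under this model, an accumulator of 15 with out_multiplier = 1 and out_shift = 27 gives 15 * 2^27 / 2^31 ≈ 0.94, which rounds to 1, while smaller shifts round to 0. That is consistent with the behavior described above. A multiplier of 2^30 (0.5 in this fixed-point format) with out_shift = 1 gives 15 in the model.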

EDIT: I have also tried setting a much larger out_multiplier on the theory that it may be a 32-bit fixed point number in the range of (0, 1). But if I set it to 2^31 I continue to get 0 unless I set out_shift to 28 or greater, in which case I get an output value of 255.
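
The fixed-point theory in the EDIT can be made concrete with the decomposition TensorFlow Lite uses for its quantized multipliers: a real multiplier is written as (q31 / 2^31) * 2^shift with q31 in [2^30, 2^31). This is a sketch under the assumption that the library expects the same representation; decompose_multiplier is a hypothetical helper, not a library function. Note that 2^31 itself does not fit in a signed 32-bit integer, so if out_multiplier is a signed 32-bit parameter (also an assumption about the signature), a real multiplier of 1.0 would have to be passed as 2^30 with a shift of 1 rather than as 2^31 with a shift of 0.

```c
#include <assert.h>
#include <stdint.h>

/* Split a positive real multiplier into a Q31 mantissa and a power-of-two
 * exponent so that real_multiplier ~= (q31 / 2^31) * 2^shift,
 * with q31 in [2^30, 2^31). */
static void decompose_multiplier(double real_multiplier, int32_t *q31, int *shift) {
    assert(real_multiplier > 0.0);
    double mantissa = real_multiplier;
    int exponent = 0;

    /* Normalize the mantissa into [0.5, 1). */
    while (mantissa >= 1.0) { mantissa /= 2.0; exponent++; }
    while (mantissa < 0.5)  { mantissa *= 2.0; exponent--; }

    /* Round to Q31; if rounding reaches 2^31, renormalize to 2^30. */
    int64_t q = (int64_t)(mantissa * 2147483648.0 + 0.5);
    if (q == ((int64_t)1 << 31)) {
        q /= 2;
        exponent++;
    }
    *q31 = (int32_t)q;
    *shift = exponent;
}
```

For a real multiplier of 1.0 (input scale * kernel scale / output scale with all scales equal to 1), this yields q31 = 1073741824 (2^30) and shift = 1.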