tensorflow/tflite-micro

tflite-micro model predictions differ from tflite model predictions

Closed this issue · 7 comments

Hi!

I am working on a model based on the TCN architecture. I trained the model, then quantized and converted it with the TensorFlow Lite Converter, applying full-integer-only quantization, so every element from the input tensor to the output tensor is 8-bit quantized. Finally, I converted the TFLite model to a C array with xxd.
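For reference, the xxd conversion step might look like this (the model filename is a placeholder):

```shell
# Emit the flatbuffer as a C array plus a length variable, e.g.
#   unsigned char model_tflite[] = { ... };
#   unsigned int model_tflite_len = ...;
xxd -i model.tflite > model_data.cc
```

Note that `xxd -i` derives the C variable names from the input filename.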

I am using an Arduino Nano 33 BLE Sense, and I wrote firmware for running the model on this board. For evaluation, I was interested in running inference on the test dataset. The input tensor has a length of 1000, so I wrote a Python script which takes a test-dataset sample and sends it to the Arduino Nano over a serial port using PySerial.

I took some considerations into account:

  • Using the TensorFlow Lite model and the TensorFlow Lite Interpreter, I loaded the model's input parameters, such as input_zero_point and input_scale, and quantized the dataset from float to int8_t. The TFLM model therefore does not quantize the input before invoke, because the sample is already quantized.
  • The samples are sent in batches of 50 elements (50 bytes), so 20 batches are used in total.
  • A simple checksum system was implemented to verify that communication succeeds for each batch.
  • After the entire tensor has been sent, on the firmware side the TFLM input tensor is read back and returned to the Python script, where it is received and compared with the original sample. This works well too.
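For illustration, the float-to-int8 quantization step described above can be sketched like this (the scale and zero-point values here are made up; a real script would read them from the interpreter's input details):

```python
import numpy as np

def quantize_int8(x, scale, zero_point):
    # q = round(x / scale) + zero_point, clamped to the int8 range
    q = np.round(x / scale) + zero_point
    return np.clip(q, -128, 127).astype(np.int8)

# Hypothetical parameters; in practice they come from
# interpreter.get_input_details()[0]['quantization']
scale, zero_point = 0.05, -3
sample = np.array([0.0, 0.5, -0.5, 10.0], dtype=np.float32)
print(quantize_int8(sample, scale, zero_point).tolist())  # [-3, 7, -13, 127]
```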

The problem is that the prediction made with the TFLM model on the microcontroller differs from the prediction of the TFLite model run with the TensorFlow Lite Interpreter. The output tensor is just one neuron.

I have checked the communication process and I am confident that the samples are loaded correctly into the input tensor. So where could the problem be? Do you have any suggestions?
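As a sketch of the batch-and-checksum protocol described above (the issue doesn't state the checksum algorithm, so a simple additive checksum modulo 256 is assumed here):

```python
def checksum(batch: bytes) -> int:
    # Sum of all bytes in the batch, truncated to one byte
    return sum(batch) & 0xFF

def split_batches(tensor: bytes, batch_size: int = 50):
    # Yield (batch, checksum) pairs; a 1000-byte tensor gives 20 batches
    for i in range(0, len(tensor), batch_size):
        batch = tensor[i:i + batch_size]
        yield batch, checksum(batch)

tensor = bytes(i % 256 for i in range(1000))
batches = list(split_batches(tensor))
print(len(batches))  # 20
```

On the firmware side the same checksum would be recomputed over each received batch and compared against the transmitted value.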

I am using the Arduino IDE. The TensorFlow_Lite library was built from source last year using TensorFlow 2.5. Finally, I am using static tflite::AllOpsResolver resolver;.

Thanks in advance!

Martin

@petewarden do you have any suggestion? Thanks in advance!

When using the TFLite interpreter on x86, are you setting experimental_op_resolver_type=tf.lite.experimental.OpResolverType.BUILTIN_REF so that the reference kernels are used to check results against TFLM?
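For reference, comparing the default optimized kernels against the reference kernels might look like the sketch below (a throwaway float model is built in-memory so the snippet is self-contained; real usage would load the quantized .tflite model instead):

```python
import numpy as np
import tensorflow as tf

# Tiny stand-in model so the example runs end to end
model = tf.keras.Sequential([tf.keras.Input(shape=(4,)),
                             tf.keras.layers.Dense(1)])
tflite_model = tf.lite.TFLiteConverter.from_keras_model(model).convert()

def run(resolver_type):
    interp = tf.lite.Interpreter(
        model_content=tflite_model,
        experimental_op_resolver_type=resolver_type)
    interp.allocate_tensors()
    interp.set_tensor(interp.get_input_details()[0]['index'],
                      np.ones((1, 4), dtype=np.float32))
    interp.invoke()
    return interp.get_tensor(interp.get_output_details()[0]['index'])

# Optimized kernels (default) vs. the reference kernels TFLM is ported from
out_opt = run(tf.lite.experimental.OpResolverType.AUTO)
out_ref = run(tf.lite.experimental.OpResolverType.BUILTIN_REF)
print(out_opt, out_ref)
```

If the two runs disagree on the quantized model, that points at an optimized-kernel issue rather than a TFLM-specific one.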

I also found this issue with an unfolded transformer model. The first 5 rows of the inference result from the TFLite Python kernels are:

000: -0x7e,-0x7f,-0x7f,-0x80,-0x17,-0x7d,-0x7d,-0x5e,-0x80,-0x7e,-0x7f,-0x7f,-0x7f,-0x80,-0x7a,-0x7f,-0x80,-0x7f,-0x80,-0x7e,-0x7e,-0x6a,-0x79,-0x7e,-0x80,-0x7e,-0x7a,-0x7f,-0x7d,-0x69,-0x80,-0x80,-0x80,-0x7f,-0x7f,-0x7e,-0x7c,-0x7f,-0x7f,-0x7c,-0x7b, argmax: 4  
001: -0x80,-0x80,-0x80,-0x80,-0x80,-0x80,-0x80,-0x80,-0x80,-0x80,-0x80,-0x80,-0x80,-0x80,-0x80,-0x80,-0x80,-0x80,-0x80,-0x7f,-0x80,-0x7e,-0x80,-0x80,-0x80,-0x80,-0x7f,-0x80,-0x7f,0x76,-0x80,-0x7f,-0x80,-0x80,-0x80,-0x80,-0x80,-0x80,-0x80,-0x80,-0x80, argmax: 29  
002: -0x80,-0x80,-0x80,-0x80,-0x80,-0x80,-0x80,-0x80,-0x7d,-0x80,-0x80,-0x80,-0x80,-0x80,-0x7f,-0x80,-0x80,-0x80,-0x80,-0x7e,-0x7f,-0x7d,-0x7f,-0x7f,-0x80,-0x7f,-0x7c,-0x7f,-0x7d,0x6a,-0x80,-0x7f,-0x80,-0x80,-0x80,-0x80,-0x80,-0x80,-0x80,-0x80,-0x80, argmax: 29  
003: -0x80,-0x80,-0x80,-0x80,-0x80,-0x80,-0x80,-0x80,-0x7f,-0x80,-0x80,-0x80,-0x80,-0x80,-0x80,-0x80,-0x80,-0x80,-0x80,-0x7f,-0x80,-0x7e,-0x80,-0x80,-0x80,-0x80,-0x7e,-0x80,-0x7f,0x75,-0x80,-0x80,-0x80,-0x80,-0x80,-0x80,-0x80,-0x80,-0x80,-0x80,-0x80, argmax: 29  
004: -0x80,-0x80,-0x80,-0x80,-0x80,-0x80,-0x80,-0x80,-0x80,-0x80,-0x80,-0x80,-0x80,-0x80,-0x80,-0x80,-0x80,-0x80,-0x80,-0x80,-0x80,-0x7e,-0x7f,-0x80,-0x80,-0x7f,-0x7d,-0x80,-0x7e,0x71,-0x7f,-0x7f,-0x80,-0x80,-0x80,-0x80,-0x80,-0x80,-0x80,-0x80,-0x80, argmax: 29  

The first rows of the inference output from TFLite Micro are:

000: -0x80 -0x80 -0x7f -0x80 -0x10 -0x7f -0x7e -0x6a -0x80 -0x7f -0x7f -0x80 -0x7f -0x80 -0x7c -0x80 -0x80 -0x7f -0x80 -0x7b -0x7d -0x5a -0x7d -0x7e -0x80 -0x7e -0x7a -0x7f -0x7d -0x5e -0x80 -0x80 -0x80 -0x7f -0x7f -0x7f -0x7e -0x7f -0x7f -0x7e -0x7e  argmax: 4  
001: -0x80 -0x80 -0x80 -0x80 -0x80 -0x80 -0x80 -0x80 -0x7d -0x80 -0x80 -0x80 -0x80 -0x80 -0x80 -0x80 -0x80 -0x80 -0x80 -0x7f -0x7f -0x7e -0x80 -0x80 -0x80 -0x7f -0x7d -0x80 -0x7f  0x6d -0x7f -0x80 -0x80 -0x80 -0x80 -0x80 -0x80 -0x80 -0x80 -0x80 -0x80  argmax: 29  
002: -0x80 -0x80 -0x80 -0x80 -0x80 -0x80 -0x80 -0x80  0x52 -0x80 -0x80 -0x80 -0x80 -0x80 -0x80 -0x80 -0x80 -0x80 -0x7f -0x7e -0x80 -0x7f -0x80 -0x7f -0x80 -0x80 -0x7d -0x80 -0x7e -0x60 -0x7f -0x80 -0x80 -0x80 -0x80 -0x80 -0x80 -0x80 -0x80 -0x80 -0x80  argmax: 8  
003: -0x80 -0x80 -0x80 -0x80 -0x80 -0x80 -0x80 -0x80  0x2b -0x80 -0x80 -0x80 -0x80 -0x80 -0x80 -0x80 -0x80 -0x80 -0x80 -0x7e -0x80 -0x7f -0x80 -0x7f -0x80 -0x7f -0x7f -0x80 -0x7d -0x38 -0x7f -0x80 -0x80 -0x80 -0x80 -0x80 -0x80 -0x80 -0x80 -0x80 -0x80  argmax: 8  
004: -0x80 -0x80 -0x80 -0x80 -0x80 -0x80 -0x80 -0x7f  0x3c -0x80 -0x80 -0x80 -0x80 -0x80 -0x80 -0x80 -0x80 -0x80 -0x80 -0x7f -0x80 -0x7f -0x80 -0x7e -0x80 -0x7f -0x7e -0x7f -0x76 -0x51 -0x7f -0x80 -0x80 -0x80 -0x80 -0x80 -0x80 -0x80 -0x80 -0x80 -0x80  argmax: 8  
005: -0x80 -0x80 -0x80 -0x80 -0x80 -0x80 -0x80 -0x80  0x74 -0x80 -0x80 -0x80 -0x80 -0x80 -0x80 -0x80 -0x80 -0x80 -0x80 -0x7f -0x80 -0x80 -0x80 -0x80 -0x80 -0x80 -0x7f -0x80 -0x7f -0x78 -0x80 -0x80 -0x80 -0x80 -0x80 -0x80 -0x80 -0x80 -0x80 -0x80 -0x80  argmax: 8  

Hi, @Burton2000 and @solderzzc!

@Burton2000, I apologize for not responding for so long.
@solderzzc, I'm glad to hear that someone else has the same problem.

@Burton2000, I couldn't test using the TFLite interpreter on x86. At the time, I was really busy finishing my thesis, which used tinyML; after finishing it I decided to rest, and I never returned to this issue.

I remember making a quick change, like changing a parameter in the conversion process, but I don't remember it well now; I would have to review it. In any case, it didn't solve the issue.

Hi, @focus-martin
Thanks for sharing details.

It seems a transpose layer output with a zero point of -1 caused this issue:
image

When zero_point = -1, the following code raises an exception. If the zero_point == 0 check is commented out, inference runs, but the prediction result is wrong.

    data->input_zero_point = input->params.zero_point;
    // Filter weights will always be symmetric quantized since we only support
    // int8 quantization. See
    // https://github.com/tensorflow/tensorflow/issues/44912 for additional
    // context.

    TFLITE_DCHECK(filter->params.zero_point == 0);
    if (filter->params.zero_point != 0){
      printf("(filter->params.zero_point != 0)\n");
    }

The filter is provided by the transpose op (location 120); the exception is raised in the fully connected layer (location 121):
image
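For context, the TFLite int8 quantization spec quantizes weights symmetrically (zero point fixed at 0) while activation tensors may carry an asymmetric zero point, which is why the FULLY_CONNECTED kernel asserts filter->params.zero_point == 0. A small numeric sketch of the difference (the scale and weight values are illustrative):

```python
import numpy as np

def quantize(x, scale, zero_point):
    # Standard affine quantization: q = round(x / scale) + zero_point
    return np.clip(np.round(x / scale) + zero_point, -128, 127).astype(np.int8)

w = np.array([-0.3, 0.0, 0.1])

# Symmetric, as the FC kernel assumes for its filter: float 0.0 maps to int8 0
sym = quantize(w, scale=0.4 / 127, zero_point=0)

# Asymmetric, as an activation (e.g. a transpose output) may be quantized:
# float 0.0 no longer maps to int8 0
asym = quantize(w, scale=0.8 / 255, zero_point=-1)
print(sym.tolist(), asym.tolist())
```

A tensor quantized the asymmetric way cannot legally be fed to the FC op as its filter, hence the DCHECK.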

Would it help to add transpose to one of these quantize configurations?

def get_quantize_config(model):
    layer_param_dict = {}  # stores {layer_name: quantize_config} pairs
    scope = {}             # stores all custom objects

    for layer in model.layers:
        if layer.name.startswith(('clip', 'minimum', 'minimum_scalar',
                                  'maximum_scalar', 'cast', 'stop_gradient')):
            layer_param_dict[layer.name] = {'quantize_config': NoOpQuantizeConfig()}
            scope[layer.__class__.__name__] = layer.__class__

        elif 'grad_subtract' in layer.name or layer.name.startswith(
                ('mat_mul', 'multiply', 'scalar_multiply', 'add', 'scalar_add',
                 'slice', 'mean', 'subtract', 'scalar_subtract', 'r_sqrt', 'relu')):
            layer_param_dict[layer.name] = {'quantize_config': TFOpQuantizeConfig()}
            scope[layer.__class__.__name__] = layer.__class__

        elif layer.name.startswith(('scale', 'centre', 'positional_embedding',
                                    'token_embedding')):
            layer_param_dict[layer.name] = {'quantize_config': WeightQuantizeConfig()}
            scope[layer.__class__.__name__] = layer.__class__

        # Make sure to quantise the encoder and decoder mask input layers
        # so that they can be quantized to INT8
        elif layer.name.startswith(('encoder_masks', 'decoder_masks')):
            layer_param_dict[layer.name] = {'quantize_config': MaskOpQuantizeConfig()}

        elif isinstance(layer, tf.keras.layers.Dense):
            layer_param_dict[layer.name] = {'quantize_config': DenseQuantizeConfig()}

        elif 'variance' in layer.name:
            layer_param_dict[layer.name] = {'quantize_config': VarianceQuantizeConfig()}
            scope[layer.__class__.__name__] = layer.__class__

    return layer_param_dict, scope  # assumed return; the original snippet was truncated

Greetings,
I am running into the same problem with the fully connected layer when the filter zero point is not 0.
Have you solved this problem?

"This issue is being marked as stale due to inactivity. Remove label or comment to prevent closure in 5 days."

"This issue is being closed because it has been marked as stale for 5 days with no further activity."