TFLite: Maxpool layers (ops) don't behave as intended (VGG16 example)

Question

Closed this issue 3 years ago · 5 comments

Describe the bug
I'm exploring how TFLite operations behave, with custom hardware in mind. I quantized a pretrained VGG16 (from the model zoo) to int8. For each maxpool op, the scale and zero point of the input and output tensors are equal. Since quantization is a monotonic (non-decreasing) function, I believe the int8 output of a maxpool op should be the 2x2 maxpool of its int8 input. But it is not.
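
The monotonicity argument can be checked in isolation with NumPy. This is a minimal sketch, assuming the usual affine int8 scheme (round, add zero point, clip); the `quantize` helper and the scale and zero-point values are hypothetical and not taken from the VGG16 model.

```python
import numpy as np

def quantize(x, scale, zero_point):
    """Affine int8 quantization: round(x / scale) + zero_point, clipped to int8."""
    q = np.round(x / scale) + zero_point
    return np.clip(q, -128, 127).astype(np.int8)

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 6.0, size=(4, 4)).astype(np.float32)
scale, zero_point = 0.05, -128  # hypothetical quantization parameters

# Take the max of each row before and after quantization.
max_then_quantize = quantize(x.max(axis=1), scale, zero_point)
quantize_then_max = quantize(x, scale, zero_point).max(axis=1)

# Rounding and clipping never reorder values, so both routes agree.
commutes = np.array_equal(max_then_quantize, quantize_then_max)
print(commutes)
```

Because both routes agree when input and output share the same parameters, a correct int8 maxpool should be reproducible directly on the int8 input tensor.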

System information

TensorFlow version (installed from source or binary): 2.7.0 (Google Colab)
TensorFlow Model Optimization version (installed from source or binary): Google Colab
Python version: 3.7.12

Describe the expected behavior
Output tensor (int8) of maxpool op should be 2x2 maxpool of the input tensor (int8) like the following numpy function:

max_out_custom = max_in.reshape(1,112,2,112,2,64).max(axis=2).max(axis=3)
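
For readers unfamiliar with the reshape trick, here is a small self-contained sketch of the same idea on a 4x4 input. The helper name `maxpool_2x2` is made up for illustration, and it assumes NHWC layout with even height and width.

```python
import numpy as np

def maxpool_2x2(x):
    """2x2, stride-2 max pooling over an NHWC array with even H and W."""
    n, h, w, c = x.shape
    # Split H into (h // 2, 2) and W into (w // 2, 2), then reduce the two size-2 axes.
    return x.reshape(n, h // 2, 2, w // 2, 2, c).max(axis=(2, 4))

# Values 0..15 laid out row by row in a single-channel 4x4 image.
x = np.arange(16, dtype=np.int8).reshape(1, 4, 4, 1)
pooled = maxpool_2x2(x)
print(pooled[0, :, :, 0])

# Same result as the chained form used in this issue.
chained = x.reshape(1, 2, 2, 2, 2, 1).max(axis=2).max(axis=3)
```

Each output element is the largest value in one non-overlapping 2x2 window, so the bottom-right corner of every window survives in this example.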

Describe the current behavior
I'm unable to see a pattern in the maxpool output

Code to reproduce the issue

Colab: https://colab.research.google.com/drive/1410SH8uEE5IX0Iuvv27SwTtpCl2XM5T5?usp=sharing

''' Get VGG16 '''
import tensorflow as tf
from tensorflow import keras
import numpy as np
import pathlib

model = keras.applications.VGG16(
    include_top=True,
    weights="imagenet",
    input_tensor=None,
    input_shape=None,
    pooling=None,
    classes=1000,
    classifier_activation="softmax",
)

def representative_data_gen():
    yield [tf.random.uniform((1,224,224,3), dtype=tf.dtypes.float32)]

''' Quantize '''

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data_gen
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8
tflite_model_quant = converter.convert()

tflite_model_quant_file = pathlib.Path("vgg_temp.tflite")
tflite_model_quant_file.write_bytes(tflite_model_quant)
interpreter = tf.lite.Interpreter(model_path=str(tflite_model_quant_file))
interpreter.allocate_tensors()

''' Run interpreter with sample data to populate the tensors '''

input_details = interpreter.get_input_details()[0]
output_details = interpreter.get_output_details()[0]
image = list(representative_data_gen())[0][0]
input_scale, input_zero_point = input_details["quantization"]
image = image / input_scale + input_zero_point
test_image = image.numpy().astype(input_details["dtype"])
interpreter.set_tensor(input_details["index"], test_image)
interpreter.invoke()
output = interpreter.get_tensor(output_details["index"])[0]
print(output.argmax())

''' Find tensor indices of input & output to maxpool ops '''
for i in range(23):
    print(interpreter._get_op_details(i))

''' Get tensors, compare with custom maxpooling'''

max_in = interpreter.get_tensor(35)
max_out = interpreter.get_tensor(36)
max_out_custom = max_in.reshape(1,112,2,112,2,64).max(axis=2).max(axis=3)

print('Maxpool input shape: ', max_in.shape)
print('max_in: \n',max_in[0,:10,:10,0])
print('max_out: \n',max_out[0,:5,:5,0])
print('max_out_custom: \n',max_out_custom[0,:5,:5,0])
print('Allclose: ', np.allclose(max_out,max_out_custom))
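
Rather than reading tensor indices 35 and 36 off the printed list by eye, the op details can be filtered programmatically. This is a sketch under assumptions: it relies on the dictionaries printed by the private `_get_op_details` call above having `index`, `op_name`, `inputs` and `outputs` keys, and on max pooling being reported as `MAX_POOL_2D`; both are worth confirming against the printed output. The `find_ops` helper and the stand-in list are hypothetical.

```python
def find_ops(op_details, op_name="MAX_POOL_2D"):
    """Return (op index, input tensor ids, output tensor ids) for each matching op."""
    return [
        (d["index"], list(d["inputs"]), list(d["outputs"]))
        for d in op_details
        if d["op_name"] == op_name
    ]

# With a real interpreter this would be roughly:
#   op_details = [interpreter._get_op_details(i) for i in range(23)]
# A hand-written stand-in is used here so the snippet runs on its own;
# only the 35 -> 36 maxpool pair mirrors the indices used in this issue.
op_details = [
    {"index": 0, "op_name": "CONV_2D", "inputs": [0, 1, 2], "outputs": [34]},
    {"index": 1, "op_name": "CONV_2D", "inputs": [34, 3, 4], "outputs": [35]},
    {"index": 2, "op_name": "MAX_POOL_2D", "inputs": [35], "outputs": [36]},
]

maxpools = find_ops(op_details)
print(maxpools)
```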

Answer 1 · 2022-01-10T12:38:37.000Z

Hey Abarajithan, Thanks for reporting this.

What do you mean by "pattern" here?
Could you show an example?

Thanks!

Answer 2 · 2022-01-11T04:09:50.000Z

Thanks for responding!
From my Colab code (also posted above), these are the input [0,:10,:10,0] and output [0,:5,:5,0] of the maxpool op. The result I expect is max_out_custom [0,:5,:5,0], which is a 2x2 spatial maxpool of the input, as you can see. I don't understand why the op doesn't return that.

Maxpool input shape:  (1, 224, 224, 64)
max_in: 
 [[ -38  -80 -112 -112  -43  -77  -91 -112 -119  -96]
 [ -76 -128 -128  -97  -56 -128 -128 -128 -102 -117]
 [ -89 -128 -121  -58 -101 -128 -124  -76  -67 -126]
 [-125 -128 -117  -80 -128 -128  -34  -33 -114 -128]
 [ -87 -118  -85  -64 -111  -54    6  -99 -128  -91]
 [ -59 -128  -69  -43  -94  -42  -91 -128 -128  -60]
 [-128 -128  -47 -100 -128 -115 -128 -128  -66  -66]
 [-126 -111  -20 -128 -128  -93  -92  -30  -64  -77]
 [ -82  -98  -48 -128 -122  -64  -50  -59 -124  -87]
 [ -94 -123  -90 -128 -101  -48  -39  -95 -128 -117]]
max_out: 
 [[-121 -128 -113 -115 -128]
 [-125 -128  -91 -128 -115]
 [-127 -128 -107 -123 -128]
 [-128 -128  -84 -128 -128]
 [-125 -128 -113 -117 -128]]
max_out_custom: 
 [[ -38  -97  -43  -91  -96]
 [ -89  -58 -101  -33  -67]
 [ -59  -43  -42    6  -60]
 [-111  -20  -93  -30  -64]
 [ -82  -48  -48  -39  -87]]
Allclose:  False

Answer 3 · 2022-01-24T01:58:04.000Z

Hi @abarajithan, this confusion is caused by the fact that TFLite reuses memory during execution, so an intermediate tensor read after invoke() may already have been overwritten. The discrepancy goes away if you pass experimental_preserve_all_tensors=True to the Interpreter constructor:

experimental_preserve_all_tensors=True
tflite_model_quant_file = pathlib.Path("vgg_temp.tflite")
tflite_model_quant_file.write_bytes(tflite_model_quant)
interpreter = tf.lite.Interpreter(model_path=str(tflite_model_quant_file), 
                                  experimental_preserve_all_tensors=experimental_preserve_all_tensors)
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()[0]
output_details = interpreter.get_output_details()[0]
# Start from a fresh float image so the quantization below is not applied twice.
image = list(representative_data_gen())[0][0]

input_scale, input_zero_point = input_details["quantization"]
image = image / input_scale + input_zero_point
test_image = image.numpy().astype(input_details["dtype"])
interpreter.set_tensor(input_details["index"], test_image)
interpreter.invoke()
output = interpreter.get_tensor(output_details["index"])[0]
print(output.argmax())
for i in range(23):
    print(interpreter._get_op_details(i))
max_in = interpreter.get_tensor(35)
max_out = interpreter.get_tensor(36)
max_out_custom = max_in.reshape(1,112,2,112,2,64).max(axis=2).max(axis=3)

print('Maxpool input shape: ', max_in.shape)

print('max_in: \n',max_in[0,:10,:10,0])
print('max_out: \n',max_out[0,:5,:5,0])
print('max_out_custom: \n',max_out_custom[0,:5,:5,0])

print('Allclose: ', np.allclose(max_out,max_out_custom))

yields

Maxpool input shape:  (1, 224, 224, 64)
max_in: 
 [[ -41  -97  -59 -101 -100  -61  -77 -128 -128  -81]
 [-123 -128  -91 -128 -118  -96 -128 -128 -105  -81]
 [ -90  -79  -85 -128  -94 -105 -109  -97  -82  -52]
 [ -74 -106 -128 -128  -69  -75  -91 -128 -128  -87]
 [-105 -128  -71  -84  -81 -114 -128 -113 -105  -64]
 [ -80 -100  -97 -128 -112 -109  -56  -37  -85  -43]
 [ -74 -128 -128  -86  -44  -83  -47  -79  -99  -13]
 [ -80 -128  -76  -23  -83 -128  -96  -92 -106  -58]
 [-128 -128  -69  -93 -128 -128  -53  -67 -105  -56]
 [-128  -89  -48  -76  -67  -64  -45  -80  -74  -50]]
max_out: 
 [[-41 -59 -61 -77 -81]
 [-74 -85 -69 -91 -52]
 [-80 -71 -81 -37 -43]
 [-74 -23 -44 -47 -13]
 [-89 -48 -64 -45 -50]]
max_out_custom: 
 [[-41 -59 -61 -77 -81]
 [-74 -85 -69 -91 -52]
 [-80 -71 -81 -37 -43]
 [-74 -23 -44 -47 -13]
 [-89 -48 -64 -45 -50]]
Allclose:  True
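
The effect of memory reuse can be illustrated with a toy model. This is only a sketch of the general idea, not TFLite's actual memory planner: the three-op chain, the buffer names and the `preserve_all` flag are invented for illustration.

```python
import numpy as np

def run_chain(x, preserve_all):
    """Run a = x + 1, b = a * 2, c = b - 100 and return the final buffers."""
    n = x.size
    buf_a = np.empty(n, dtype=np.int32)
    buf_b = np.empty(n, dtype=np.int32)
    # Without preservation, 'c' is written into the buffer of 'a',
    # which is no longer needed once 'b' has been computed.
    buf_c = np.empty(n, dtype=np.int32) if preserve_all else buf_a
    np.add(x, 1, out=buf_a)             # op 0: a = x + 1
    np.multiply(buf_a, 2, out=buf_b)    # op 1: b = a * 2
    np.subtract(buf_b, 100, out=buf_c)  # op 2: c = b - 100 (may overwrite a)
    return {"a": buf_a, "b": buf_b, "c": buf_c}

x = np.array([1, 2, 3], dtype=np.int32)
shared = run_chain(x, preserve_all=False)
kept = run_chain(x, preserve_all=True)

# After the run, 'a' only still explains 'b' when its buffer was preserved.
print(np.array_equal(shared["b"], shared["a"] * 2))
print(np.array_equal(kept["b"], kept["a"] * 2))
```

The final output is the same in both runs; only the intermediate tensor read back afterwards differs, which matches the symptom reported in this issue.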

Answer 4 · 2022-01-24T10:20:04.000Z

Thanks a lot! That helped a lot. ~~Is there documentation for the exact behavior of TFlite?~~
Edit: Sorry, found it

Answer 5 · 2022-01-26T16:04:42.000Z

@daverim
I'm having a similar issue with the behavior of fully connected layers, even with that flag set to true. Could it be a related problem?

Colab example: https://colab.research.google.com/drive/1oD7lTTbXo434n_gZr0nN4l3sFSCvR7bA?usp=sharing

Code snippet:

'''Sanity Check'''

fc_weights_q = interpreter.get_tensor(110)
fc_weights_scale, fc_weights_zero = interpreter._get_tensor_details(110)['quantization']
fc_biases_q = interpreter.get_tensor(111)
fc_biases_scale, fc_biases_zero = interpreter._get_tensor_details(111)['quantization']

fc_weights_uq = (fc_weights_q - fc_weights_zero) * fc_weights_scale
fc_biases_uq = (fc_biases_q - fc_biases_zero) * fc_biases_scale
print('weights unquantized close: ', np.allclose(fc_weights,fc_weights_uq.T,rtol=1e-2,atol=1e-2)) # True
print('biases unquantized close: ', np.allclose(fc_biases,fc_biases_uq,rtol=1e-3,atol=1e-3)) # True


''' Test FC quantization behavior '''

fc_input_q = interpreter.get_tensor(184)
fc_input_scale, fc_input_zero = interpreter._get_tensor_details(184)['quantization']
fc_output_q = interpreter.get_tensor(185)
fc_output_scale, fc_output_zero = interpreter._get_tensor_details(185)['quantization']

fc_input_uq = (fc_input_q - fc_input_zero) * fc_input_scale
fc_output_uq = (fc_output_q - fc_output_zero) * fc_output_scale

fc_output_custom = fc_input_uq @ fc_weights_uq.T + fc_biases_uq
print(np.allclose(fc_output_uq, fc_output_custom, rtol=1e-1,atol=1e-1)) # False

print(fc_output_custom[0,:10], fc_output_uq[0,:10]) # clearly not equal

Any help is much appreciated. Thanks a lot.
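
One thing that may be worth checking, though it is not confirmed anywhere in this thread: a dequantized int8 output can only take values in the range its scale and zero point allow, so a float reference that falls outside that range (for example negative values when an activation such as ReLU is fused into the op) cannot match. The sketch below clamps a float reference to that range before comparing; the helper names and all numbers are hypothetical stand-ins, not values from the model.

```python
import numpy as np

def representable_range(scale, zero_point, qmin=-128, qmax=127):
    """Smallest and largest real values an int8 tensor with these parameters can hold."""
    return (qmin - zero_point) * scale, (qmax - zero_point) * scale

def clamp_to_output_grid(y, scale, zero_point):
    """Clip a float reference to what the quantized output tensor can represent."""
    lo, hi = representable_range(scale, zero_point)
    return np.clip(y, lo, hi)

# Synthetic stand-ins for the dequantized input, weights (out, in) and biases.
x_uq = np.array([[1.0, -2.0, 0.5]])
w_uq = np.array([[0.5, 1.0, 0.0], [1.0, 0.0, 2.0]])
b_uq = np.array([0.25, -0.5])
out_scale, out_zero = 0.05, -128  # hypothetical output quantization parameters

fc_output_custom = x_uq @ w_uq.T + b_uq  # unclamped float reference
lo, hi = representable_range(out_scale, out_zero)
clamped = clamp_to_output_grid(fc_output_custom, out_scale, out_zero)
print(fc_output_custom, (lo, hi), clamped)
```

With a zero point of -128 the lower bound is 0, so any negative entry in the float reference is clamped to 0. Printing the range for tensor 185 and comparing against the clamped reference would show whether this accounts for the mismatch.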