tensorflow/model-optimization

TFLite: Maxpool layers (ops) don't behave as intended (VGG16 example)


Describe the bug
I'm exploring the behavior of TFLite operations for custom hardware. I quantized a pretrained VGG16 (from keras.applications) to int8. For each maxpool op, the input and output tensors have the same scale and zero point. Since quantization is a monotonic (non-decreasing) function, I expect the int8 output of each maxpool op to be the 2x2 maxpool of its int8 input. But it is not.
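The expectation above follows from the fact that max commutes with any non-decreasing function: quantizing every element and then pooling selects the same level as pooling first and then quantizing. A minimal NumPy sketch of that property, assuming a single (H, W) channel and the usual affine int8 mapping with round-to-nearest and saturation; the scale and zero point are illustrative, not values read from the converted VGG16:

```python
import numpy as np

def quantize(x, scale, zero_point):
    """Affine int8 quantization: clip(round(x / scale) + zero_point, -128, 127)."""
    q = np.round(x / scale) + zero_point
    return np.clip(q, -128, 127).astype(np.int8)

def maxpool2x2(t):
    """2x2, stride-2 max pooling of an (H, W) array with even H and W."""
    h, w = t.shape
    return t.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=(8, 8))
scale, zero_point = 2.0 / 255, -1  # illustrative parameters only

pool_then_quantize = quantize(maxpool2x2(x), scale, zero_point)
quantize_then_pool = maxpool2x2(quantize(x, scale, zero_point))
print(np.array_equal(pool_then_quantize, quantize_then_pool))  # True
```

The equality is exact rather than approximate: max(f(a), f(b)) = f(max(a, b)) for any non-decreasing f, and rounding followed by clipping is non-decreasing.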

System information

TensorFlow version (installed from source or binary): 2.7.0 (Google Colab)
TensorFlow Model Optimization version (installed from source or binary): Google Colab
Python version: 3.7.12

Describe the expected behavior
The int8 output tensor of the maxpool op should be the 2x2 maxpool of the int8 input tensor, as computed by the following NumPy expression:

max_out_custom = max_in.reshape(1,112,2,112,2,64).max(axis=2).max(axis=3)
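The reshape splits each 224-pixel spatial axis into 112 blocks of 2, so every 2x2 window sits on axes 2 and 4 of the 6-D view; `.max(axis=2)` removes the first window axis, which shifts the second one from axis 4 to axis 3, hence `.max(axis=3)`. A small sketch that generalizes the expression to any NHWC shape and cross-checks it against an explicit loop (the function names are hypothetical helpers, not TFLite APIs):

```python
import numpy as np

def maxpool_nhwc(x, k=2):
    """k x k, stride-k max pooling of an NHWC array whose H and W are divisible by k."""
    n, h, w, c = x.shape
    # Split H into (H/k, k) and W into (W/k, k); the window axes are then 2 and 4.
    return x.reshape(n, h // k, k, w // k, k, c).max(axis=(2, 4))

def maxpool_nhwc_loops(x, k=2):
    """Straightforward loop version, used only to cross-check the reshape version."""
    n, h, w, c = x.shape
    out = np.empty((n, h // k, w // k, c), dtype=x.dtype)
    for i in range(h // k):
        for j in range(w // k):
            window = x[:, i * k:(i + 1) * k, j * k:(j + 1) * k, :]
            out[:, i, j, :] = window.max(axis=(1, 2))
    return out

rng = np.random.default_rng(1)
x = rng.integers(-128, 128, size=(1, 8, 8, 3)).astype(np.int8)
# Same form as the expression in the issue, adapted to this small shape.
issue_style = x.reshape(1, 4, 2, 4, 2, 3).max(axis=2).max(axis=3)
print(np.array_equal(maxpool_nhwc(x), maxpool_nhwc_loops(x)))  # True
print(np.array_equal(maxpool_nhwc(x), issue_style))            # True
```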

Describe the current behavior
I'm unable to see any pattern relating the maxpool output to its input.

Code to reproduce the issue

Colab: https://colab.research.google.com/drive/1410SH8uEE5IX0Iuvv27SwTtpCl2XM5T5?usp=sharing

''' Get VGG16 '''
import tensorflow as tf
from tensorflow import keras
import numpy as np
import pathlib

model = keras.applications.VGG16(
    include_top=True,
    weights="imagenet",
    input_tensor=None,
    input_shape=None,
    pooling=None,
    classes=1000,
    classifier_activation="softmax",
)

def representative_data_gen():
    yield [tf.random.uniform((1,224,224,3), dtype=tf.dtypes.float32)]

''' Quantize '''

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data_gen
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8
tflite_model_quant = converter.convert()

tflite_model_quant_file = pathlib.Path("vgg_temp.tflite")
tflite_model_quant_file.write_bytes(tflite_model_quant)
interpreter = tf.lite.Interpreter(model_path=str(tflite_model_quant_file))
interpreter.allocate_tensors()

''' Run interpreter with sample data to populate the tensors '''

input_details = interpreter.get_input_details()[0]
output_details = interpreter.get_output_details()[0]
image = list(representative_data_gen())[0][0]
input_scale, input_zero_point = input_details["quantization"]
image = image / input_scale + input_zero_point
test_image = image.numpy().astype(input_details["dtype"])
interpreter.set_tensor(input_details["index"], test_image)
interpreter.invoke()
output = interpreter.get_tensor(output_details["index"])[0]
print(output.argmax())

''' Find tensor indices of input & output to maxpool ops '''
for i in range(23):
    print(interpreter._get_op_details(i))

''' Get tensors, compare with custom maxpooling'''

max_in = interpreter.get_tensor(35)
max_out = interpreter.get_tensor(36)
max_out_custom = max_in.reshape(1,112,2,112,2,64).max(axis=2).max(axis=3)

print('Maxpool input shape: ', max_in.shape)
print('max_in: \n',max_in[0,:10,:10,0])
print('max_out: \n',max_out[0,:5,:5,0])
print('max_out_custom: \n',max_out_custom[0,:5,:5,0])
print('Allclose: ', np.allclose(max_out,max_out_custom))
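A side note on the input step in the script: `image / input_scale + input_zero_point` followed by `astype` truncates toward zero and does not saturate, so values just below a quantization step land one level low and out-of-range values are not clipped. This does not affect the maxpool comparison, but a rounding, saturating helper is a safer default. A minimal sketch; the helper names are hypothetical, and the scale and zero point are illustrative rather than read from `input_details`:

```python
import numpy as np

def quantize_input(x, scale, zero_point, dtype=np.int8):
    """Float -> integer affine quantization with round-to-nearest and saturation."""
    info = np.iinfo(dtype)
    q = np.round(np.asarray(x, dtype=np.float64) / scale) + zero_point
    return np.clip(q, info.min, info.max).astype(dtype)

def dequantize(q, scale, zero_point):
    """Integer -> float; widen to float first so subtracting the zero point cannot wrap."""
    return (np.asarray(q).astype(np.float64) - zero_point) * scale

scale, zero_point = 1.0 / 128, 0  # illustrative values
x = np.array([-1.0, 0.25, 0.99, 2.0])

# Plain astype truncates (applied to the in-range values only; 2.0 would overflow int8).
truncated = (x[:3] / scale + zero_point).astype(np.int8)
rounded = quantize_input(x, scale, zero_point)
print(truncated.tolist())                                # [-128, 32, 126]
print(rounded.tolist())                                  # [-128, 32, 127, 127]
print(dequantize(rounded, scale, zero_point).tolist())   # [-1.0, 0.25, 0.9921875, 0.9921875]
```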


Hey Abarajithan, thanks for reporting this.

What do you mean by "pattern" here?
Could you show me an example?

Thanks!

Thanks for responding!
From my Colab (and the code above), these are the input [0,:10,:10,0] and output [0,:5,:5,0] of the maxpool op. The result I expect is max_out_custom[0,:5,:5,0], which, as you can see, is a 2x2 spatial maxpool of the input. I don't understand why the op doesn't return that.

Maxpool input shape:  (1, 224, 224, 64)
max_in: 
 [[ -38  -80 -112 -112  -43  -77  -91 -112 -119  -96]
 [ -76 -128 -128  -97  -56 -128 -128 -128 -102 -117]
 [ -89 -128 -121  -58 -101 -128 -124  -76  -67 -126]
 [-125 -128 -117  -80 -128 -128  -34  -33 -114 -128]
 [ -87 -118  -85  -64 -111  -54    6  -99 -128  -91]
 [ -59 -128  -69  -43  -94  -42  -91 -128 -128  -60]
 [-128 -128  -47 -100 -128 -115 -128 -128  -66  -66]
 [-126 -111  -20 -128 -128  -93  -92  -30  -64  -77]
 [ -82  -98  -48 -128 -122  -64  -50  -59 -124  -87]
 [ -94 -123  -90 -128 -101  -48  -39  -95 -128 -117]]
max_out: 
 [[-121 -128 -113 -115 -128]
 [-125 -128  -91 -128 -115]
 [-127 -128 -107 -123 -128]
 [-128 -128  -84 -128 -128]
 [-125 -128 -113 -117 -128]]
max_out_custom: 
 [[ -38  -97  -43  -91  -96]
 [ -89  -58 -101  -33  -67]
 [ -59  -43  -42    6  -60]
 [-111  -20  -93  -30  -64]
 [ -82  -48  -48  -39  -87]]
Allclose:  False
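A quick check on the printed slice alone: pooling the 10x10 `max_in` slice with the same reshape trick reproduces `max_out_custom`, whereas the reported `max_out` contains values that no max over the corresponding window could produce (for example -121 in the top-left window, whose maximum is -38). That suggests the tensor read back was not the op's actual output. The values below are copied verbatim from the printout:

```python
import numpy as np

# Channel 0 slices copied from the printout above.
max_in_slice = np.array([
    [ -38,  -80, -112, -112,  -43,  -77,  -91, -112, -119,  -96],
    [ -76, -128, -128,  -97,  -56, -128, -128, -128, -102, -117],
    [ -89, -128, -121,  -58, -101, -128, -124,  -76,  -67, -126],
    [-125, -128, -117,  -80, -128, -128,  -34,  -33, -114, -128],
    [ -87, -118,  -85,  -64, -111,  -54,    6,  -99, -128,  -91],
    [ -59, -128,  -69,  -43,  -94,  -42,  -91, -128, -128,  -60],
    [-128, -128,  -47, -100, -128, -115, -128, -128,  -66,  -66],
    [-126, -111,  -20, -128, -128,  -93,  -92,  -30,  -64,  -77],
    [ -82,  -98,  -48, -128, -122,  -64,  -50,  -59, -124,  -87],
    [ -94, -123,  -90, -128, -101,  -48,  -39,  -95, -128, -117],
], dtype=np.int8)
reported_max_out = np.array([
    [-121, -128, -113, -115, -128],
    [-125, -128,  -91, -128, -115],
    [-127, -128, -107, -123, -128],
    [-128, -128,  -84, -128, -128],
    [-125, -128, -113, -117, -128],
], dtype=np.int8)
reported_max_out_custom = np.array([
    [ -38,  -97,  -43,  -91,  -96],
    [ -89,  -58, -101,  -33,  -67],
    [ -59,  -43,  -42,    6,  -60],
    [-111,  -20,  -93,  -30,  -64],
    [ -82,  -48,  -48,  -39,  -87],
], dtype=np.int8)

# 2x2 stride-2 max pooling of the (10, 10) slice.
pooled = max_in_slice.reshape(5, 2, 5, 2).max(axis=(1, 3))
print(np.array_equal(pooled, reported_max_out_custom))  # True
print(np.array_equal(pooled, reported_max_out))         # False
```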

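Hi @abarajithan, this confusion is caused by the fact that TFLite reuses tensor memory during execution: once an intermediate tensor is no longer needed, its buffer can be overwritten by a later op, so reading it with get_tensor() after invoke() may return unrelated data. The discrepancy goes away if you pass experimental_preserve_all_tensors=True to the Interpreter constructor: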

experimental_preserve_all_tensors=True
tflite_model_quant_file = pathlib.Path("vgg_temp.tflite")
tflite_model_quant_file.write_bytes(tflite_model_quant)
interpreter = tf.lite.Interpreter(model_path=str(tflite_model_quant_file), 
                                  experimental_preserve_all_tensors=experimental_preserve_all_tensors)
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()[0]
output_details = interpreter.get_output_details()[0]
# Draw a fresh float image: `image` from the first run has already been quantized
image = list(representative_data_gen())[0][0]

input_scale, input_zero_point = input_details["quantization"]
image = image / input_scale + input_zero_point
test_image = image.numpy().astype(input_details["dtype"])
interpreter.set_tensor(input_details["index"], test_image)
interpreter.invoke()
output = interpreter.get_tensor(output_details["index"])[0]
print(output.argmax())
for i in range(23):
    print(interpreter._get_op_details(i))
max_in = interpreter.get_tensor(35)
max_out = interpreter.get_tensor(36)
max_out_custom = max_in.reshape(1,112,2,112,2,64).max(axis=2).max(axis=3)

print('Maxpool input shape: ', max_in.shape)

print('max_in: \n',max_in[0,:10,:10,0])
print('max_out: \n',max_out[0,:5,:5,0])
print('max_out_custom: \n',max_out_custom[0,:5,:5,0])

print('Allclose: ', np.allclose(max_out,max_out_custom))

yields

Maxpool input shape:  (1, 224, 224, 64)
max_in: 
 [[ -41  -97  -59 -101 -100  -61  -77 -128 -128  -81]
 [-123 -128  -91 -128 -118  -96 -128 -128 -105  -81]
 [ -90  -79  -85 -128  -94 -105 -109  -97  -82  -52]
 [ -74 -106 -128 -128  -69  -75  -91 -128 -128  -87]
 [-105 -128  -71  -84  -81 -114 -128 -113 -105  -64]
 [ -80 -100  -97 -128 -112 -109  -56  -37  -85  -43]
 [ -74 -128 -128  -86  -44  -83  -47  -79  -99  -13]
 [ -80 -128  -76  -23  -83 -128  -96  -92 -106  -58]
 [-128 -128  -69  -93 -128 -128  -53  -67 -105  -56]
 [-128  -89  -48  -76  -67  -64  -45  -80  -74  -50]]
max_out: 
 [[-41 -59 -61 -77 -81]
 [-74 -85 -69 -91 -52]
 [-80 -71 -81 -37 -43]
 [-74 -23 -44 -47 -13]
 [-89 -48 -64 -45 -50]]
max_out_custom: 
 [[-41 -59 -61 -77 -81]
 [-74 -85 -69 -91 -52]
 [-80 -71 -81 -37 -43]
 [-74 -23 -44 -47 -13]
 [-89 -48 -64 -45 -50]]
Allclose:  True
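To see why an intermediate tensor read after `invoke()` can hold unrelated data, here is a toy model of buffer reuse. It is a hypothetical illustration, not TFLite's actual memory planner: in a three-op chain, the third op's output is planned into the buffer of the first op's output once that tensor is no longer read. Reading the first intermediate afterwards then returns the later op's values, much like `get_tensor(36)` did before the flag was set. The `preserve_all_tensors` argument only mirrors the purpose of the real `experimental_preserve_all_tensors` option:

```python
import numpy as np

def run_toy_graph(x, preserve_all_tensors=False):
    """Hypothetical graph t1 = pool(x), t2 = t1 + 1, t3 = -t2 on a tiny shared arena.

    Without preservation, t3 reuses t1's buffer because t1 is dead once t2 exists.
    """
    buffers = [np.zeros(2, dtype=np.int32) for _ in range(3)]
    plan = {"t1": 0, "t2": 1, "t3": 2 if preserve_all_tensors else 0}

    def tensor(name):
        return buffers[plan[name]]

    tensor("t1")[:] = x.reshape(2, 2).max(axis=1)  # 1-D max pooling, window 2
    tensor("t2")[:] = tensor("t1") + 1
    tensor("t3")[:] = -tensor("t2")
    # Like get_tensor(), reading a tensor returns whatever its buffer holds now.
    return {name: tensor(name).copy() for name in plan}

x = np.array([3, 7, 5, 1])
print(run_toy_graph(x)["t1"].tolist())                             # [-8, -6]: overwritten by t3
print(run_toy_graph(x, preserve_all_tensors=True)["t1"].tolist())  # [7, 5]
```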

Thanks a lot, that helped! Is there documentation on the exact behavior of TFLite ops?
Edit: Sorry, found it

@daverim
I'm having a similar issue with the behavior of fully connected layers, even with that flag set to True. Could it be a related problem?

Colab example: https://colab.research.google.com/drive/1oD7lTTbXo434n_gZr0nN4l3sFSCvR7bA?usp=sharing

Code snippet:

'''Sanity Check'''

fc_weights_q = interpreter.get_tensor(110)
fc_weights_scale, fc_weights_zero = interpreter._get_tensor_details(110)['quantization']
fc_biases_q = interpreter.get_tensor(111)
fc_biases_scale, fc_biases_zero = interpreter._get_tensor_details(111)['quantization']

fc_weights_uq = (fc_weights_q - fc_weights_zero) * fc_weights_scale
fc_biases_uq = (fc_biases_q - fc_biases_zero) * fc_biases_scale
print('weights unquantized close: ', np.allclose(fc_weights,fc_weights_uq.T,rtol=1e-2,atol=1e-2)) # True
print('biases unquantized close: ', np.allclose(fc_biases,fc_biases_uq,rtol=1e-3,atol=1e-3)) # True


''' Test FC quantization behavior '''

fc_input_q = interpreter.get_tensor(184)
fc_input_scale, fc_input_zero = interpreter._get_tensor_details(184)['quantization']
fc_output_q = interpreter.get_tensor(185)
fc_output_scale, fc_output_zero = interpreter._get_tensor_details(185)['quantization']

fc_input_uq = (fc_input_q - fc_input_zero) * fc_input_scale
fc_output_uq = (fc_output_q - fc_output_zero) * fc_output_scale

fc_output_custom = fc_input_uq @ fc_weights_uq.T + fc_biases_uq
print(np.allclose(fc_output_uq, fc_output_custom, rtol=1e-1,atol=1e-1)) # False

print(fc_output_custom[0,:10], fc_output_uq[0,:10]) # clearly not equal

Any help is much appreciated. Thanks a lot.
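The thread does not resolve this FC question, but two things in the snippet are worth ruling out before suspecting the kernel. First, `fc_input_q - fc_input_zero` (and the same for the output) is computed in int8: when the zero point is a Python int that fits in int8, NumPy keeps the int8 dtype, so with a zero point such as -128 every value q >= 0 wraps around. Second, VGG16's `fc1` and `fc2` Dense layers use a ReLU activation, which the TFLite converter typically fuses into the FULLY_CONNECTED op; if tensors 184/185 belong to one of those layers, the float reference needs the ReLU as well. A minimal sketch with made-up tensors (the helper `fc_reference` is hypothetical, and it assumes per-tensor weight quantization):

```python
import numpy as np

# (1) Zero-point subtraction on int8 data: widen before subtracting.
q = np.array([0, 100], dtype=np.int8)
zero_point = -128                          # illustrative; typical for a post-ReLU int8 tensor
wrapped = q - zero_point                   # stays int8, so 128 and 228 wrap around
widened = q.astype(np.int32) - zero_point
print(wrapped.tolist(), widened.tolist())  # [-128, -28] [128, 228]

# (2) Float reference for a quantized FULLY_CONNECTED op, with an optional fused ReLU.
def fc_reference(x_q, x_scale, x_zp, w_q, w_scale, w_zp, b_q, b_scale, relu=False):
    """Dequantize, compute y = x @ W.T + b, optionally apply ReLU.

    Assumes TFLite's [out, in] weight layout and an int32 bias with zero point 0.
    """
    x = (x_q.astype(np.float64) - x_zp) * x_scale
    w = (w_q.astype(np.float64) - w_zp) * w_scale
    b = b_q.astype(np.float64) * b_scale
    y = x @ w.T + b
    return np.maximum(y, 0.0) if relu else y

x_q = np.array([[-128, -126]], dtype=np.int8)      # dequantizes to [[0.0, 1.0]]
w_q = np.array([[2, -4], [-2, 4]], dtype=np.int8)  # dequantizes to [[0.5, -1.0], [-0.5, 1.0]]
b_q = np.zeros(2, dtype=np.int32)
print(fc_reference(x_q, 0.5, -128, w_q, 0.25, 0, b_q, 0.125).tolist())             # [[-1.0, 1.0]]
print(fc_reference(x_q, 0.5, -128, w_q, 0.25, 0, b_q, 0.125, relu=True).tolist())  # [[0.0, 1.0]]
```

With a fused ReLU, the quantized output cannot represent negative values, so comparing it against a reference without the ReLU fails on every negative pre-activation even if the arithmetic is otherwise exact.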