david8862/keras-YOLOv3-model-set

Inference time .h5 vs .tflite

Closed this issue · 9 comments

Hi all,

I trained a yolo3_mobilenet_lite model, which I then converted from .h5 to .tflite.
I evaluated the performance of both models and got quite different average inference times.
The .h5 model gets an average inference time of ~300 ms, while the .tflite model gets ~11000 ms.
I really expected the .tflite model to be faster than the .h5 one, or at least faster than what I got.

Has anyone seen similar behavior?

!python ./keras-YOLOv3-model-set/tools/evaluation/validate_yolo.py --model_path=/path/to/model.h5 --anchors_path=./keras-YOLOv3-model-set/configs/yolo3_anchors.txt --classes_path=/path/to/classes.txt --image_path=/path/to/test/images/ --output_path=/path/to/output/ --loop_count=5

Average Inference time: 304.60600853ms
PostProcess time: 5.36322594ms
Found 2 boxes for /path/to/test/images/image.png
Class: dog, Score: 0.8985981941223145, Box: (447, 216),(567, 383)
Class: dog, Score: 0.3477180004119873, Box: (555, 12),(606, 76)

!python ./keras-YOLOv3-model-set/tools/evaluation/validate_yolo.py --model_path=/path/to/model_quant.tflite --anchors_path=./keras-YOLOv3-model-set/configs/yolo3_anchors.txt --classes_path=/path/to/classes.txt --image_path=/path/to/test/images/ --output_path=/path/to/output/ --loop_count=5

INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
Average Inference time: 11718.34445000ms
PostProcess time: 5.61642647ms
Found 3 boxes for /path/to/test/images/image.png
Class: dog, Score: 0.9015061855316162, Box: (464, 204),(552, 397)
Class: dog, Score: 0.16690415143966675, Box: (562, 27),(587, 59)
Class: dog, Score: 0.16690415143966675, Box: (534, 10),(615, 67)
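For reference, TFLite inference time is typically measured with a loop around interpreter.invoke(); a rough sketch of that kind of measurement (paths and input data are placeholders, not the repo's exact validate_yolo.py code):

import time
import numpy as np
import tensorflow as tf

# Rough timing loop for a .tflite model (placeholder path, random input).
interpreter = tf.lite.Interpreter(model_path='/path/to/model_quant.tflite')
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

dummy_input = np.random.rand(*input_details[0]['shape']).astype(
    input_details[0]['dtype'])

times = []
for _ in range(5):  # mirrors --loop_count=5
    start = time.time()
    interpreter.set_tensor(input_details[0]['index'], dummy_input)
    interpreter.invoke()
    outputs = [interpreter.get_tensor(d['index']) for d in output_details]
    times.append((time.time() - start) * 1000.0)

print('Average inference time: %.2f ms' % (sum(times) / len(times)))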

The model was converted with:
!python post_train_quant_convert.py --keras_model_file=/path/to/model.h5 --annotation_file=/path/to/annotation/train.txt --model_input_shape=416x416 --sample_num=174 --output_file=/path/to/model_quant.tflite
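For context, the post-training quantization step is roughly the standard TF 2.x TFLiteConverter flow; a minimal sketch (paths, sample data and preprocessing are placeholders; the repo's post_train_quant_convert.py additionally feeds real annotated images and registers its custom layers via custom_objects):

import numpy as np
import tensorflow as tf

keras_model_file = '/path/to/model.h5'  # placeholder
model = tf.keras.models.load_model(keras_model_file, compile=False)

def representative_dataset():
    # Calibration samples so the converter can estimate activation ranges.
    # Random data here for illustration; the repo's script uses real
    # preprocessed training images (--sample_num of them).
    for _ in range(174):
        yield [np.random.rand(1, 416, 416, 3).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset

tflite_model = converter.convert()
with open('/path/to/model_quant.tflite', 'wb') as f:
    f.write(tflite_model)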

@d-duran not quite sure about your inference environment, but quantized TFLite models are not designed for x86 CPU/GPU inference, so they may perform poorly there.
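One way to sanity-check this is to look at the tensor types of the converted model with the TF Lite Python interpreter; a small sketch (path is a placeholder):

import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path='/path/to/model_quant.tflite',
                                  num_threads=4)
interpreter.allocate_tensors()

for detail in interpreter.get_input_details() + interpreter.get_output_details():
    print(detail['name'], detail['dtype'], detail['shape'])
# Integer (int8/uint8) tensors indicate a quantized model, whose kernels are
# mainly optimized for ARM CPUs and edge accelerators rather than x86.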

@david8862 is correct. Even if you build TF Lite with XNNPACK enabled, inference will be slow on x86. I suggest you use OpenVINO, as described here: https://docs.openvino.ai/latest/omz_models_model_yolo_v3_tf.html.

@david8862 @mesolmaz thanks for the responses. I tried converting the model to float32 .tflite instead (not quantized) to check its performance too, but an error keeps showing up:

%cd ./keras-YOLOv3-model-set/tools/model_converter/

!python custom_tflite_convert.py --keras_model_file=/path/to/model.h5 --output_file=/path/to/model_TFL.tflite

/./keras-YOLOv3-model-set/tools/model_converter
Traceback (most recent call last):
File "custom_tflite_convert.py", line 581, in
main()
File "custom_tflite_convert.py", line 577, in main
app.run(main=run_main, argv=sys.argv[:1])
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/platform/app.py", line 36, in run
_run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
File "/usr/local/lib/python3.7/dist-packages/absl/app.py", line 312, in run
_run_main(main, args)
File "/usr/local/lib/python3.7/dist-packages/absl/app.py", line 258, in _run_main
sys.exit(main(argv))
File "custom_tflite_convert.py", line 564, in run_main
_convert_tf2_model(tflite_flags)
File "custom_tflite_convert.py", line 278, in _convert_tf2_model
model = keras.models.load_model(flags.keras_model_file, custom_objects = custom_object_dict)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/saving/save.py", line 202, in load_model
compile)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/saving/hdf5_format.py", line 181, in load_model_from_hdf5
custom_objects=custom_objects)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/saving/model_config.py", line 52, in model_from_config
return deserialize(config, custom_objects=custom_objects)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/layers/serialization.py", line 127, in deserialize
printable_module_name='layer')
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/utils/generic_utils.py", line 678, in deserialize_keras_object
list(custom_objects.items())))
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/functional.py", line 668, in from_config
config, custom_objects)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/functional.py", line 1278, in reconstruct_from_config
process_layer(layer_data)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/functional.py", line 1260, in process_layer
layer = deserialize_layer(layer_data, custom_objects=custom_objects)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/layers/serialization.py", line 127, in deserialize
printable_module_name='layer')
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/utils/generic_utils.py", line 660, in deserialize_keras_object
config, module_objects, custom_objects, printable_module_name)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/utils/generic_utils.py", line 561, in class_and_config_for_serialized_keras_object
.format(printable_module_name, class_name))
ValueError: Unknown layer: SyncBatchNormalization. Please ensure this object is passed to the custom_objects argument. See https://www.tensorflow.org/guide/keras/save_and_serialize#registering_the_custom_object for details.

It seems to be related to the custom SyncBatchNormalization layer, but I'm not sure how to solve it.
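In case it helps anyone, one possible workaround is to register the missing layer in custom_objects before converting; a sketch, assuming TF >= 2.2, where tf.keras.layers.experimental.SyncBatchNormalization is available (paths are placeholders):

import tensorflow as tf

# Assumption: the model's SyncBatchNormalization layers can be deserialized
# with the built-in implementation.
custom_objects = {
    'SyncBatchNormalization': tf.keras.layers.experimental.SyncBatchNormalization,
}

model = tf.keras.models.load_model('/path/to/model.h5',
                                   custom_objects=custom_objects,
                                   compile=False)

converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()
with open('/path/to/model_TFL.tflite', 'wb') as f:
    f.write(tflite_model)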

EDIT: I switched to TensorFlow 1.x in Google Colab (v1.15.2) and got the following error:
%tensorflow_version 1.x

./keras-YOLOv3-model-set/tools/model_converter
Traceback (most recent call last):
File "custom_tflite_convert.py", line 581, in
main()
File "custom_tflite_convert.py", line 577, in main
app.run(main=run_main, argv=sys.argv[:1])
File "/tensorflow-1.15.2/python3.7/tensorflow_core/python/platform/app.py", line 40, in run
_run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
File "/usr/local/lib/python3.7/dist-packages/absl/app.py", line 312, in run
_run_main(main, args)
File "/usr/local/lib/python3.7/dist-packages/absl/app.py", line 258, in _run_main
sys.exit(main(argv))
File "custom_tflite_convert.py", line 573, in run_main
_convert_tf1_model(tflite_flags)
File "custom_tflite_convert.py", line 184, in _convert_tf1_model
converter = _get_toco_converter(flags)
File "custom_tflite_convert.py", line 171, in _get_toco_converter
return converter_fn(**converter_kwargs)
File "/tensorflow-1.15.2/python3.7/tensorflow_core/lite/python/lite.py", line 820, in from_keras_model_file
keras_model = _keras.models.load_model(model_file, custom_objects)
File "/tensorflow-1.15.2/python3.7/tensorflow_core/python/keras/saving/save.py", line 143, in load_model
return hdf5_format.load_model_from_hdf5(filepath, custom_objects, compile)
File "/tensorflow-1.15.2/python3.7/tensorflow_core/python/keras/saving/hdf5_format.py", line 160, in load_model_from_hdf5
model_config = json.loads(model_config.decode('utf-8'))
AttributeError: 'str' object has no attribute 'decode'

I also downgraded the h5py library to 2.10 (as suggested in other issues), but then got:

ValueError: Unknown layer: Functional

Is this some sort of library version issue? TensorFlow 2.2 worked well for model dumping, conversion to quantized TFLite, and evaluation...

@d-duran as far as I know, tf.keras .h5 model files may have compatibility issues between different TF versions, so I usually just use the same TF version for training, dumping, converting and inference.
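For anyone hitting the same errors in Colab, one way to do that is to pin the same versions up front for every step; the exact versions below are an assumption based on this thread, not a tested combination (h5py 2.10 avoids the 'str' object has no attribute 'decode' error):

!pip install "tensorflow==2.4.1" "h5py==2.10.0"

import tensorflow as tf, h5py
print(tf.__version__, h5py.__version__)  # verify the pinned versions are active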

@david8862 It eventually worked out for me using TF 2.4 for all steps. Coming back to the original issue, the inference times for TF and TFLite are now similar (about 200 ms for TFLite and 300 ms for TF). However, I expected a reduction in model size after conversion and it didn't happen... could you recommend a model type (of those available) to minimize model size?

@d-duran as far as I know, quantization is the most common way to compress a model file, although inference speed may be impacted. Not sure if anything changed for TFLite quantization in newer versions.
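For completeness, even without a representative dataset, weight-only (dynamic-range) quantization should shrink the file noticeably, since weights are stored as int8; a hedged sketch with placeholder paths:

import os
import tensorflow as tf

model = tf.keras.models.load_model('/path/to/model.h5', compile=False)

# Dynamic-range quantization: weights stored as int8, activations stay float.
# This typically reduces the file to roughly a quarter of the float32 size.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

with open('/path/to/model_dynamic.tflite', 'wb') as f:
    f.write(tflite_model)

print('.h5 size     : %.1f MB' % (os.path.getsize('/path/to/model.h5') / 1e6))
print('.tflite size : %.1f MB' % (os.path.getsize('/path/to/model_dynamic.tflite') / 1e6))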

@david8862 the quantized TFLite model did indeed reduce the size by about an order of magnitude compared to the TF and float TFLite models.

The TF and non-quantized TFLite models, however, are about the same size, as the conversion didn't reduce it. Considering float32 TFLite usage only, could you recommend any particular YOLO architecture that tends to be lighter? I tried yolo3_mobilenet_lite and tiny_yolo3_mobilenet and they're about 30-40 MB, so perhaps there's room for improvement there?

@d-duran maybe you can try tiny_yolo3_mobilenetv3small_lite, which is about 6.5 MB, if its mAP performance meets your requirements.

@david8862 That worked. Thanks for the help and the repo!