TF-TRT generated model is still fp32 when converting using precision_mode="FP16"
yuqcraft opened this issue · 2 comments
I have a TensorFlow (version 1.14) float32 SavedModel that I want to convert to float16. According to https://docs.nvidia.com/deeplearning/frameworks/tf-trt-user-guide/index.html#usage-example , passing "FP16" as precision_mode should convert the model to fp16. But the converted model, when inspected in TensorBoard, is still fp32: the network parameters are DT_FLOAT instead of DT_HALF, and the converted model is about the same size as the model before conversion. (My assumption is that a successful conversion would roughly halve the model size, since each parameter shrinks from 4 bytes to 2.)
import tensorflow as tf
from tensorflow.python.compiler.tensorrt import trt_convert as trt

FLAGS = tf.flags.FLAGS

tf.flags.DEFINE_string('saved_model_dir', '', 'Input saved model dir.')
tf.flags.DEFINE_bool('use_float16', False,
                     'Whether we want to quantize it to float16.')
tf.flags.DEFINE_string('output_dir', '', 'Output saved model dir.')


def main(argv):
    del argv  # Unused.
    saved_model_dir = FLAGS.saved_model_dir
    output_dir = FLAGS.output_dir
    use_float16 = FLAGS.use_float16
    precision_mode = "FP16" if use_float16 else "FP32"
    converter = trt.TrtGraphConverter(input_saved_model_dir=saved_model_dir,
                                      precision_mode=precision_mode)
    converter.convert()
    converter.save(output_dir)


if __name__ == '__main__':
    tf.app.run(main)
Am I misunderstanding something? Any advice or suggestions are very welcome! Thanks
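For what it's worth, the size expectation above checks out arithmetically: float32 parameters take 4 bytes each and float16 parameters take 2, so a model whose size is dominated by weights should roughly halve if its parameters are actually stored in half precision. A minimal NumPy sketch (the weight shape is hypothetical, not taken from the model above):

```python
import numpy as np

# Hypothetical weight tensor standing in for one layer's parameters.
w32 = np.ones((1024, 1024), dtype=np.float32)

# Casting to float16 halves the storage: 4 bytes/value -> 2 bytes/value.
w16 = w32.astype(np.float16)

print(w32.nbytes)  # 4194304 bytes
print(w16.nbytes)  # 2097152 bytes
```

So a converted SavedModel that stays the same size is a reasonable hint that its stored weights are still fp32.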
I have answered the question on StackOverflow. We can continue the discussion there if there are any more questions.