tensorflow/tensorrt

TF-TRT generated model is still fp32 when converting using precision_mode="FP16"

yuqcraft opened this issue · 2 comments

I have a TensorFlow (version 1.14) float32 SavedModel that I want to convert to float16. According to https://docs.nvidia.com/deeplearning/frameworks/tf-trt-user-guide/index.html#usage-example , I can pass "FP16" as precision_mode to convert the model to fp16. But after inspecting the converted model in TensorBoard, it is still fp32: the network parameters are DT_FLOAT instead of DT_HALF. The size of the converted model is also similar to the model before conversion. (Here I assume that, if the conversion succeeded, the model would be roughly half as large, since the parameters are cut in half.)

import tensorflow as tf
from tensorflow.python.compiler.tensorrt import trt_convert as trt
import os

FLAGS = tf.flags.FLAGS
tf.flags.DEFINE_string('saved_model_dir', '', 'Input saved model dir.')
tf.flags.DEFINE_bool('use_float16', False,
                     'Whether we want to quantize it to float16.')
tf.flags.DEFINE_string('output_dir', '', 'Output saved model dir.')


def main(argv):
    del argv  # Unused.
    saved_model_dir = FLAGS.saved_model_dir
    output_dir = FLAGS.output_dir
    use_float16 = FLAGS.use_float16

    precision_mode = "FP16" if use_float16 else "FP32"
    # Build a TF-TRT converter for the SavedModel with the requested
    # precision mode ("FP16" or "FP32").
    converter = trt.TrtGraphConverter(input_saved_model_dir=saved_model_dir,
                                      precision_mode=precision_mode)
    converter.convert()  # Rewrite the graph, replacing supported subgraphs with TRTEngineOp nodes.
    converter.save(output_dir)  # Write the converted SavedModel.


if __name__ == '__main__':
    tf.app.run(main)
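
For completeness, the same check can also be done programmatically instead of through TensorBoard. Below is a minimal sketch using the TF 1.14 API; the converted-model path is a placeholder, and the serving tag is an assumption:

import tensorflow as tf

# Minimal sketch: load the converted SavedModel and report the dtypes of
# Const nodes and how many TRTEngineOp nodes were created.
# '/path/to/converted_model' is a placeholder for the converter output_dir.
with tf.Session(graph=tf.Graph()) as sess:
    meta_graph = tf.saved_model.loader.load(
        sess, [tf.saved_model.tag_constants.SERVING], '/path/to/converted_model')
    graph_def = meta_graph.graph_def

    trt_engine_ops = [n for n in graph_def.node if n.op == 'TRTEngineOp']
    const_dtypes = {tf.DType(n.attr['dtype'].type).name
                    for n in graph_def.node if n.op == 'Const'}

    print('TRTEngineOp nodes:', len(trt_engine_ops))
    print('Const dtypes:', const_dtypes)

Seeing TRTEngineOp nodes while the Const dtypes stay float32 would match the observation above.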

Am I misunderstanding something? Any advice or suggestions are very welcome. Thanks!

I have answered the question on StackOverflow. We can continue the discussion there if there are any more questions.

@tfeher Thanks Tamas! That did answer my question.

I also profiled the converted model with nvprof. On my Volta GPU, the h1688 and h884 kernels are called multiple times, which confirms that fp16 and Tensor Cores are being used correctly.
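
For reference, a profiling run along the lines of `nvprof python <inference_script>.py` lists the executed GPU kernels by name in its summary, which is where kernel names like h884 show up (the inference script name here is a placeholder).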