tensorflow/tensorrt

converted model pb too large

We followed the blog "Leveraging TensorFlow-TensorRT integration for Low latency Inference" and ended up with a very large converted SavedModel.

ENV

  • TensorFlow: 2.4.1
  • TensorRT: 6.0.1
  • CUDA: 10.1
  • cuDNN: 7.6

Size before and after conversion

Converting a fine-tuned BERT model:

from tensorflow.python.compiler.tensorrt import trt_convert as trt

# Explicit FP32 precision (also the default).
conversion_params = trt.TrtConversionParams(precision_mode=trt.TrtPrecisionMode.FP32)

input_saved_model_dir = 'xxx'
output_saved_model_dir = 'xxx'

converter = trt.TrtGraphConverterV2(
    input_saved_model_dir=input_saved_model_dir,
    conversion_params=conversion_params)
converter.convert()
converter.save(output_saved_model_dir)
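
The fp16 directory below presumably came from the same script with FP16 precision. A minimal sketch of that variant, assuming a BERT-style signature (the three-tensor input and the (1, 128) shape are hypothetical, and the paths are placeholders); the optional build() step pre-builds the TensorRT engines before saving, as the blog's workflow does:

import numpy as np
from tensorflow.python.compiler.tensorrt import trt_convert as trt

conversion_params = trt.TrtConversionParams(precision_mode=trt.TrtPrecisionMode.FP16)
converter = trt.TrtGraphConverterV2(
    input_saved_model_dir='xxx',  # placeholder, as above
    conversion_params=conversion_params)
converter.convert()

def input_fn():
    # Hypothetical BERT-like inputs: three int32 tensors of shape (1, 128).
    yield tuple(np.zeros((1, 128), dtype=np.int32) for _ in range(3))

converter.build(input_fn=input_fn)  # optional: pre-build engines
converter.save('xxx')               # placeholder output dir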
Before conversion:

4.0K	./bert_finetune_20210303/assets
9.5M	./bert_finetune_20210303/saved_model.pb
387M	./bert_finetune_20210303/variables
397M	./bert_finetune_20210303

After FP16 conversion:

4.0K	./bert_finetune_20210303_fp16/assets
1.1G	./bert_finetune_20210303_fp16/saved_model.pb
387M	./bert_finetune_20210303_fp16/variables
1.5G	./bert_finetune_20210303_fp16

After FP32 conversion:

4.0K	./bert_finetune_20210303_fp32/assets
1.1G	./bert_finetune_20210303_fp32/saved_model.pb
387M	./bert_finetune_20210303_fp32/variables
1.5G	./bert_finetune_20210303_fp32
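
To see which nodes account for the 1.1G saved_model.pb, here is a minimal diagnostic sketch (not from the original report; the path is a placeholder for a converted output directory). It parses the SavedModel proto and aggregates node sizes by op type:

from collections import defaultdict
from tensorflow.core.protobuf import saved_model_pb2

sm = saved_model_pb2.SavedModel()
with open('xxx/saved_model.pb', 'rb') as f:  # placeholder path
    sm.ParseFromString(f.read())

graph_def = sm.meta_graphs[0].graph_def
sizes = defaultdict(int)

# Collect nodes from the top-level graph and from all library functions.
nodes = list(graph_def.node)
for fn in graph_def.library.function:
    nodes.extend(fn.node_def)
for node in nodes:
    sizes[node.op] += node.ByteSize()

for op, size in sorted(sizes.items(), key=lambda kv: -kv[1])[:10]:
    print(f'{op}: {size / 2**20:.1f} MiB')

If most of the bytes turn out to sit in Const nodes or in TRTEngineOp attributes, the converter has frozen the weights into the graph while the variables directory is still kept alongside, which would be consistent with the pb growing from 9.5M to 1.1G.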

Could anyone help me? Thanks a lot.