tensorflow/tensorrt

Native TensorFlow is >6 times faster than TensorRT

olesalscheider opened this issue · 2 comments

Description

My neural network runs much faster with native TensorFlow than with the TensorRT-optimized model:

Images per second with native TF: 4.785973
Images per second with TRT: 0.712366

I get these numbers on a TITAN X, but I can observe the same effect on a Titan V.

Environment

TensorRT Version: 7.0.0
GPU Type: TITAN X (Pascal)
Nvidia Driver Version: 440.33.01
CUDA Version: 10.2
CUDNN Version: 7.6.5.32
Operating System + Version: Ubuntu 18.04
Python Version (if applicable): 3.6.9
TensorFlow Version (if applicable): nightly/master (8869157e7f)
PyTorch Version (if applicable): -
Baremetal or Container (if container which image + tag): Baremetal

Relevant Files

https://ft.fzi.de/d=e4fa141418484b9e94039230cd7560de

Steps To Reproduce

Download the model and the test script from the link above, extract the archive, and run test.py. Make sure you have the library versions listed in the Environment section installed.
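
For anyone who cannot fetch the archive: a minimal sketch of that kind of throughput measurement is shown below. This is not the actual test.py; the input shape, dtype, iteration counts, and the two SavedModel directory names are assumptions. It simply loads each SavedModel and times its serving_default signature.

import time

import numpy as np
import tensorflow as tf

def images_per_second(saved_model_dir, input_shape=(1, 320, 320, 3),
                      dtype=np.float32, warmup=5, runs=50):
    # Load the SavedModel (native or TF-TRT converted) and grab its signature.
    model = tf.saved_model.load(saved_model_dir)
    infer = model.signatures["serving_default"]
    input_name = list(infer.structured_input_signature[1].keys())[0]
    batch = tf.constant(np.random.random_sample(input_shape).astype(dtype))

    # Warm-up calls so one-time setup cost is not included in the timing.
    for _ in range(warmup):
        infer(**{input_name: batch})

    start = time.time()
    for _ in range(runs):
        infer(**{input_name: batch})
    elapsed = time.time() - start
    return runs * input_shape[0] / elapsed

print("Images per second with native TF: %f" % images_per_second("saved_model"))
print("Images per second with TRT: %f" % images_per_second("tftrt_model"))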

I observed something similar on (1) an Nvidia Jetson Nano and (2) an Nvidia RTX 2080 Ti with the ssd_mobilenet_v2_coco_2018_03_29 model. Native TensorFlow is about 3x and 10x faster than the TensorRT-optimized model on the Jetson Nano and the RTX 2080 Ti, respectively.

Steps to Reproduce

  1. Download the ssd_mobilenet_v2_coco_2018_03_29 model
$ wget http://download.tensorflow.org/models/object_detection/ssd_mobilenet_v2_coco_2018_03_29.tar.gz 
$ tar xf ssd_mobilenet_v2_coco_2018_03_29.tar.gz 
  2. Use TensorRT to optimize the model (see the conversion sketch after this list)
$ git clone https://github.com/tensorflow/tensorrt.git
$ cd tensorrt/tftrt/examples/object_detection
$ git submodule update --init
$ ./install_dependencies.sh
$ MODEL="ssd_mobilenet_v2_coco_2018_03_29"
$ python object_detection.py --input_saved_model_dir /coco/$MODEL/saved_model --output_saved_model_dir /coco/$MODEL/tftrt_model --data_dir /coco/val2017 --annotation_path /coco/annotations/instances_val2017.json --input_size 640 --batch_size 1 --use_trt --precision FP16 --gpu_mem_cap 8192
  3. Clone my fork of TensorRT to get the inference script
$ git clone https://github.com/dloghin/tensorrt.git tensorrt-fork
  4. Clone the TensorFlow models repo, copy my script, and run
$ git clone https://github.com/tensorflow/models.git
$ cd models/research && protoc object_detection/protos/*.proto --python_out=.
$ export PYTHONPATH=`pwd`
$ cd object_detection
$ cp ~/tensorrt-fork/tftrt/examples/object_detection/inference_object_detection.py . 
$ cp ~/tensorrt-fork/tftrt/examples/object_detection/orange-apple-banana.jpg .
$ python inference_object_detection.py /coco/ssd_mobilenet_v2_coco_2018_03_29/saved_model data/mscoco_label_map.pbtxt orange-apple-banana.jpg 
$ python inference_object_detection.py /coco/ssd_mobilenet_v2_coco_2018_03_29/tftrt_model data/mscoco_label_map.pbtxt orange-apple-banana.jpg
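
For context, the conversion in step 2 (object_detection.py with --use_trt --precision FP16) is essentially a wrapper around the TF-TRT converter API. A minimal sketch of that conversion, reusing the paths from the command above, is given here; the workspace size and other parameters are assumptions, since the script applies its own defaults.

from tensorflow.python.compiler.tensorrt import trt_convert as trt

MODEL = "ssd_mobilenet_v2_coco_2018_03_29"

# FP16 conversion parameters; the workspace size here is an assumption.
params = trt.DEFAULT_TRT_CONVERSION_PARAMS._replace(
    precision_mode=trt.TrtPrecisionMode.FP16,
    max_workspace_size_bytes=1 << 30)

converter = trt.TrtGraphConverterV2(
    input_saved_model_dir="/coco/%s/saved_model" % MODEL,
    conversion_params=params)
converter.convert()
# Without an explicit converter.build(...), the TensorRT engines are built
# lazily on the first inference call, which adds latency to that call.
converter.save("/coco/%s/tftrt_model" % MODEL)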

For the first run (native TensorFlow) on the RTX 2080 Ti, you should get:

...
Inference time: 4.3128767013549805 s
...

For the second run (optimized with TensorRT) on the RTX 2080 Ti, you should get:

...
Inference time: 47.05435824394226 s
...
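
The "Inference time" lines above are printed by inference_object_detection.py from the fork. A minimal sketch of that kind of single-image timing is shown below; the argument layout and input handling are assumptions, not the fork's actual code.

import sys
import time

import tensorflow as tf

saved_model_dir, image_path = sys.argv[1], sys.argv[2]

# SSD object-detection SavedModels take a uint8 image batch of shape [1, H, W, 3].
img = tf.image.decode_jpeg(tf.io.read_file(image_path), channels=3)
batch = tf.expand_dims(img, 0)

model = tf.saved_model.load(saved_model_dir)
infer = model.signatures["serving_default"]
input_name = list(infer.structured_input_signature[1].keys())[0]

start = time.time()
outputs = infer(**{input_name: batch})
# Timing a single cold call like this includes any one-time setup
# (graph tracing, and lazy engine building for the TF-TRT model).
print("Inference time: %s s" % (time.time() - start))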

Environment

(For the RTX 2080 Ti, I am running this in a Docker container based on Nvidia's nvcr.io/nvidia/tensorrt:19.10-py3 image.)

GPU: Nvidia RTX 2080 Ti
Host OS: Ubuntu 18.04.4 LTS
Docker Version: 19.03.8
Nvidia Driver: 440.100
Docker Base Image: nvcr.io/nvidia/tensorrt:19.10-py3
Cuda Version: 10.1
Python Version: 3.6.8
TensorFlow Version: 2.3.0
TensorRT Version: 6.0.1

Hi, I am having a similar issue on an Nvidia RTX 2080 Ti and a Jetson Xavier NX. Any ideas on how to fix this, please? Thanks