This repository is based on NVIDIA's tf_trt_models repository. It uses NVIDIA TensorRT to deploy neural networks onto the embedded Jetson Nano platform, improving performance by converting TensorFlow 1 frozen graph models and applying optimizations such as kernel fusion and FP32/FP16 precision. The Jetson Nano does not support INT8 precision, so this guide covers single-image and real-time inference with CPU, GPU, FP32, and FP16, and then compares the different inference modes. The models are sourced from the TensorFlow models repository.
- Setup
- Convert checkpoint to frozen graph model
- Convert frozen graph to tensorrt engine
- Inference with cpu and gpu
- Inference with tensorrt
- Result Inference Model
- Fine tuning flowers dataset and deploy
- Prepare custom dataset
- Fine tuning on google colab
- First time on the Jetson Nano, once your device has been flashed with JetPack or set up with the pre-populated SD Card Image. I tested with JetPack 4.5.1, which comes with CUDA 10.2, OpenCV 4.1.1, TensorRT 7.1, and cuDNN 8.0 pre-installed on the SD Card Image.
- Clone this repository
cd ~
git clone https://github.com/SokPhanith/tf1_tensorrt_image_classification_jetson_nano.git
cd tf1_tensorrt_image_classification_jetson_nano
- The CUDA toolkit paths are not set in the environment variables; fix that by running:
bash scripts/install_basics.sh
- Install TensorFlow 1 on the Jetson Nano
bash scripts/install_tensorflow.sh
- Install pycuda on the Jetson Nano
bash scripts/install_pycuda.sh
- Install the slim models repository on the Jetson Nano
bash scripts/install_tf1_slim.sh
- Clone the tensorflow1 repository, which provides the Python API tool for frozen-graph conversion
bash scripts/install_tf1_tool.sh
- Fix some issues in the Python file at TF-models/research/slim/datasets/imagenet.py
bash scripts/fix_tf1_slim.sh
- Add more swap memory on the Jetson Nano (the default is 2 GB)
git clone https://github.com/JetsonHacksNano/installSwapfile
cd installSwapfile
./installSwapfile.sh
sudo reboot
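At this point a quick sanity check that the main Python packages import is worthwhile. A minimal check, assuming the installs above succeeded:

```python
# Verify the JetPack components and installed packages are visible from Python 3.
import cv2
import tensorflow as tf
import tensorrt as trt

print('TensorFlow:', tf.__version__)
print('OpenCV    :', cv2.__version__)
print('TensorRT  :', trt.__version__)
```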
The download folder in this repository has scripts to get checkpoints from the TensorFlow models repository (slim models). Let's start with an example using inception_v1. Download the checkpoint:
cd ~
cd tf1_tensorrt_image_classification_jetson_nano
bash download/inceptionv1.sh
Export the inference graph, a GraphDef containing the architecture of the model:
python3 TF-models/research/slim/export_inference_graph.py \
--alsologtostderr \
--model_name=inception_v1 \
--output_file=inceptionv1/inception_v1_inf_graph.pb
Note: If you export resnet_v1 or vgg models, you must set labels_offset=1, because those models were converted to TensorFlow 1 from the original Caffe models, like this:
python3 TF-models/research/slim/export_inference_graph.py \
--alsologtostderr \
--model_name=resnet_v1_50 \
--output_file=resnet_v1_50/resnet_v1_50_inf_graph.pb \
--labels_offset=1
Convert to a frozen graph model with the TensorFlow 1 Python API tool:
python3 tensorflow/tensorflow/python/tools/freeze_graph.py \
--input_graph=inceptionv1/inception_v1_inf_graph.pb \
--input_checkpoint=inceptionv1/inception_v1.ckpt \
--input_binary=true \
--output_graph=inceptionv1/inception_v1.pb \
--output_node_names=InceptionV1/Logits/Predictions/Reshape_1
You will see the frozen graph model of inception_v1 in the inceptionv1 folder. If you want to get started with another model supported by slim, you can download it with the scripts in the download folder; a summary is given in the table below. Every time before running python3 tensorflow/tensorflow/python/tools/freeze_graph.py, you must know the output_node_names (output layer name) of your frozen graph. You can check the input_name and output_node_names with the layers_name.py script (a sketch of this kind of graph inspection appears after the table) or with the table below.
python3 layers_name.py inceptionv1/inception_v1_inf_graph.pb
model_name | input size | input_name | output_node_names(output_name) |
---|---|---|---|
inception_v1 | 224x224 | input | InceptionV1/Logits/Predictions/Reshape_1 |
inception_v2 | 224x224 | input | InceptionV2/Predictions/Reshape_1 |
inception_v3 | 299x299 | input | InceptionV3/Predictions/Reshape_1 |
inception_v4 | 299x299 | input | InceptionV4/Logits/Predictions |
inception_resnet_v2 | 299x299 | input | InceptionResnetV2/Logits/Predictions |
resnet_v1_50 | 224x224 | input | resnet_v1_50/predictions/Reshape_1 |
resnet_v1_101 | 224x224 | input | resnet_v1_101/predictions/Reshape_1 |
resnet_v1_152 | 224x224 | input | resnet_v1_152/predictions/Reshape_1 |
resnet_v2_50 | 224x224 | input | resnet_v2_50/predictions/Reshape_1 |
resnet_v2_101 | 224x224 | input | resnet_v2_101/predictions/Reshape_1 |
resnet_v2_152 | 224x224 | input | resnet_v2_152/predictions/Reshape_1 |
mobilenet_v1 | 224x224 | input | MobilenetV1/Predictions/Reshape_1 |
mobilenet_v2 | 224x224 | input | MobilenetV2/Predictions/Reshape_1 |
vgg_16 | 224x224 | input | vgg_16/fc8/squeezed |
vgg_19 | 224x224 | input | vgg_19/fc8/squeezed |
nasnet_mobile | 224x224 | input | final_layer/predictions |
pnasnet_mobile | 224x224 | input | final_layer/predictions |
nasnet_large | 331x331 | input | final_layer/predictions |
pnasnet_large | 331x331 | input | final_layer/predictions |
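For reference, listing a frozen graph's node names takes only a few lines of TensorFlow 1.x. This is a minimal sketch of the kind of inspection layers_name.py performs; the script in this repository may differ in detail:

```python
import sys
import tensorflow as tf  # TensorFlow 1.x

# Read the frozen GraphDef and print every node, so you can find the
# input_name and output_node_names to pass to freeze_graph.py and
# tensorrt_convert.py. Usage: python3 layers_name.py model.pb
graph_def = tf.compat.v1.GraphDef()
with tf.compat.v1.gfile.GFile(sys.argv[1], 'rb') as f:
    graph_def.ParseFromString(f.read())
for node in graph_def.node:
    print(node.name, node.op)
```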
vgg_16, vgg_19, resnet_v2_152, resnet_v1_152, nasnet_large, pnasnet_large, and inception_resnet_v2 are big models with a lot of layers, and the Jetson Nano does not have enough memory to convert them to frozen graphs. So the colab folder has a notebook named checkpoint_tf1_convert_freeze_graph.ipynb that guides you through converting the frozen graph model on Google Colab's free GPU; just follow it step by step and you will get the vgg_16 or vgg_19 frozen graph model. Click Getting Start, or go to Google Colab, upload my notebook, and continue from there.
After you have a frozen graph model, before converting to a TensorRT engine you must convert it to the Universal Framework Format (UFF). When you installed TensorFlow 1 on your Jetson Nano, the Python API conversion tool came with it; you can find it at this path:
python3 /usr/lib/python3.6/dist-packages/uff/bin/convert_to_uff.py --help
This repository also has a convert_to_uff.py, which I just copied from that path. You can convert to UFF format by running:
cd ~
cd tf1_tensorrt_image_classification_jetson_nano
python3 /usr/lib/python3.6/dist-packages/uff/bin/convert_to_uff.py inceptionv1/inception_v1.pb
or
cd ~
cd tf1_tensorrt_image_classification_jetson_nano
python3 convert_to_uff.py inceptionv1/inception_v1.pb
You will see inception_v1.uff in the same directory as the inception_v1.pb frozen graph. After that you can build the TensorRT engine.
For FP16 precision
python3 tensorrt_convert.py --model=inceptionv1/inception_v1.uff \
--output=inceptionv1/inception_v1_fp16.engine \
--input_name=input \
--output_name=InceptionV1/Logits/Predictions/Reshape_1 \
--fp16 \
--height=224 \
--width=224 \
--batch_size=1
For FP32 precision
python3 tensorrt_convert.py --model=inceptionv1/inception_v1.uff \
--output=inceptionv1/inception_v1_fp32.engine \
--input_name=input \
--output_name=InceptionV1/Logits/Predictions/Reshape_1 \
--height=224 \
--width=224 \
--batch_size=1
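For reference, tensorrt_convert.py builds the engine with the TensorRT Python API. Below is a minimal sketch of that builder flow, assuming TensorRT 7.x with the UFF parser; the workspace size and structure are illustrative, not necessarily the exact script:

```python
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

def build_engine(uff_path, input_name, output_name, height, width,
                 fp16=False, batch_size=1):
    builder = trt.Builder(TRT_LOGGER)
    network = builder.create_network()          # implicit-batch network
    parser = trt.UffParser()
    parser.register_input(input_name, (3, height, width))  # CHW
    parser.register_output(output_name)
    parser.parse(uff_path, network)
    builder.max_batch_size = batch_size
    builder.max_workspace_size = 1 << 28        # 256 MB, modest for the Nano
    if fp16:
        builder.fp16_mode = True
    return builder.build_cuda_engine(network)

engine = build_engine('inceptionv1/inception_v1.uff', 'input',
                      'InceptionV1/Logits/Predictions/Reshape_1',
                      224, 224, fp16=True)
with open('inceptionv1/inception_v1_fp16.engine', 'wb') as f:
    f.write(engine.serialize())
```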
After converting the frozen graph and before running inference, I recommend closing things like the web browser and other software to save memory. Then open your terminal; you can run inference on the plain GPU or CPU with Python as shown below.
Set maximum performance by running these commands:
sudo nvpmodel -m 0
sudo jetson_clocks
For GPU with an image
cd ~
cd tf1_tensorrt_image_classification_jetson_nano
python3 runtime_simple.py --model=inceptionv1/inception_v1.pb \
--input_name=input \
--output_name=InceptionV1/Logits/Predictions/Reshape_1 \
--label data/labels_1001.txt \
--image=data/imagenet/bird.jpg
For CPU with an image
python3 runtime_simple.py --model=inceptionv1/inception_v1.pb \
--input_name=input \
--output_name=InceptionV1/Logits/Predictions/Reshape_1 \
--label data/labels_1001.txt \
--image=data/imagenet/bird.jpg \
--cpu
CSI Raspberry Pi camera v2
python3 runtime_simple.py --model=inceptionv1/inception_v1.pb \
--input_name=input \
--output_name=InceptionV1/Logits/Predictions/Reshape_1 \
--label data/labels_1001.txt \
--image=data/imagenet/bird.jpg \
--csi
Webcam
python3 runtime_simple.py --model=inceptionv1/inception_v1.pb \
--input_name=input \
--output_name=InceptionV1/Logits/Predictions/Reshape_1 \
--label data/labels_1001.txt \
--image=data/imagenet/bird.jpg \
--webcam 0
Video
python3 runtime_simple.py --model=inceptionv1/inception_v1.pb \
--input_name=input \
--output_name=InceptionV1/Logits/Predictions/Reshape_1 \
--label data/labels_1001.txt \
--image=data/imagenet/bird.jpg \
--video data/video/jellyfish.mkv
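For reference, runtime_simple.py essentially loads the frozen graph into a TensorFlow 1 session and runs it per image or frame. A minimal sketch of the single-image path, assuming inception-style preprocessing (the real script also handles cameras, video, and the --cpu flag):

```python
import cv2
import numpy as np
import tensorflow as tf  # TensorFlow 1.x

# Load the frozen graph once, then run the session per image/frame.
graph_def = tf.compat.v1.GraphDef()
with tf.compat.v1.gfile.GFile('inceptionv1/inception_v1.pb', 'rb') as f:
    graph_def.ParseFromString(f.read())
graph = tf.Graph()
with graph.as_default():
    tf.import_graph_def(graph_def, name='')

with tf.compat.v1.Session(graph=graph) as sess:
    img = cv2.imread('data/imagenet/bird.jpg')
    img = cv2.cvtColor(cv2.resize(img, (224, 224)), cv2.COLOR_BGR2RGB)
    x = img[np.newaxis].astype(np.float32) / 127.5 - 1.0  # inception-style
    probs = sess.run('InceptionV1/Logits/Predictions/Reshape_1:0',
                     feed_dict={'input:0': x})
    print('top-1 class index:', int(np.argmax(probs)))
```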
Note: If you run inference with resnet_v1 or vgg models, you must also set --mean, because those Caffe-heritage models were trained with different preprocessing than the other TensorFlow 1 slim models.
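The difference is roughly the following (standard ImageNet values; a sketch, not the exact code in runtime_simple.py):

```python
import numpy as np

# Inception-style preprocessing (most slim models): scale RGB to [-1, 1].
def preprocess_inception(img):  # img: HxWx3 uint8, RGB
    return img.astype(np.float32) / 127.5 - 1.0

# VGG/ResNetV1-style preprocessing (Caffe heritage): subtract the
# per-channel ImageNet mean and keep the [0, 255] range.
VGG_MEAN = np.array([123.68, 116.779, 103.939], dtype=np.float32)  # R, G, B
def preprocess_vgg(img):
    return img.astype(np.float32) - VGG_MEAN
```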
Before you run inference with TensorRT, you must have a TensorRT engine file. Again, I recommend closing things like the web browser and other software to save memory. Then open your terminal; you can run inference with the optimized TensorRT engine as shown below.
Set maximum performance by running these commands:
sudo nvpmodel -m 0
sudo jetson_clocks
For FP16 precision with an image
cd ~
cd tf1_tensorrt_image_classification_jetson_nano
python3 runtime_trt.py --model=inceptionv1/inception_v1_fp16.engine \
--label data/labels_1001.txt \
--image=data/imagenet/bird.jpg
For FP32 precision with an image
python3 runtime_trt.py --model=inceptionv1/inception_v1_fp32.engine \
--label data/labels_1001.txt \
--image=data/imagenet/bird.jpg
CSI Raspberry Pi camera v2
python3 runtime_trt.py --model=inceptionv1/inception_v1_fp16.engine \
--label data/labels_1001.txt \
--csi
Webcam
python3 runtime_trt.py --model=inceptionv1/inception_v1_fp16.engine \
--label data/labels_1001.txt \
--webcam 0
Video
python3 runtime_trt.py --model=inceptionv1/inception_v1_fp16.engine \
--label data/labels_1001.txt \
--video data/video/jellyfish.mkv
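For reference, runtime_trt.py deserializes the engine and runs it through pycuda buffers. A minimal sketch of that flow, assuming a TensorRT 7.x implicit-batch engine with one input and one output binding (the real script adds preprocessing and the camera/video loop):

```python
import numpy as np
import pycuda.autoinit  # noqa: F401 -- creates a CUDA context
import pycuda.driver as cuda
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
with open('inceptionv1/inception_v1_fp16.engine', 'rb') as f, \
     trt.Runtime(TRT_LOGGER) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())
context = engine.create_execution_context()

# Page-locked host buffers plus device buffers for bindings 0 (in) and 1 (out).
h_in = cuda.pagelocked_empty(trt.volume(engine.get_binding_shape(0)), np.float32)
h_out = cuda.pagelocked_empty(trt.volume(engine.get_binding_shape(1)), np.float32)
d_in, d_out = cuda.mem_alloc(h_in.nbytes), cuda.mem_alloc(h_out.nbytes)
stream = cuda.Stream()

h_in[:] = np.random.rand(h_in.size)  # replace with a real preprocessed image
cuda.memcpy_htod_async(d_in, h_in, stream)
context.execute_async(1, [int(d_in), int(d_out)], stream.handle)
cuda.memcpy_dtoh_async(h_out, d_out, stream)
stream.synchronize()
print('top-1 class index:', int(np.argmax(h_out)))
```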
Note: If you run inference with resnet_v1 or vgg models, you must also set --mean, because those Caffe-heritage models were trained with different preprocessing (see the preprocessing sketch above).
Inference with a single image, cat.jpg, at 900x675 resolution:
MODEL | FP16 | FP32 | GPU | CPU |
---|---|---|---|---|
inception_v1 | 82.1FPS | 48.1FPS | 20.5FPS | 5.5FPS |
inception_v2 | 61.0FPS | 37.9FPS | 14.2FPS | 4.1FPS |
inception_v3 | 22.2FPS | 12.0FPS | 5.3FPS | 1.6FPS |
inception_v4 | 10.8FPS | 5.7FPS | 2.9FPS | 0.8FPS |
inception_resnetv2 | 9.8FPS | 5.1FPS | 2.4FPS | 0.7FPS |
resnetv1_50 | 37.8FPS | 21.1FPS | 6.1FPS | 2.2FPS |
resnetv1_101 | 20.9FPS | 11.6FPS | 3.5FPS | 1.1FPS |
resnetv1_152 | 14.6FPS | 7.9FPS | 2.6FPS | 0.8FPS |
resnetv2_50 | 33.1FPS | 19.8FPS | 5.5FPS | 2.1FPS |
resnetv2_101 | 18.4FPS | 10.9FPS | 3.4FPS | 0.9FPS |
resnetv2_152 | 12.6FPS | 7.4FPS | 2.4FPS | 0.7FPS |
mobilenetv1_224x224 | 79.0FPS | 69.1FPS | 25.3FPS | 9.6FPS |
mobilenetv1_160x160 | 266.0FPS | 256.0FPS | 64.0FPS | 35.0FPS |
mobilenetv1_128x128 | 430.0FPS | 301.0FPS | 66.0FPS | 61.0FPS |
mobilenetv2_1.0 | 72.6FPS | 67.8FPS | 23.1FPS | 11.5FPS |
mobilenetv2_1.4 | 48.3FPS | 44.3FPS | 13.3FPS | 7.6FPS |
vgg_16 | 12.6FPS | 6.8FPS | - | 0.5FPS |
vgg_19 | 10.0FPS | 5.8FPS | - | 0.4FPS |
nasnet_mobile | - | - | 7.2FPS | 6.0FPS |
pnasnet_mobile | - | - | 9.4FPS | 5.4FPS |
nasnet_large | - | - | - | - |
pnasnet_large | - | - | - | - |
Note: nasnet and pnasnet have an issue converting to a TensorRT engine; I will try to fix it and update soon. vgg_16 and vgg_19 plain-GPU inference runs out of memory on the Jetson Nano. If you run inference from the CSI Raspberry Pi camera v2, a webcam, or a video, FPS will drop a little, and likewise if you don't run the maximum-performance commands.
In this part we will fine-tune the flowers dataset from TensorFlow with inception_v1 on the Jetson Nano and then deploy the model as a TensorRT engine. We need to download the raw data and convert it to TensorFlow's native TFRecord format, because slim expects TFRecord datasets for training.
Download the dataset and convert it to TFRecord:
cd ~
cd tf1_tensorrt_image_classification_jetson_nano
python3 TF-models/research/slim/download_and_convert_data.py --dataset_name=flowers --dataset_dir=flowers
Fine-tune with inception_v1:
python3 TF-models/research/slim/train_image_classifier.py --train_dir flowers_model \
--dataset_name flowers \
--dataset_split_name train \
--dataset_dir flowers \
--model_name inception_v1 \
--checkpoint_path inceptionv1/inception_v1.ckpt \
--checkpoint_exclude_scopes=InceptionV1/Logits \
--trainable_scopes=InceptionV1/Logits \
--max_number_of_steps=4000 \
--batch_size 4 \
--save_interval_secs=60 \
--save_summaries_secs=60 \
--log_every_n_steps=100 \
--learning_rate=0.001
Validate the checkpoint:
python3 eval_image_classifier.py --model_name inception_v1 \
--eval_dir flowers_model \
--dataset_split_name validation \
--dataset_name flowers \
--dataset_dir flowers/ \
--checkpoint_path flowers_model/ \
--batch_size 4
You can inspect your training and validation results with TensorBoard:
tensorboard --logdir flowers_model/
Deploy to a TensorRT engine with FP16 precision. First, export the inference graph, a GraphDef containing the architecture of the model:
python3 TF-models/research/slim/export_inference_graph.py \
--alsologtostderr \
--dataset_dir flowers \
--dataset_name flowers \
--model_name=inception_v1 \
--output_file=flowers_model/inception_v1_inf_graph.pb
Convert to a frozen graph model with the TensorFlow 1 Python API tool:
python3 tensorflow/tensorflow/python/tools/freeze_graph.py \
--input_graph=flowers_model/inception_v1_inf_graph.pb \
--input_checkpoint=flowers_model/model.ckpt-4000 \
--input_binary=true \
--output_graph=flowers_model/inception_v1.pb \
--output_node_names=InceptionV1/Logits/Predictions/Reshape_1
Build the TensorRT engine:
python3 convert_to_uff.py flowers_model/inception_v1.pb
python3 tensorrt_convert.py --model=flowers_model/inception_v1.uff \
--output=flowers_model/inception_v1_fp16.engine \
--input_name=input \
--output_name=InceptionV1/Logits/Predictions/Reshape_1 \
--fp16
Inference
python3 runtime_trt.py --model flowers_model/inception_v1_fp16.engine \
--label flowers/labels.txt \
--image data/flower/tulips1.jpg
You can prepare your own custom dataset with my camera_tool.py script. This tool captures images from a webcam or a CSI Raspberry Pi camera v2. First you must have a classes.txt file listing your classes, one per line.
classes.txt looks like:
arduino
cnc
esp8266
pyboard
Then you can start collecting images by running:
CSI camera
python3 camera_tool.py --dataset_dir board --label classes.txt --csi
Webcam
python3 camera_tool.py --dataset_dir board --label classes.txt --webcam 0
- press s to save
- press c to change class
- press q to exit
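For reference, the capture loop is a few lines of OpenCV. A minimal sketch of what a tool like camera_tool.py does (argument parsing and the CSI GStreamer pipeline are omitted; paths and names here are illustrative):

```python
import os
import cv2

# Capture loop: 's' saves a frame into the current class folder,
# 'c' cycles to the next class, 'q' quits.
classes = [line.strip() for line in open('classes.txt')]
dataset_dir, idx, count = 'board', 0, 0
cap = cv2.VideoCapture(0)  # webcam; a CSI camera needs a GStreamer pipeline
while True:
    ok, frame = cap.read()
    if not ok:
        break
    cv2.imshow(classes[idx], frame)
    key = cv2.waitKey(1) & 0xFF
    if key == ord('s'):
        folder = os.path.join(dataset_dir, classes[idx])
        os.makedirs(folder, exist_ok=True)
        cv2.imwrite(os.path.join(folder, '%05d.jpg' % count), frame)
        count += 1
    elif key == ord('c'):
        idx = (idx + 1) % len(classes)
        cv2.destroyAllWindows()  # close the window titled with the old class
    elif key == ord('q'):
        break
cap.release()
cv2.destroyAllWindows()
```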
Convert to TFRecord format using tfrecord_convert.py:
python3 tfrecord_convert.py --dataset_dir board --num_test 100
Finally, you will see the TFRecord-format train and validation files in the board folder. You can fine-tune on it just like in Fine tuning flowers dataset and deploy.
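For reference, each image ends up as a tf.train.Example in the layout slim's image datasets expect. A minimal sketch of that encoding (tfrecord_convert.py may differ in detail; file and image names here are illustrative):

```python
import tensorflow as tf

# One image -> one tf.train.Example, using the feature keys slim's
# image datasets read (image/encoded, image/format, image/class/label, ...).
def image_to_example(jpeg_bytes, label, height, width):
    feature = {
        'image/encoded': tf.train.Feature(
            bytes_list=tf.train.BytesList(value=[jpeg_bytes])),
        'image/format': tf.train.Feature(
            bytes_list=tf.train.BytesList(value=[b'jpg'])),
        'image/class/label': tf.train.Feature(
            int64_list=tf.train.Int64List(value=[label])),
        'image/height': tf.train.Feature(
            int64_list=tf.train.Int64List(value=[height])),
        'image/width': tf.train.Feature(
            int64_list=tf.train.Int64List(value=[width])),
    }
    return tf.train.Example(features=tf.train.Features(feature=feature))

with tf.io.TFRecordWriter('board/train_00000-of-00001.tfrecord') as writer:
    jpeg = open('board/arduino/00000.jpg', 'rb').read()
    example = image_to_example(jpeg, label=0, height=480, width=640)
    writer.write(example.SerializeToString())
```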
If you want to fine-tune a big model like vgg_16, vgg_19, resnet_v2_152, resnet_v1_152, nasnet_large, pnasnet_large, or inception_resnet_v2, the Jetson Nano cannot do it because it runs out of memory, but you can fine-tune on Google Colab's free GPU with my notebook in the colab folder named training_slim_image_classification.ipynb.
First you must prepare a custom dataset as in the Prepare custom dataset step above and convert it to TFRecord too. For example:
board --> folder
arduino --> image arduino
cnc --> image cnc
esp8266 --> image esp8266
pyboard --> image pyboard
labels.txt --> 0:arduino
1:cnc
2:esp8266
3:pyboard
flowers_train_00000-of-00004.tfrecord
flowers_train_00001-of-00004.tfrecord
flowers_train_00002-of-00004.tfrecord
flowers_train_00003-of-00004.tfrecord
flowers_validation_00000-of-00004.tfrecord
flowers_validation_00001-of-00004.tfrecord
flowers_validation_00002-of-00004.tfrecord
flowers_validation_00003-of-00004.tfrecord
Rename the folder to dataset, like:
dataset --> folder
arduino --> image arduino
cnc --> image cnc
esp8266 --> image esp8266
pyboard --> image pyboard
labels.txt --> 0:arduino
1:cnc
2:esp8266
3:pyboard
flowers_train_00000-of-00004.tfrecord
flowers_train_00001-of-00004.tfrecord
flowers_train_00002-of-00004.tfrecord
flowers_train_00003-of-00004.tfrecord
flowers_validation_00000-of-00004.tfrecord
flowers_validation_00001-of-00004.tfrecord
flowers_validation_00002-of-00004.tfrecord
flowers_validation_00003-of-00004.tfrecord
Compress the dataset folder to dataset.zip, then upload it to your Google Drive. Click Getting Start, or upload my notebook to Google Colab, and continue from there.