How to optimize an object detection model as TF-TRT INT8 with NMS enabled?

Question

Closed this issue 4 years ago · 4 comments

vilmara commented 4 years ago

Hi, is there a sample showing how to optimize an object detection model as TF-TRT INT8 with NMS enabled?

Answer 1 · 2020-12-29T20:26:52.000Z

Can you be a bit more specific about what you tried to execute and what didn't work?

By "didn't work", do you mean that it crashed, or that no optimization was applied?

Answer 2 · 2020-12-30T16:28:20.000Z

Thanks for your reply. I mean it doesn't work in the sense that there is no optimization after converting the model to TF-TRT INT8.

I am trying to deploy the model faster_rcnn_inception_v2_coco_2018_01_28, optimized as TF-TRT INT8, using the DeepStream-Triton container. I am using this blog post as an example: https://developer.nvidia.com/blog/deploying-models-from-tensorflow-model-zoo-using-deepstream-and-triton-inference-server/. However, the script it references doesn't include an option to optimize the model as TF-TRT INT8 with NMS.

On the other hand, I built the TF-TRT INT8 model offline (without the NMS implementation) and passed the saved model to DS-Triton (dsnvinferserver), but I am seeing performance degradation. To build the INT8 model I used the script https://github.com/tensorflow/tensorrt/blob/master/tftrt/examples/object_detection/object_detection.py, which implements the build step, with the Docker image nvcr.io/nvidia/tensorflow:20.02-tf2-py3.

Answer 3 · 2021-01-08T02:13:26.000Z

Hi @DEKHTIARJonathan, I found that the r1.14 branch (which supports TF 1.14 and 1.15) has an implementation of the object_detection.py script for INT8 with NMS optimization. I switched to this branch to optimize my model with INT8 + NMS, running the script within the Docker image nvcr.io/nvidia/tensorflow:20.02-tf1-py3, but now I am getting the errors I reported in issue #119.

Answer 4 · 2021-01-08T03:06:13.000Z

Solved!

I changed minimum_segment_size from 2 to 3 (minimum_segment_size=3) in the object_detection.py script, as mentioned in #45, and updated the JSON file as below:

{
  "model_config": {
    "model_name": "faster_rcnn_inception_v2",
    "input_dir": "/workspace/triton_blog/",
    "batch_size": 8,
    "override_nms_score_threshold": 0.3
  },
  "optimization_config": {
    "use_trt": true,
    "precision_mode": "INT8",
    "force_nms_cpu": true,
    "calib_images_dir": "/workspace/data/coco/val2017/",
    "num_calib_images": 16,
    "calib_batch_size": 8,
    "calib_image_shape": [600, 600],
    "max_workspace_size_bytes": 17179869184
  },
  "benchmark_config": {
    "images_dir": "/workspace/data/coco/val2017/",
    "annotation_path": "/workspace/data/coco/annotations/instances_val2017.json",
    "batch_size": 8,
    "image_shape": [600, 600],
    "num_images": 4096,
    "output_path": "stats/faster_rcnn_inception_v2_tf-trt_nms_int8.json"
  },
  "assertions": [
    "statistics['map'] > (0.277 - 0.01)"
  ]
}
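
For readers who want to see where minimum_segment_size and the JSON settings end up, here is a minimal sketch. It is not the code from object_detection.py. It assumes the public TF 1.14/1.15 TrtGraphConverter API, and the helper names converter_kwargs and convert are hypothetical. Treating model_config.batch_size as max_batch_size, and setting is_dynamic_op=True, are also assumptions made for illustration.

```python
import json

# Trimmed copy of the settings from the JSON file above.
CONFIG_TEXT = """
{
  "model_config": {"batch_size": 8},
  "optimization_config": {
    "use_trt": true,
    "precision_mode": "INT8",
    "max_workspace_size_bytes": 17179869184
  }
}
"""


def converter_kwargs(config, minimum_segment_size=3):
    """Hypothetical helper: map the JSON settings to converter keyword arguments."""
    opt = config["optimization_config"]
    return {
        # Assumption: the model batch size is used as the TensorRT max batch size.
        "max_batch_size": config["model_config"]["batch_size"],
        "max_workspace_size_bytes": opt["max_workspace_size_bytes"],
        "precision_mode": opt["precision_mode"],
        # The value changed from 2 to 3 in the answer above.
        "minimum_segment_size": minimum_segment_size,
        # Assumption: engines are built at runtime.
        "is_dynamic_op": True,
        "use_calibration": opt["precision_mode"] == "INT8",
    }


def convert(frozen_graph_def, output_names, config):
    """Hypothetical wrapper; needs TF 1.14/1.15 with TensorRT, so it is not called here."""
    from tensorflow.python.compiler.tensorrt import trt_convert as trt

    converter = trt.TrtGraphConverter(
        input_graph_def=frozen_graph_def,
        nodes_blacklist=output_names,
        **converter_kwargs(config)
    )
    # For INT8 with calibration, a calibration pass has to follow convert().
    return converter.convert()


kwargs = converter_kwargs(json.loads(CONFIG_TEXT))
print(kwargs["minimum_segment_size"])               # 3
print(kwargs["max_workspace_size_bytes"] / 2**30)   # 16.0 (GiB)
```

A larger minimum_segment_size means that small groups of TensorRT-compatible nodes stay in TensorFlow instead of becoming separate TensorRT engines.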

Result:

    step 100/512, iter_time(ms)=88.1563
    step 200/512, iter_time(ms)=86.5872
    step 300/512, iter_time(ms)=86.7162
    step 400/512, iter_time(ms)=86.5696
    step 500/512, iter_time(ms)=86.4376
Loading and preparing results...
DONE (t=0.29s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=10.50s).
Accumulating evaluation results...
DONE (t=1.91s).
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.244
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.387
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.261
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.052
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.262
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.442
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.223
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.298
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.300
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.060
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.313
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.551
{
    "avg_latency_ms": 86.46978387738218,
    "avg_throughput_fps": 92.5178674130184,
    "map": 0.2439860628633743
}
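
As a quick sanity check, the reported numbers can be cross-checked against the config with plain arithmetic. The sketch below only uses values copied from this post. The variable names are made up for illustration, and how the script itself evaluates the "assertions" entry is not shown here, so the last check is a plain comparison against the same threshold.

```python
# Values copied from the JSON config and the result block above.
benchmark_batch_size = 8
num_images = 4096
stats = {
    "avg_latency_ms": 86.46978387738218,
    "avg_throughput_fps": 92.5178674130184,
    "map": 0.2439860628633743,
}

# Number of benchmark steps: matches the "step N/512" lines in the log.
steps = num_images // benchmark_batch_size

# Throughput derived from the average per-batch latency.
derived_fps = benchmark_batch_size * 1000.0 / stats["avg_latency_ms"]

# Threshold from the "assertions" entry: 0.277 - 0.01 = 0.267.
map_threshold = 0.277 - 0.01
meets_threshold = stats["map"] > map_threshold

print(steps)                  # 512
print(round(derived_fps, 2))  # 92.52
print(meets_threshold)        # False
```

The derived throughput agrees with avg_throughput_fps, while the reported mAP of about 0.244 is below the 0.267 threshold written in the config.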