tensorflow/tensorrt

Segmentation fault when optimizing model

isra60 opened this issue · 13 comments

Currently I'm trying ssd_mobilenet_v2_coco on an NVIDIA GTX 1060.

I have tensorflow-gpu v1.13, CUDA 10 and TensorRT 5.
I've downloaded the model with

config_path, checkpoint_path = download_model('ssd_mobilenet_v2_coco', output_dir='models')

I'm trying to optimize the model with


frozen_graph = optimize_model(
    config_path=config_path, 
    checkpoint_path=checkpoint_path,
    use_trt=True,
    precision_mode='FP16'
)

But this always causes a segmentation fault. This is the console log.

2019-03-20 11:26:09.152490: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-03-20 11:26:09.235499: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-03-20 11:26:09.235934: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x45ea9f0 executing computations on platform CUDA. Devices:
2019-03-20 11:26:09.235950: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): GeForce GTX 1060 6GB, Compute Capability 6.1
2019-03-20 11:26:09.257165: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2808000000 Hz
2019-03-20 11:26:09.257710: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x3d6e750 executing computations on platform Host. Devices:
2019-03-20 11:26:09.257725: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): <undefined>, <undefined>
2019-03-20 11:26:09.257948: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties:
name: GeForce GTX 1060 6GB major: 6 minor: 1 memoryClockRate(GHz): 1.7845
pciBusID: 0000:01:00.0
totalMemory: 5.93GiB freeMemory: 5.56GiB
2019-03-20 11:26:09.257964: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-03-20 11:26:09.334021: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-03-20 11:26:09.334056: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0
2019-03-20 11:26:09.334062: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N
2019-03-20 11:26:09.334197: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 5369 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1060 6GB, pci bus id: 0000:01:00.0, compute capability: 6.1)
WARNING:tensorflow:From /home/idiaz/.local/lib/python3.6/site-packages/object_detection-0.1-py3.6.egg/object_detection/exporter.py:327: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
INFO:tensorflow:depth of additional conv before box predictor: 0
INFO:tensorflow:depth of additional conv before box predictor: 0
INFO:tensorflow:depth of additional conv before box predictor: 0
INFO:tensorflow:depth of additional conv before box predictor: 0
INFO:tensorflow:depth of additional conv before box predictor: 0
INFO:tensorflow:depth of additional conv before box predictor: 0
WARNING:tensorflow:From /home/idiaz/.local/lib/python3.6/site-packages/object_detection-0.1-py3.6.egg/object_detection/exporter.py:356: get_or_create_global_step (from tensorflow.contrib.framework.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.train.get_or_create_global_step
2019-03-20 11:26:15.423701: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-03-20 11:26:15.423744: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-03-20 11:26:15.423750: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0
2019-03-20 11:26:15.423753: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N
2019-03-20 11:26:15.423886: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 5369 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1060 6GB, pci bus id: 0000:01:00.0, compute capability: 6.1)
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/training/saver.py:1266: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
INFO:tensorflow:Restoring parameters from models/ssd_mobilenet_v2_coco_2018_03_29/model.ckpt
2019-03-20 11:26:18.529755: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-03-20 11:26:18.529812: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-03-20 11:26:18.529820: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0
2019-03-20 11:26:18.529825: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N
2019-03-20 11:26:18.529932: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 5369 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1060 6GB, pci bus id: 0000:01:00.0, compute capability: 6.1)
INFO:tensorflow:Restoring parameters from models/ssd_mobilenet_v2_coco_2018_03_29/model.ckpt
WARNING:tensorflow:From /home/idiaz/.local/lib/python3.6/site-packages/object_detection-0.1-py3.6.egg/object_detection/exporter.py:96: convert_variables_to_constants (from tensorflow.python.framework.graph_util_impl) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.compat.v1.graph_util.convert_variables_to_constants
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/graph_util_impl.py:245: extract_sub_graph (from tensorflow.python.framework.graph_util_impl) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.compat.v1.graph_util.extract_sub_graph
INFO:tensorflow:Froze 344 variables.
INFO:tensorflow:Converted 344 variables to const ops.
2019-03-20 11:26:19.787213: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-03-20 11:26:19.787255: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-03-20 11:26:19.787261: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0
2019-03-20 11:26:19.787264: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N
2019-03-20 11:26:19.787369: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 5369 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1060 6GB, pci bus id: 0000:01:00.0, compute capability: 6.1)
WARNING:tensorflow:From /home/idiaz/.local/lib/python3.6/site-packages/object_detection-0.1-py3.6.egg/object_detection/exporter.py:288: build_tensor_info (from tensorflow.python.saved_model.utils_impl) is deprecated and will be removed in a future version.
Instructions for updating:
This function will only be available through the v1 compatibility library as tf.compat.v1.saved_model.utils.build_tensor_info or tf.compat.v1.saved_model.build_tensor_info.
WARNING:tensorflow:From /home/idiaz/.local/lib/python3.6/site-packages/object_detection-0.1-py3.6.egg/object_detection/exporter.py:288: build_tensor_info (from tensorflow.python.saved_model.utils_impl) is deprecated and will be removed in a future version.
Instructions for updating:
This function will only be available through the v1 compatibility library as tf.compat.v1.saved_model.utils.build_tensor_info or tf.compat.v1.saved_model.build_tensor_info.
INFO:tensorflow:No assets to save.
INFO:tensorflow:No assets to save.
INFO:tensorflow:No assets to write.
INFO:tensorflow:No assets to write.
INFO:tensorflow:SavedModel written to: .optimize_model_tmp_dir/saved_model/saved_model.pb
INFO:tensorflow:SavedModel written to: .optimize_model_tmp_dir/saved_model/saved_model.pb
INFO:tensorflow:Writing pipeline config file to .optimize_model_tmp_dir/pipeline.config
INFO:tensorflow:Writing pipeline config file to .optimize_model_tmp_dir/pipeline.config
2019-03-20 11:26:21.916570: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-03-20 11:26:21.916607: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-03-20 11:26:21.916613: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0
2019-03-20 11:26:21.916617: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N
2019-03-20 11:26:21.916717: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 5369 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1060 6GB, pci bus id: 0000:01:00.0, compute capability: 6.1)
INFO:tensorflow:Running against TensorRT version 5.0.2
INFO:tensorflow:Running against TensorRT version 5.0.2
2019-03-20 11:26:23.734739: I tensorflow/core/grappler/devices.cc:51] Number of eligible GPUs (core count >= 8): 1
2019-03-20 11:26:23.735758: I tensorflow/core/grappler/clusters/single_machine.cc:359] Starting new session
2019-03-20 11:26:23.738573: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-03-20 11:26:23.738598: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-03-20 11:26:23.738603: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0
2019-03-20 11:26:23.738607: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N
2019-03-20 11:26:23.738711: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 5369 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1060 6GB, pci bus id: 0000:01:00.0, compute capability: 6.1)
2019-03-20 11:26:24.093794: W tensorflow/core/framework/allocator.cc:124] Allocation of 25159680 exceeds 10% of system memory.
2019-03-20 11:26:24.117771: W tensorflow/core/framework/allocator.cc:124] Allocation of 25159680 exceeds 10% of system memory.
2019-03-20 11:26:24.219573: W tensorflow/core/framework/allocator.cc:124] Allocation of 25159680 exceeds 10% of system memory.
2019-03-20 11:26:24.219672: W tensorflow/core/framework/allocator.cc:124] Allocation of 25159680 exceeds 10% of system memory.
2019-03-20 11:26:24.435736: W tensorflow/core/framework/allocator.cc:124] Allocation of 25159680 exceeds 10% of system memory.
2019-03-20 11:26:25.790160: I tensorflow/contrib/tensorrt/segment/segment.cc:443] There are 2317 ops of 33 different types in the graph that are not converted to TensorRT: Fill, Switch, TopKV2, ConcatV2, Identity, Squeeze, Const, Unpack, ResizeBilinear, Reshape, Mul, Slice, Merge, Split, NonMaxSuppressionV3, GatherV2, Range, Conv2D, Cast, Greater, Minimum, Sub, StridedSlice, NoOp, ZerosLike, Pack, Transpose, ExpandDims, Where, Exp, Placeholder, Add, Shape, (For more information see https://docs.nvidia.com/deeplearning/dgx/integrate-tf-trt/index.html#support-ops).
2019-03-20 11:26:26.231074: I tensorflow/contrib/tensorrt/convert/convert_graph.cc:913] Number of TensorRT candidate segments: 187
2019-03-20 11:26:35.074128: I tensorflow/contrib/tensorrt/convert/convert_graph.cc:1015] TensorRT node TRTEngineOp_0 added for segment 0 consisting of 224 nodes succeeded.
2019-03-20 11:26:35.074828: W tensorflow/contrib/tensorrt/convert/convert_graph.cc:1021] TensorRT node BoxPredictor_1/ClassPredictor/TRTEngineOp_1 added for segment 1 consisting of 2 nodes failed: Internal: Segment has no inputs (possible constfold failure). Fallback to TF...
Segmentation fault (core dumped)
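In a log this long, the per-segment conversion lines are the interesting part: the crash follows right after a 2-node segment fails to convert. A small sketch for pulling those lines out of a saved log is below. The regular expression is based only on the two `convert_graph.cc` messages shown in this log, and the helper name `segment_results` is made up for illustration; other TensorFlow versions may word these messages differently.

```python
import re

# Pattern derived from the two convert_graph.cc messages in the log above.
SEGMENT_RE = re.compile(
    r"TensorRT node (?P<node>\S+) added for segment (?P<segment>\d+) "
    r"consisting of (?P<size>\d+) nodes (?P<status>succeeded|failed)"
)


def segment_results(log_text):
    """Return one dict per segment conversion message found in log_text."""
    return [
        {
            "node": m.group("node"),
            "segment": int(m.group("segment")),
            "size": int(m.group("size")),
            "status": m.group("status"),
        }
        for m in SEGMENT_RE.finditer(log_text)
    ]


# The two conversion messages from the log, without their timestamps.
sample_log = (
    "TensorRT node TRTEngineOp_0 added for segment 0 "
    "consisting of 224 nodes succeeded. "
    "TensorRT node BoxPredictor_1/ClassPredictor/TRTEngineOp_1 added for "
    "segment 1 consisting of 2 nodes failed: Internal: Segment has no inputs "
    "(possible constfold failure). Fallback to TF..."
)

results = segment_results(sample_log)
for entry in results:
    print(entry["segment"], entry["size"], entry["status"], entry["node"])
```

Because it uses `finditer`, the helper works whether the log has one message per line or has been pasted as a single run of text.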

Here is also the backtrace from gdb.

Thread 1 "python3" received signal SIGSEGV, Segmentation fault.
0x00007fff68d60261 in tensorflow::tensorrt::convert::GetDeviceAndAllocator(tensorflow::tensorrt::convert::ConversionParams const&, tensorflow::tensorrt::convert::EngineInfo const&) () from /usr/local/lib/python3.6/dist-packages/tensorflow/contrib/tensorrt/_wrap_conversion.so
(gdb) bt
#0 0x00007fff68d60261 in tensorflow::tensorrt::convert::GetDeviceAndAllocator(tensorflow::tensorrt::convert::ConversionParams const&, tensorflow::tensorrt::convert::EngineInfo const&) () from /usr/local/lib/python3.6/dist-packages/tensorflow/contrib/tensorrt/_wrap_conversion.so
#1 0x00007fff68d651aa in tensorflow::tensorrt::convert::ConvertAfterShapes(tensorflow::tensorrt::convert::ConversionParams&) () from /usr/local/lib/python3.6/dist-packages/tensorflow/contrib/tensorrt/_wrap_conversion.so
#2 0x00007fff68d90f56 in tensorflow::tensorrt::convert::TRTOptimizationPass::Optimize(tensorflow::grappler::Cluster*, tensorflow::grappler::GrapplerItem const&, tensorflow::GraphDef*) () from /usr/local/lib/python3.6/dist-packages/tensorflow/contrib/tensorrt/_wrap_conversion.so
#3 0x00007fffb549a8ee in tensorflow::grappler::MetaOptimizer::RunOptimizer(tensorflow::grappler::GraphOptimizer*, tensorflow::grappler::Cluster*, tensorflow::grappler::GrapplerItem*, tensorflow::GraphDef*, tensorflow::grappler::MetaOptimizer::GraphOptimizationResult*) () from /usr/local/lib/python3.6/dist-packages/tensorflow/python/_pywrap_tensorflow_internal.so
#4 0x00007fffb549b552 in tensorflow::grappler::MetaOptimizer::OptimizeGraph(tensorflow::grappler::Cluster*, tensorflow::grappler::GrapplerItem const&, tensorflow::GraphDef*) () from /usr/local/lib/python3.6/dist-packages/tensorflow/python/_pywrap_tensorflow_internal.so
#5 0x00007fffb549c8a7 in tensorflow::grappler::MetaOptimizer::Optimize(tensorflow::grappler::Cluster*, tensorflow::grappler::GrapplerItem const&, tensorflow::GraphDef*) () from /usr/local/lib/python3.6/dist-packages/tensorflow/python/_pywrap_tensorflow_internal.so
#6 0x00007fffb028ab9c in TF_OptimizeGraph(GCluster, tensorflow::ConfigProto const&, tensorflow::MetaGraphDef const&, bool, std::string const&, TF_Status*) () from /usr/local/lib/python3.6/dist-packages/tensorflow/python/_pywrap_tensorflow_internal.so
#7 0x00007fffb0293157 in _wrap_TF_OptimizeGraph () from /usr/local/lib/python3.6/dist-packages/tensorflow/python/_pywrap_tensorflow_internal.so
#8 0x0000000000502d6f in ?? ()
#9 0x0000000000506859 in _PyEval_EvalFrameDefault ()
#10 0x0000000000504c28 in ?? ()
#11 0x0000000000502540 in ?? ()
#12 0x0000000000502f3d in ?? ()
#13 0x0000000000507641 in _PyEval_EvalFrameDefault ()
#14 0x0000000000504c28 in ?? ()
#15 0x0000000000502540 in ?? ()
#16 0x0000000000502f3d in ?? ()
#17 0x0000000000507641 in _PyEval_EvalFrameDefault ()
#18 0x0000000000504c28 in ?? ()
---Type <return> to continue, or q <return> to quit---
#19 0x0000000000502540 in ?? ()
#20 0x0000000000502f3d in ?? ()
#21 0x0000000000507641 in _PyEval_EvalFrameDefault ()
#22 0x0000000000504c28 in ?? ()
#23 0x0000000000506393 in PyEval_EvalCode ()
#24 0x0000000000634d52 in ?? ()
#25 0x00000000004a38c5 in ?? ()
#26 0x00000000004a5cd5 in PyRun_InteractiveLoopFlags ()
#27 0x00000000006387b3 in PyRun_AnyFileExFlags ()
#28 0x000000000063915a in Py_Main ()
#29 0x00000000004a6f10 in main ()

Same here. You can avoid the segfault by setting force_nms_cpu to False. It would be helpful to know which versions of TensorFlow, CUDA and TensorRT these examples were tested with. Is TensorRT 5.x supported?

Thanks! Do you know what this setting does?

I'm also testing on my Jetson TX2 with the latest JetPack, which also uses TensorRT 5 and CUDA 10 (and TensorFlow 1.13.1), and there the ssd_mobilenet_v2 optimization seems to work...

I did some more tests:

  1. Using the official nvidia tensorflow docker image 19.03-py3, I tried the object detection example with the version of this repo that is included in the docker image --> It worked fine, no errors
  2. From the same docker image (but a different container), I cloned this repo and tried the object detection example --> Segmentation Fault

1 and 2 both had the same hardware configuration, same tensorflow version, same cuda and tensorrt versions. Some change in this repo must have broken things.

Hmm, interesting...
Have you tried this repo?
https://github.com/NVIDIA-AI-IOT/tf_trt_models

This is the commit nvidia used in the docker image: d2c28ff
I just checked it out manually, did the setup and it worked fine

@88madri no, I'm not working with a Jetson platform

minimum_segment_size=50 was changed to minimum_segment_size=2. This is causing the segmentation fault
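For anyone hitting the same crash, the workaround implied here is to pass the old value explicitly. A minimal sketch is below. `optimize_model` is the function from the original report; the wrapper name `optimize_with_larger_segments` is hypothetical, and accepting `minimum_segment_size` as a keyword argument is an assumption based on the default mentioned above, so check the signature in your checkout. The function to call is passed in so the sketch runs without TensorFlow installed.

```python
def optimize_with_larger_segments(optimize_fn, config_path, checkpoint_path,
                                  minimum_segment_size=50):
    """Call optimize_fn with the arguments from the report plus a segment size.

    optimize_fn is expected to be the repo's optimize_model (assumption: it
    accepts a minimum_segment_size keyword argument).
    """
    return optimize_fn(
        config_path=config_path,
        checkpoint_path=checkpoint_path,
        use_trt=True,
        precision_mode='FP16',
        minimum_segment_size=minimum_segment_size,
    )


def _recording_stub(**kwargs):
    """Stand-in for optimize_model that just returns what it was called with."""
    return kwargs


# Placeholder paths; with the real function these come from download_model().
captured = optimize_with_larger_segments(
    _recording_stub, 'pipeline.config', 'model.ckpt')
print(captured['minimum_segment_size'])
```

With the real function you would pass `optimize_model` in place of `_recording_stub` and keep the returned frozen graph.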

Yeah, I think it's this commit, right?
950811e

Is this issue resolved?
Looks like 19.03 has worked for you.
Which container didn't work?

I think the new default arguments work with TF 1.13. Please let me know if they don't.

No. The new default arguments don't work with TF 1.13 on a GTX 1060, nor on a Jetson TX2.
With a minimum segment size of 50 it works.

I just verified that this is working now with the most recent code in the master branch, and TF 1.14.

Here is the tail of the log:

Loading and preparing results...
DONE (t=0.22s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=8.83s).
Accumulating evaluation results...
DONE (t=1.49s).
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.248
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.371
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.274
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.024
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.172
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.569
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.222
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.277
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.278
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.031
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.197
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.641
{
    "avg_latency_ms": 10.634882227359757,
    "avg_throughput_fps": 94.03019033227812,
    "map": 0.24784700263781276
}
ASSERTION PASSED: statistics['map'] > (0.247 - 0.005)
PASS ssd_mobilenet_v2_coco_trt_fp16.json
DONE testing ssd_mobilenet_v2_coco_trt_fp16.json
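As a quick sanity check on the numbers above, the reported latency and throughput appear to be related by throughput = batch size * 1000 / latency in ms. This is inferred from the two figures and a batch size of 1, not from the benchmark code, and the helper name `throughput_fps` is made up for illustration.

```python
def throughput_fps(avg_latency_ms, batch_size=1):
    """Images per second implied by an average per-batch latency in ms."""
    return batch_size * 1000.0 / avg_latency_ms


# Values copied from the statistics block in the log above.
reported_latency_ms = 10.634882227359757
reported_fps = 94.03019033227812

fps = throughput_fps(reported_latency_ms)
print(round(fps, 2), round(reported_fps, 2))
```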

This is the config I used:

{
  "source_model": {
    "model_name": "ssd_mobilenet_v2_coco",
    "input_dir": "/data/tensorflow/object_detection/models"
  },
  "optimization_config": {
    "use_trt": true,
    "precision_mode": "FP16",
    "override_nms_score_threshold": 0.3,
    "max_batch_size": 1
  },
  "benchmark_config": {
    "images_dir": "/data/coco/coco-2017/coco2017/val2017",
    "annotation_path": "/data/coco/coco-2017/coco2017/annotations/instances_val2017.json",
    "batch_size": 1,
    "image_shape": [640, 640],
    "num_images": 4096,
    "output_path": "stats/ssd_mobilenet_v2_coco_trt_fp16.json"
  },
  "assertions": [
    "statistics['map'] > (0.247 - 0.005)"
  ]
}
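The "assertions" entry is a list of Python expressions over the benchmark statistics. A rough sketch of how such an expression could be checked against the results is below. This is not the repo's test harness: `check_assertions` is a hypothetical helper, and evaluating the strings with `eval` is an assumption about how they are meant to be read.

```python
import json


def check_assertions(config, statistics):
    """Evaluate each assertion string with only `statistics` in scope."""
    outcome = {}
    for expr in config.get("assertions", []):
        # Empty builtins keep the expression limited to `statistics`;
        # this is not a real sandbox, so only use it on trusted configs.
        outcome[expr] = bool(
            eval(expr, {"__builtins__": {}}, {"statistics": statistics}))
    return outcome


# Assertion copied from the config above; mAP copied from the log tail.
config = json.loads("""
{
  "assertions": ["statistics['map'] > (0.247 - 0.005)"]
}
""")
statistics = {"map": 0.24784700263781276}

outcome = check_assertions(config, statistics)
for expr, passed in outcome.items():
    print("PASSED" if passed else "FAILED", expr)
```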

Closing. Please reopen in case the issue remains.