TensorRT ResNet50 Segfaults with Tesla T4
petertorelli opened this issue · 4 comments
User reports that MLMark abruptly segfaults when running the TensorRT target on an x86 system with a Tesla T4, and no other warning messages are given. See below.
-INFO- --------------------------------------------------------------------------------
-INFO- Welcome to the EEMBC MLMark(tm) Benchmark!
-INFO- --------------------------------------------------------------------------------
-INFO- MLMark Version : 1.0.0
-INFO- Python Version : 3.7
-INFO- CPU Name : GenuineIntel Intel(R) Xeon(R) Platinum 8176 CPU @ 2.10GHz
-INFO- Total Memory (MiB) : 127571
-INFO- # of Logical CPUs : 112
-INFO- Instruction Set : x86_64
-INFO- OS Platform : Linux-4.4.0-131-generic-x86_64-with-debian-stretch-sid
-INFO- --------------------------------------------------------------------------------
-INFO- Models in this release:
-INFO- resnet50 : ResNet-50 v1.0 [ILSVRC2012]
-INFO- mobilenet : MobileNet v1.0 [ILSVRC2012]
-INFO- ssdmobilenet : SSD-MobileNet v1.0 [COCO2017]
-INFO- --------------------------------------------------------------------------------
-INFO- Parsing config file config/trt-gpu-resnet50-fp32-throughput.json
-INFO- Task: Target 'tensorrt', Workload 'resnet50'
-INFO- batch : 1
-INFO- concurrency : 1
-INFO- hardware : gpu
-INFO- iterations : 1024
-INFO- mode : throughput
-INFO- precision : fp32
failed to parse uff model
Entered in engine building part
Segmentation fault (core dumped)
Recommend using TF1.13.1, TRT5.1.2, CUDA10.0, and version 410 of the driver, although issues are still reported with this configuration.
Deferred until TRT6 target is released in 1.0.x.
Appears related to these lines of code in the Net.py files for each model, which load the library:
resnetnet_lib=os.path.join(TRT_DIR,"cpp_environment","libs","libclass_resnet50.so")
self.lib = cdll.LoadLibrary(resnetnet_lib)
self.obj = self.lib.return_object()
Adding this line (prior to the self.lib.return_object() call):
self.lib.return_object.restype = ctypes.c_ulonglong
Fixes the problem on the target system. return_object() returns a pointer, but ctypes assumes a C int return type unless restype is set, so the 64-bit pointer was being truncated. However, casting to ulonglong might introduce compatibility errors; need to investigate a pointer type that matches the OS/arch instead.
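A minimal sketch of the truncation and of one possible pointer-sized alternative, ctypes.c_void_p. The address below is a made-up value standing in for whatever return_object() actually returns, and the commented-out lines at the end are an untested suggestion, not the fix that landed in the branch.

```python
import ctypes

# Hypothetical 64-bit address standing in for the pointer returned by
# return_object(); the real value is not given in the report.
address = 0x7F3A12345678

# With restype left unset, ctypes treats the return value as a C int,
# so only the low bits of the pointer survive.
truncated = ctypes.c_int(address).value

# c_void_p is pointer-sized on the running platform, so the full
# address is kept on a 64-bit system.
preserved = ctypes.c_void_p(address).value

print(hex(truncated))
print(hex(preserved))
print(ctypes.sizeof(ctypes.c_int), ctypes.sizeof(ctypes.c_void_p))

# Applied to the loader code, this might look like (untested assumption):
#   self.lib.return_object.restype = ctypes.c_void_p
#   self.obj = self.lib.return_object()
```

With a c_void_p restype, ctypes hands back a plain Python int (or None for a NULL pointer). Any library function that later receives this handle would likely also need c_void_p in its argtypes, since Python ints passed without argtypes are converted as a C int as well.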
New branch trt-restype in progress.