google/automl

CUDNN_STATUS_EXECUTION_FAILED error when running with TensorRT


System environments:

TF 2.2.0
CUDA 10.1
CUDNN 7.6.5
TensorRT 6.0.1.5
GPU RTX2080Ti

I tried this command: python model_inspect.py --runmode=bm --model_name=efficientdet-d1 --tensorrt=FP16 but got the CUDNN_STATUS_EXECUTION_FAILED error:

2020-05-22 14:40:36.657674: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:799]   constant_folding: Graph size after: 34 nodes (0), 34 edges (0), time = 0.542ms.
2020-05-22 14:40:36.657680: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:799]   TensorRTOptimizer: Graph size after: 34 nodes (0), 34 edges (0), time = 0.05ms.
2020-05-22 14:40:36.657686: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:799]   constant_folding: Graph size after: 34 nodes (0), 34 edges (0), time = 0.519ms.
2020-05-22 14:40:37.368078: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-05-22 14:40:37.368384: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties:
pciBusID: 0000:08:00.0 name: GeForce RTX 2080 Ti computeCapability: 7.5
coreClock: 1.635GHz coreCount: 68 deviceMemorySize: 10.76GiB deviceMemoryBandwidth: 573.69GiB/s
2020-05-22 14:40:37.368429: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-05-22 14:40:37.368438: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-05-22 14:40:37.368446: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-05-22 14:40:37.368454: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2020-05-22 14:40:37.368468: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2020-05-22 14:40:37.368477: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2020-05-22 14:40:37.368486: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-05-22 14:40:37.368535: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-05-22 14:40:37.368765: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-05-22 14:40:37.368958: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0
2020-05-22 14:40:37.368984: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-05-22 14:40:37.368989: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1108]      0
2020-05-22 14:40:37.368993: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] 0:   N
2020-05-22 14:40:37.369076: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-05-22 14:40:37.369306: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-05-22 14:40:37.369510: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10201 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080 Ti, pci bus id: 0000:08:00.0, compute capability: 7.5)
2020-05-22 14:40:38.624109: E tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:42] DefaultLogger ../rtSafe/safeContext.cpp (110) - cuBLAS Error in initializeCommonContext: 3 (Could not initialize cublas, please check cuda installation.)
2020-05-22 14:40:38.627326: E tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:42] DefaultLogger INVALID_STATE: std::exception
2020-05-22 14:40:38.627342: E tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:42] DefaultLogger INVALID_CONFIG: Deserialize the cuda engine failed.
2020-05-22 14:40:38.629129: E tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:42] DefaultLogger ../rtSafe/safeContext.cpp (105) - Cudnn Error in initializeCommonContext: 4 (Could not initialize cudnn, please check cudnn installation.)
2020-05-22 14:40:38.629152: E tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:42] DefaultLogger INVALID_STATE: std::exception
2020-05-22 14:40:38.629159: E tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:42] DefaultLogger INVALID_CONFIG: Deserialize the cuda engine failed.
2020-05-22 14:40:38.629601: E tensorflow/stream_executor/dnn.cc:613] CUDNN_STATUS_EXECUTION_FAILED
in tensorflow/stream_executor/cuda/cuda_dnn.cc(3158): 'cudnnConvolutionForward( cudnn.handle(), alpha, input_nd.handle(), input_data.opaque(), filter_nd.handle(), filter_data.opaque(), conv.handle(), ToConvForwardAlgo(algorithm_desc), scratch_memory.opaque(), scratch_memory.size(), beta, output_nd.handle(), output_data.opaque())'
2020-05-22 14:40:38.629630: W tensorflow/core/framework/op_kernel.cc:1753] OP_REQUIRES failed at trt_engine_op.cc:390 : Internal: cuDNN launch failure : input shape([1,240,1,1]) filter shape([1,1,240,10])
         [[{{node efficientnet-b1/blocks_6/se/conv2d/Conv2D}}]]
Traceback (most recent call last):
  File "../miniconda3/envs/tf2py37/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1365, in _do_call
    return fn(*args)
  File "../miniconda3/envs/tf2py37/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1350, in _run_fn
    target_list, run_metadata)
  File "../miniconda3/envs/tf2py37/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1443, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InternalError: [_Derived_]cuDNN launch failure : input shape([1,240,1,1]) filter shape([1,1,240,10])
         [[{{node efficientnet-b1/blocks_6/se/conv2d/Conv2D}}]]
         [[import/efficientnet-b1/blocks_6/se/TRTEngineOp_121]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "model_inspect.py", line 499, in <module>
    app.run(main)
  File "../miniconda3/envs/tf2py37/lib/python3.7/site-packages/absl/app.py", line 299, in run
    _run_main(main, args)
  File "../miniconda3/envs/tf2py37/lib/python3.7/site-packages/absl/app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "model_inspect.py", line 486, in main
    trace_filename=FLAGS.trace_filename)
  File "model_inspect.py", line 454, in run_model
    trace_filename=kwargs.get('trace_filename', None))
  File "model_inspect.py", line 376, in benchmark_model
    sess.run(output_name, feed_dict={input_name: img})
  File "../miniconda3/envs/tf2py37/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 958, in run
    run_metadata_ptr)
  File "../miniconda3/envs/tf2py37/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1181, in _run
    feed_dict_tensor, options, run_metadata)
  File "../miniconda3/envs/tf2py37/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1359, in _do_run
    run_metadata)
  File "../miniconda3/envs/tf2py37/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1384, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: [_Derived_]cuDNN launch failure : input shape([1,240,1,1]) filter shape([1,1,240,10])
         [[{{node efficientnet-b1/blocks_6/se/conv2d/Conv2D}}]]
         [[import/efficientnet-b1/blocks_6/se/TRTEngineOp_121]]

To solve the above, I needed to add two lines to the __main__ block (the list_physical_devices and set_memory_growth calls), following tensorflow/tensorflow#33938:

if __name__ == '__main__':
  logging.set_verbosity(logging.WARNING)
  # Enable GPU memory growth so TensorFlow does not pre-allocate the whole GPU
  # and TensorRT/cuDNN can still initialize their own contexts.
  gpu = tf.config.experimental.list_physical_devices('GPU')
  tf.config.experimental.set_memory_growth(gpu[0], True)
  tf.disable_eager_execution()
  app.run(main)

--runmode=saved_model runs fine without the above changes, but --runmode=bm does not.
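As an alternative, the same effect can be requested through the classic session-level allow_growth option. This is only a sketch under the assumption that the benchmark path builds its own tf.Session; the session construction below is illustrative, not the actual code in model_inspect.py:

import tensorflow.compat.v1 as tf

# Ask TF to take GPU memory on demand instead of pre-allocating the whole card,
# so the TensorRT engine can still initialize its cuBLAS/cuDNN contexts.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
sess = tf.Session(config=config)

Either way, the point is to stop TensorFlow from claiming all GPU memory up front, which is what makes the cuBLAS/cuDNN initialization inside TensorRT fail.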

I hit this problem with --runmode=saved_model_benchmark, using the TensorRT model exported by:
python model_inspect.py --runmode=saved_model --model_name=efficientdet-d0 --ckpt_path=efficientdet-d0 --saved_model_dir=savedmodeldir --tensorrt=FP32
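If editing model_inspect.py is not an option, a possible workaround (an untested sketch based on the memory-growth fix above, not something verified on the saved_model_benchmark path) is to request memory growth through the environment variable TensorFlow honors, set before any GPU memory is allocated in the process:

import os

# Must run before TensorFlow initializes the GPU in this process.
os.environ['TF_FORCE_GPU_ALLOW_GROWTH'] = 'true'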

I just ran this command on a V100 with the latest code:

python model_inspect.py --runmode=bm --tensorrt=FP32

It reports: "Per batch inference time: 0.0074005058966577055
FPS: 135.12589733245542"

So I guess it should work well for both bm and saved_model.

Thanks a lot for this update. I'll test it later.