Cudnn

Question


Closed 4 years ago · 1 comment

chenloveheimei commented 5 years ago


System information

OS Platform and Distribution: Ubuntu 18.04
NDK version: r16b
GCC version (if compiling for host): 7.4.0
MACE version (git describe --long --tags): 0.13
Python version: 3.6.7
Bazel version: 0.13

Model deploy file (*.yml)

library_name: segment
target_abis: [armeabi-v7a, arm64-v8a]
model_graph_format: code
model_data_format: code
models:
  segment:
    platform: tensorflow
    model_file_path: /media/long/data/Long/program/morpho/ubuntu/test/realtime_segmenation/models/deeplabv3_mnv2_pascal_train_aug/frozen_inference_graph.pb
    model_sha256_checksum: b3b7c39d1010c6da1dd0e87973ca24a78fa7dc70d4136fbcf50b9585469b6d9a
    subgraphs:
      - input_tensors: ImageTensor
        output_tensors: SemanticPredictions
        input_shapes: 1,342,256,3
        output_shapes: 1,342,256
    runtime: cpu+gpu
    limit_opencl_kernel_time: 0
    nnlib_graph_mode: 0
    obfuscate: 0
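The deploy file gives shapes as comma-separated strings. Below is a minimal sketch (not MACE's own parser) that turns them into tuples and checks them against each other. It assumes that `SemanticPredictions` keeps the batch, height and width of `ImageTensor` and drops the channel dimension, so it holds one class id per pixel.

```python
def parse_shape(text):
    """Turn a deploy-file shape string such as "1,342,256,3" into a tuple of ints."""
    return tuple(int(dim) for dim in text.split(","))


# Values copied from the deploy file above.
input_shape = parse_shape("1,342,256,3")  # N, H, W, C
output_shape = parse_shape("1,342,256")   # N, H, W (one class id per pixel)

# Assumption: the segmentation map keeps the input's batch and spatial dims.
assert output_shape == input_shape[:3], "output shape should be N,H,W of the input"
print(input_shape, output_shape)
```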

Describe the problem

I get this error with tensorflow_gpu too, and also when I build with build.sh dynamic. Does the conversion need GPU memory?
nvidia-smi output right after running build.sh:
long@long-Ubuntu:~/soft/mace/examples/android$ nvidia-smi
Wed May 13 10:42:59 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.09 Driver Version: 430.09 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 166... Off | 00000000:01:00.0 On | N/A |
| 23% 41C P2 31W / 120W | 5938MiB / 5942MiB | 9% Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 1662 G /usr/lib/xorg/Xorg 40MiB |
| 0 1774 G /usr/bin/gnome-shell 49MiB |
| 0 1996 G /usr/lib/xorg/Xorg 748MiB |
| 0 2185 G /usr/bin/gnome-shell 422MiB |
| 0 2688 G ...uest-channel-token=13249897578998786952 80MiB |
| 0 3099 G ...quest-channel-token=5911261781195510527 171MiB |
| 0 3157 G /usr/lib/vmware/bin/vmware-vmx 134MiB |
| 0 4432 G ...AAAAAAAAAAAAAAgAAAAAAAAA --shared-files 38MiB |
| 0 6080 G ...AAAAAAAAAAAACAAAAAAAAAA= --shared-files 61MiB |
| 0 7858 C /usr/bin/python3 4177MiB |
+-----------------------------------------------------------------------------+
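The output above reports 5938 of 5942 MiB in use, and `/usr/bin/python3` (PID 7858) alone holds 4177 MiB. That leaves about 4 MiB free. The traceback below shows TensorFlow trying to create a cuDNN handle, so with a GPU build of TensorFlow the conversion does touch the GPU. A nearly full GPU is one plausible cause of `CUDNN_STATUS_INTERNAL_ERROR`, but this thread does not confirm it. The sketch below checks the headroom from a script. It assumes the `nvidia-smi` CSV query flags shown; the helper names are hypothetical.

```python
import subprocess


def parse_memory_csv(csv_text):
    """Parse `memory.used, memory.total` rows (MiB) into (used, total, free) tuples."""
    rows = []
    for line in csv_text.strip().splitlines():
        used, total = (int(field) for field in line.split(","))
        rows.append((used, total, total - used))
    return rows


def query_gpu_memory():
    """Ask nvidia-smi for per-GPU memory usage; needs the NVIDIA driver tools installed."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        check=True, stdout=subprocess.PIPE, universal_newlines=True,
    ).stdout
    return parse_memory_csv(out)


# Offline check using the numbers from the report above: 5938 of 5942 MiB in use.
print(parse_memory_csv("5938, 5942\n"))  # [(5938, 5942, 4)]
```

`universal_newlines=True` is used instead of `text=True` because the reporter runs Python 3.6.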

To Reproduce

Steps to reproduce the problem:

1. cd /path/to/mace
2. python tools/converter.py convert --config_file=/path/to/your/model_deployment_file
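To test whether the GPU is involved, one option is to rerun step 2 with every CUDA device hidden, so that the TensorFlow inside the converter falls back to the CPU. This is an assumption and was not advice given in this thread. The sketch below only builds the command and environment. It assumes that the converter does not set `CUDA_VISIBLE_DEVICES` itself. `cpu_only_convert_command` is a hypothetical helper.

```python
import os
import sys


def cpu_only_convert_command(config_file, mace_root="."):
    """Build argv and env for step 2 above, with all CUDA GPUs hidden.

    An empty CUDA_VISIBLE_DEVICES hides every CUDA device from the child process,
    so a GPU build of TensorFlow running inside it sees no GPU.
    """
    # sys.executable stands in for `python`, reusing the current interpreter.
    argv = [sys.executable, os.path.join(mace_root, "tools", "converter.py"),
            "convert", "--config_file=" + config_file]
    env = dict(os.environ, CUDA_VISIBLE_DEVICES="")
    return argv, env


# Usage (not executed here):
#   import subprocess
#   argv, env = cpu_only_convert_command("/path/to/your/model_deployment_file",
#                                        "/path/to/mace")
#   subprocess.run(argv, env=env, check=True)
```

If the conversion succeeds this way, the local GPU setup (for example its free memory) is the likely culprit rather than the model.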

Error information / logs


Transform model to one that can better run on device
/home/long/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:526: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
/home/long/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:527: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/home/long/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:528: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
/home/long/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:529: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/home/long/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:530: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
/home/long/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:535: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  np_resource = np.dtype([("resource", np.ubyte, 1)])
Run transform_graph: ['strip_unused_nodes', 'remove_nodes(op=Identity, op=CheckNumerics, op=StopGradient)', 'fold_constants(ignore_errors=true)', 'fold_batch_norms', 'fold_old_batch_norms', 'remove_control_dependencies', 'strip_unused_nodes', 'sort_by_execution_order']
output keys:  dict_keys(['SemanticPredictions'])
2020-05-13 10:36:46.063336: E tensorflow/stream_executor/cuda/cuda_dnn.cc:334] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-05-13 10:36:46.075318: E tensorflow/stream_executor/cuda/cuda_dnn.cc:334] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
Traceback (most recent call last):
  File "/home/long/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1334, in _do_call
    return fn(*args)
  File "/home/long/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1319, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/home/long/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1407, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
	 [[{{node MobilenetV2/Conv/Conv2D}}]]
	 [[{{node Shape_366}}]]
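The log shows the cuDNN handle failing and then the first convolution, `MobilenetV2/Conv/Conv2D`, failing during a session run. When triaging logs like this, a small helper can pull out those two facts. The following is a minimal sketch. The regular expression assumes the `[[{{node NAME}}]]` format printed above, and the names are hypothetical.

```python
import re

CUDNN_ERROR = "CUDNN_STATUS_INTERNAL_ERROR"
# Matches TensorFlow's "[[{{node NAME}}]]" annotations and captures NAME.
NODE_PATTERN = re.compile(r"\[\[\{\{node ([^}]+)\}\}\]\]")


def summarize_tf_log(log_text):
    """Return (cudnn_failed, failing_nodes) for a TensorFlow error log."""
    return CUDNN_ERROR in log_text, NODE_PATTERN.findall(log_text)


sample = """E cuda_dnn.cc:334] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
\t [[{{node MobilenetV2/Conv/Conv2D}}]]
\t [[{{node Shape_366}}]]"""
print(summarize_tf_log(sample))  # (True, ['MobilenetV2/Conv/Conv2D', 'Shape_366'])
```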


Answer 1 · 2020-05-13T11:03:14.000Z

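Here MACE's call to TensorFlow's TransformGraph failed, most likely because of a problem with your local environment. You can try running it on its own with the following parameters: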

TFTransformGraphOptions = [
    'strip_unused_nodes',
    'remove_nodes(op=Identity, op=CheckNumerics, op=StopGradient)',
    'fold_constants(ignore_errors=true)',
    'fold_batch_norms',
    'fold_old_batch_norms',
    'remove_control_dependencies',
    'strip_unused_nodes',
    'sort_by_execution_order'
]
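The answer suggests running TransformGraph on its own with these options. Below is a minimal sketch of how that could look outside MACE. It assumes a TensorFlow 1.x installation providing `tensorflow.tools.graph_transforms.TransformGraph`, `tf.GraphDef` and `tf.gfile.GFile`. `transform_frozen_graph` and its `hide_gpu` switch are hypothetical. TensorFlow is imported inside the function so that the GPU can optionally be hidden first.

```python
import os

# Copied from the answer above.
TF_TRANSFORM_GRAPH_OPTIONS = [
    'strip_unused_nodes',
    'remove_nodes(op=Identity, op=CheckNumerics, op=StopGradient)',
    'fold_constants(ignore_errors=true)',
    'fold_batch_norms',
    'fold_old_batch_norms',
    'remove_control_dependencies',
    'strip_unused_nodes',
    'sort_by_execution_order',
]


def transform_frozen_graph(pb_path, inputs, outputs, transforms=None, hide_gpu=False):
    """Run TensorFlow 1.x's graph transform tool on a frozen .pb and return the GraphDef."""
    if hide_gpu:
        # Only effective if TensorFlow has not initialized CUDA in this process yet.
        os.environ["CUDA_VISIBLE_DEVICES"] = ""
    import tensorflow as tf  # assumes TensorFlow 1.x
    from tensorflow.tools.graph_transforms import TransformGraph

    graph_def = tf.GraphDef()
    with tf.gfile.GFile(pb_path, "rb") as f:
        graph_def.ParseFromString(f.read())
    return TransformGraph(graph_def, list(inputs), list(outputs),
                          list(transforms or TF_TRANSFORM_GRAPH_OPTIONS))
```

For this model the call would be `transform_frozen_graph("/path/to/frozen_inference_graph.pb", ["ImageTensor"], ["SemanticPredictions"])`. Running it once with `hide_gpu=False` and once with `hide_gpu=True` in a fresh process may help separate a graph problem from a GPU/cuDNN environment problem.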