Could not create cudnn handle: CUDNN_STATUS_NOT_INITIALIZED and Possibly insufficient driver version: 384.81.0 Segmentation fault (core dumped)

Question

Could not create cudnn handle: CUDNN_STATUS_NOT_INITIALIZED and Possibly insufficient driver version: 384.81.0 Segmentation fault (core dumped)

sanersbug opened this issue 6 years ago · 1 comments

when i run the order :
python main.py --mode=train --train_data=/mnt/saners-extend/FC-DenseNet-TensorFlow/data/train --val_data=/mnt/saners-extend/FC-DenseNet-TensorFlow/data/val --layers_per_block=4,5,7,10,12,15 --batch_size=2 --epochs=10 --growth_k=16 --num_classes=2 --learning_rate=0.001

it shows that:
/home/anaconda3/lib/python3.6/site-packages/h5py/init.py:36: FutureWarning: Conversion of the second argument of issubdtype from float to np.floating is deprecated. In future, it will be treated as np.float64 == np.dtype(float).type.
from ._conv import register_converters as _register_converters
First Convolution Out: (?, 256, 256, 48)
Downsample Out: (?, 128, 128, 112)
Downsample Out: (?, 64, 64, 192)
Downsample Out: (?, 32, 32, 304)
Downsample Out: (?, 16, 16, 464)
Downsample Out: (?, 8, 8, 656)
Bottleneck Block: (?, 8, 8, 240)
Upsample after concat: (?, 16, 16, 896)
Upsample after concat: (?, 32, 32, 704)
Upsample after concat: (?, 64, 64, 496)
Upsample after concat: (?, 128, 128, 352)
Upsample after concat: (?, 256, 256, 224)
Mask Prediction: (?, 256, 256, 2)
2018-10-25 15:54:14.641467: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-10-25 15:54:14.781306: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:897] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-10-25 15:54:14.781714: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1405] Found device 0 with properties:
name: Tesla P100-PCIE-12GB major: 6 minor: 0 memoryClockRate(GHz): 1.3285
pciBusID: 0000:05:01.0
totalMemory: 11.91GiB freeMemory: 2.58GiB
2018-10-25 15:54:14.781753: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1484] Adding visible gpu devices: 0
2018-10-25 15:54:15.190540: I tensorflow/core/common_runtime/gpu/gpu_device.cc:965] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-10-25 15:54:15.190615: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0
2018-10-25 15:54:15.190628: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] 0: N
2018-10-25 15:54:15.190885: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 2277 MB memory) -> physical GPU (device: 0, name: Tesla P100-PCIE-12GB, pci bus id: 0000:05:01.0, compute capability: 6.0)
2018-10-25 15:55:52.442988: E tensorflow/stream_executor/cuda/cuda_dnn.cc:352] Could not create cudnn handle: CUDNN_STATUS_NOT_INITIALIZED
2018-10-25 15:55:52.443130: E tensorflow/stream_executor/cuda/cuda_dnn.cc:360] Possibly insufficient driver version: 384.81.0
Segmentation fault (core dumped)

my environment is cuda9.0, cudnn7.0,tensorflow 1.10.1,anyone can give some advice ? thanks very much

Answer 1 · 2018-10-26T14:34:31.000Z

You have already a process running on the GPU:

totalMemory: 11.91GiB freeMemory: 2.58GiB

^ above shows your gpu memory is occupied by another process, please use nvidia-smi to check and terminate the process and try again.
Closing because this is unrelated to the repository.