floydhub/tensorflow seems to be missing stubs from LD_LIBRARY_PATH
damonmaria opened this issue · 3 comments
damonmaria commented
$ sudo docker run -it floydhub/tensorflow:1.9.0-gpu.cuda9cudnn7-py3_aws.32 python -c "import tensorflow; print(tensorflow.__version__)"
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow.py", line 58, in <module>
    from tensorflow.python.pywrap_tensorflow_internal import *
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>
    _pywrap_tensorflow_internal = swig_import_helper()
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
    _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
  File "/usr/local/lib/python3.6/imp.py", line 243, in load_module
    return load_dynamic(name, filename, file)
  File "/usr/local/lib/python3.6/imp.py", line 343, in load_dynamic
    return _load(spec)
ImportError: libcuda.so.1: cannot open shared object file: No such file or directory
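(For context: libcuda.so.1 ships with the host NVIDIA driver and is normally mounted into the container by the NVIDIA runtime, so under plain docker it is simply absent from the loader cache. A quick diagnostic sketch, assuming the same image tag, would be:
$ sudo docker run --rm floydhub/tensorflow:1.9.0-gpu.cuda9cudnn7-py3_aws.32 sh -c "ldconfig -p | grep libcuda"
which would be expected to print nothing when the driver library has not been mounted in.)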
But the following works:
$ sudo docker run --env "LD_LIBRARY_PATH=/usr/local/cuda/lib64/stubs:$LD_LIBRARY_PATH" -it floydhub/tensorflow:1.9.0-gpu.cuda9cudnn7-py3_aws.32 python -c "import tensorflow; print(tensorflow.__version__)"
1.9.0
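(Note that /usr/local/cuda/lib64/stubs contains link-time stub libraries; pointing LD_LIBRARY_PATH at them satisfies the import but does not give TensorFlow a usable GPU. If relying on this workaround, a sanity check along these lines, an untested sketch against the same image, would show whether a device is actually visible:
$ sudo docker run --env "LD_LIBRARY_PATH=/usr/local/cuda/lib64/stubs:$LD_LIBRARY_PATH" -it floydhub/tensorflow:1.9.0-gpu.cuda9cudnn7-py3_aws.32 python -c "import tensorflow as tf; print(tf.test.is_gpu_available())"
which would be expected to print False when only the stub is on the library path.)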
houqp commented
Hi, thanks for the report. GPU images need to be run with nvidia-docker; otherwise you will get this error.
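(For reference, the failing command run through the NVIDIA runtime, assuming nvidia-docker or nvidia-docker2 is installed on the host, would look like:
$ sudo nvidia-docker run -it floydhub/tensorflow:1.9.0-gpu.cuda9cudnn7-py3_aws.32 python -c "import tensorflow; print(tensorflow.__version__)"
or, with the nvidia-docker2 runtime registered with the Docker daemon:
$ sudo docker run --runtime=nvidia -it floydhub/tensorflow:1.9.0-gpu.cuda9cudnn7-py3_aws.32 python -c "import tensorflow; print(tensorflow.__version__)"
Either form mounts the host driver's libcuda.so.1 into the container, which is what the plain docker invocation was missing.)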
houqp commented
We use the exact same image in production at FloydHub. If you are not able to get it working with nvidia-docker, please feel free to reopen the issue; I am happy to help dig into it.
damonmaria commented
I am using it inside AWS Batch with an AMI set up as per AWS's instructions for running NVIDIA containers. Their instructions specify testing the AMI with docker, but I'll have a go and check whether nvidia-docker is used instead when the actual batch job runs.
Thanks.
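(A quick way to verify on the Batch AMI itself that the NVIDIA runtime is wired up, a sketch that assumes nvidia-docker2 is installed and that an nvidia/cuda:9.0-base image is pullable, is:
$ docker info | grep -i runtime
which should list "nvidia" among the runtimes, followed by:
$ sudo docker run --runtime=nvidia --rm nvidia/cuda:9.0-base nvidia-smi
which should print the host GPUs if the runtime is working.)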