floydhub/floyd-cli

libcublas Error

pgrandinetti opened this issue · 3 comments

I had to specify

tensorflow-gpu==1.5.0

in a floyd_requirements.txt file to work around this problem: tensorflow/models#2653 (comment)

That error is indeed no longer raised, but now I am running into this:

2018-02-16 12:51:07,466 INFO - File "/usr/local/lib/python3.5/imp.py", line 242, in load_module
2018-02-16 12:51:07,466 INFO - return load_dynamic(name, filename, file)
2018-02-16 12:51:07,466 INFO - File "/usr/local/lib/python3.5/imp.py", line 342, in load_dynamic
2018-02-16 12:51:07,466 INFO - return _load(spec)
2018-02-16 12:51:07,467 INFO - ImportError: libcublas.so.9.0: cannot open shared object file: No such file or directory
2018-02-16 12:51:07,467 INFO - 
2018-02-16 12:51:07,467 INFO - 
2018-02-16 12:51:07,467 INFO - Failed to load the native TensorFlow runtime.

It's a fairly well-known issue, but I can't solve it in the remote environment provided by FloydHub. Any suggestions?
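
A minimal sketch for checking, from inside the job, whether the dynamic loader can find the library named in the traceback above. It uses only the standard-library `ctypes` module; the helper name `can_load` is hypothetical, and the soname is copied from the error message. The prebuilt tensorflow-gpu 1.5.0 binaries are built against CUDA 9.0, so a `False` result would suggest that the environment ships a different CUDA version (this is an assumption about the image, not something confirmed in this thread).

```python
import ctypes
import ctypes.util


def can_load(soname):
    """Return True if the dynamic loader can open the given shared library."""
    try:
        ctypes.CDLL(soname)
        return True
    except OSError:
        # Raised when the library is missing or cannot be loaded.
        return False


if __name__ == "__main__":
    # Soname copied from the ImportError in the log above.
    print("libcublas.so.9.0 loadable:", can_load("libcublas.so.9.0"))
    # find_library reports whichever cublas the system knows about, or None.
    print("cublas known to the loader:", ctypes.util.find_library("cublas"))
```

Running this as the job command instead of the training script should show whether the problem is in the environment rather than in the code.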

houqp commented

Hi, could you give us the full command to reproduce this error? Have you tried running your job with --env tensorflow-1.5?

Trying --env tensorflow-1.5 results in an "environment not available" error. I am discussing this with your team in another channel.
In the meantime, here's the full stack trace of the libcublas error:

2018-02-16 12:51:07,464 INFO - File "/usr/local/lib/python3.5/site-packages/tensorflow/python/pywrap_tensorflow.py", line 58, in <module>
2018-02-16 12:51:07,465 INFO - from tensorflow.python.pywrap_tensorflow_internal import *
2018-02-16 12:51:07,465 INFO - File "/usr/local/lib/python3.5/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>
2018-02-16 12:51:07,465 INFO - _pywrap_tensorflow_internal = swig_import_helper()
2018-02-16 12:51:07,465 INFO - File "/usr/local/lib/python3.5/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
2018-02-16 12:51:07,466 INFO - _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
2018-02-16 12:51:07,466 INFO - File "/usr/local/lib/python3.5/imp.py", line 242, in load_module
2018-02-16 12:51:07,466 INFO - return load_dynamic(name, filename, file)
2018-02-16 12:51:07,466 INFO - File "/usr/local/lib/python3.5/imp.py", line 342, in load_dynamic
2018-02-16 12:51:07,466 INFO - return _load(spec)
2018-02-16 12:51:07,467 INFO - ImportError: libcublas.so.9.0: cannot open shared object file: No such file or directory
2018-02-16 12:51:07,467 INFO -
2018-02-16 12:51:07,467 INFO -
2018-02-16 12:51:07,467 INFO - Failed to load the native TensorFlow runtime.
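
For reference, the ImportError line is the one that names the missing library and its version. A small sketch, assuming log lines shaped like the ones above, that pulls both out so they can be compared with the CUDA version the environment provides; `missing_library` is a hypothetical helper, not part of floyd-cli or TensorFlow.

```python
import re

# Matches sonames such as "libcublas.so.9.0", capturing name and version.
_SONAME = re.compile(r"(lib\w+)\.so\.(\d+(?:\.\d+)*)")


def missing_library(log_line):
    """Return (name, version) for a loader error line, or None if it is not one."""
    if "cannot open shared object file" not in log_line:
        return None
    match = _SONAME.search(log_line)
    return (match.group(1), match.group(2)) if match else None


if __name__ == "__main__":
    line = ("2018-02-16 12:51:07,467 INFO - ImportError: libcublas.so.9.0: "
            "cannot open shared object file: No such file or directory")
    print(missing_library(line))
```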

Your team solved the problem with --env tensorflow-1.5. Strictly speaking this issue is not solved, but I'm closing it anyway, since there's no reason to use the floyd_requirements.txt file anymore. You might still wanna look inside your VM to understand what the problem was, though...
Thanks (and good luck)!