uber/neuropod

Error from dlopen: libneuropod_tensorflow_backend.so: cannot open shared object file: No such file or directory

yapus opened this issue · 5 comments

yapus commented

Bug

I'm trying to use the Python guide to create a neuropod model from my pre-built TensorFlow inference graph using the

from neuropod.packagers import create_tensorflow_neuropod

method. The neuropod model file was written successfully, without any errors. But when I try to load that neuropod model from Python, I get the following error:

06/09/20 15:55:28.213452: E neuropod/internal/backend_registration.cc:125] [thread 22488, process 22488] Loading the default backend for type 'tensorflow' failed. Error from dlopen: libneuropod_tensorflow_backend.so: cannot open shared object file: No such file or directory
06/09/20 15:55:28.213700: E neuropod/multiprocess/multiprocess.cc:128] [thread 22479, process 22479] Got an exception when loading the model at /home/app/model_training/neuropod_model: Neuropod Error: Loading the default backend for type 'tensorflow' failed. Error from dlopen: libneuropod_tensorflow_backend.so: cannot open shared object file: No such file or directory
Traceback (most recent call last):
  File "neuropod_test.py", line 7, in <module>
    with load_neuropod(PATH_TO_MODEL, visible_gpu=None) as neuropod:
  File "/home/app/miniconda3/envs/tf1x/lib/python3.7/site-packages/neuropod/loader.py", line 211, in load_neuropod
    return NativeNeuropodExecutor(neuropod_path, **kwargs)
  File "/home/app/miniconda3/envs/tf1x/lib/python3.7/site-packages/neuropod/loader.py", line 117, in __init__
    neuropod_path, _REGISTERED_BACKENDS, use_ope=True, **kwargs
RuntimeError: Neuropod Error: Got an exception when loading the model at /home/app/model_training/neuropod_model: Neuropod Error: Loading the default backend for type 'tensorflow' failed. Error from dlopen: libneuropod_tensorflow_backend.so: cannot open shared object file: No such file or directory

I ran find over my / and that file, libneuropod_tensorflow_backend.so, is nowhere to be found.

To Reproduce

Steps to reproduce the behavior:

Sorry, I can't share the model itself.

  1. I'm using the most basic script from the examples to load the model:
import os
import os.path

from neuropod.loader import load_neuropod

# Force CPU-only execution
os.environ['CUDA_VISIBLE_DEVICES'] = '-1'

PATH_TO_MODEL = os.environ.get('NEUROPOD_MODEL') or os.path.realpath('./neuropod_model')

with load_neuropod(PATH_TO_MODEL, visible_gpu=None) as neuropod:
    # This is a list of dicts containing the "name", "dtype", and "shape"
    # of the inputs and outputs
    print(neuropod.inputs, neuropod.outputs)
    # Do something here
    pass

Expected behavior

I expect the program to exit normally.

Environment

  • Neuropod Version: 0.2.0
  • OS: Linux Ubuntu 18.04 x86_64
  • Language: Python
  • Python version: 3.7.7
  • Using OPE: no (not sure what it is...)

If this bug report is about running a specific model:

  • Neuropod backend: TensorFlow
  • Framework version: 1.14.0

I'm running on CPU

I see a "multiprocess" error being reported, so OPE (Out-of-Process Execution, controlled by RuntimeOptions.use_ope) is in use.

I can actually see use_ope=True in your traceback:

  File "/home/app/miniconda3/envs/tf1x/lib/python3.7/site-packages/neuropod/loader.py", line 117, in __init__
    neuropod_path, _REGISTERED_BACKENDS, use_ope=True, **kwargs

In this case the model is executed by a "worker" process.

I can suggest two options:

  1. Check for the neuropod_multiprocess_worker binary (it should be in the "bin" directory) and add it to your PATH. For example, here is how I invoked it recently:
    PATH=$PATH:bin/ DYLD_LIBRARY_PATH=lib "command here"
  2. Set RuntimeOptions.use_ope=false to disable OPE (though it doesn't look like that option is available in your example).
yapus commented

Thanks for your reply, but that didn't help:

  1. Adding the folder containing neuropod_multiprocess_worker to PATH doesn't change anything:
$ which neuropod_multiprocess_worker >/dev/null && echo 'OK'
OK

As I said before, there is definitely no file named libneuropod_tensorflow_backend.so anywhere on my machine.

  1. How do I set RuntimeOptions in Python? As far as I can see from neuropod/loader.py, use_ope=True is always set, and if I pass that option again through the keyword arguments of load_neuropod, like:
with load_neuropod(PATH_TO_MODEL, use_ope=False, visible_gpu=None) as neuropod:

I get another error:

TypeError: pybind11_type object got multiple values for keyword argument 'use_ope'
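That TypeError is consistent with how the loader forwards arguments: it hard-codes use_ope=True and then also forwards the caller's keyword arguments, so supplying use_ope again collides. A minimal sketch of the collision, with hypothetical stand-in names (not the real neuropod internals), just to illustrate the Python behavior:

```python
def native_load(path, use_ope, **kwargs):
    # Stand-in for the native executor constructor
    return (path, use_ope)

def load(path, **kwargs):
    # Mirrors the loader pattern: use_ope is hard-coded,
    # caller kwargs are forwarded as-is
    return native_load(path, use_ope=True, **kwargs)

# Passing use_ope from the call site collides with the hard-coded value:
try:
    load("model_dir", use_ope=False)
except TypeError as e:
    print(e)  # ... got multiple values for keyword argument 'use_ope'
```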

VivekPanyam commented

Hi @yapus,

Thanks for the issue! Did you follow the instructions at https://neuropod.ai/installing/#python?

Specifically

To run models, you must also install packages for "backends". These are fully self-contained packages that let Neuropod run models with specific versions of frameworks regardless of the version installed in your python environment.

The page linked above includes instructions for installing backends. Based on the info in your initial comment, you probably want to run

pip install neuropod-backend-tensorflow-1-14-0-cpu -f https://download.neuropod.ai/whl/stable.html
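After installing, you can sanity-check that the backend's shared object actually landed on disk. The directory walk below is generic Python; the filename comes from the error message above, and searching site-packages is an assumption about where the wheel unpacks:

```python
import os
import sysconfig

def find_shared_object(root, name):
    """Walk `root` and return the paths of every file named `name`."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if name in filenames:
            hits.append(os.path.join(dirpath, name))
    return hits

# Search the active environment's site-packages for the backend library
site_packages = sysconfig.get_paths()["purelib"]
print(find_shared_object(site_packages, "libneuropod_tensorflow_backend.so"))
```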

Fixes

  1. The error that's supposed to be thrown in this situation is a bit more informative:

// Don't have anything that matches
NEUROPOD_ERROR("The model being loaded requires a Neuropod backend for type '{}' and version range '{}'. However, "
               "a backend satisfying these requirements was not found. See the installation instructions "
               "at https://neuropod.ai to install a backend. Retry with log level TRACE for more information.",
               type,
               target_version_range);

It looks like we hit another error first though. I'll put up a PR to clean this up.

  1. I'll modify the Issue template to make this clearer:

Using OPE: no (not sure what it is...)

Question

Is this phrasing in the installation instructions clear?

To run models, you must also install packages for "backends".

If test data is provided during packaging, Neuropod runs a test on the model immediately after export. In this situation, packaging a model also causes the model to be run. Do you think the sentence quoted above is misleading?

Would it be more clear to just say something like "To use Neuropod, you must also have at least one backend installed"?

yapus commented

@VivekPanyam oh sorry, totally my bad, I really missed that part.
After installing neuropod-backend-tensorflow-1-14-0-cpu it works as expected; at least that very basic test doesn't fail any more. I should really RTFM more mindfully.

Is a backend module required to package a neuropod model? It is strange, though: you mention that I should have hit the same error while packaging the model, since I did define test input and output data and passed them to the create_tensorflow_neuropod method, yet the model was packaged without any errors, even though at that time I didn't have any backend modules installed.

Should we create a helper script to automatically resolve this kind of problem?
For example, a script that auto-installs the specific backend (TensorFlow, PyTorch, etc.) with an explicit recommendation printed to the terminal.
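A rough sketch of such a helper. The wheel-naming pattern is inferred from the single example in this thread (neuropod-backend-tensorflow-1-14-0-cpu) and may not hold for all backends, so real names would need to be checked against the index at https://download.neuropod.ai/whl/stable.html:

```python
def backend_wheel_name(framework, version, device="cpu"):
    """Build a backend wheel name following the pattern seen in this thread,
    e.g. ('tensorflow', '1.14.0') -> 'neuropod-backend-tensorflow-1-14-0-cpu'.
    Inferred from a single example; other backends may differ."""
    return "neuropod-backend-{}-{}-{}".format(
        framework, version.replace(".", "-"), device
    )

def recommend_install(framework, version, device="cpu"):
    """Print the pip command a user could run to install the missing backend."""
    wheel = backend_wheel_name(framework, version, device)
    print("Missing Neuropod backend for {} {}. Try:".format(framework, version))
    print("  pip install {} -f https://download.neuropod.ai/whl/stable.html".format(wheel))

recommend_install("tensorflow", "1.14.0")
```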