dmlc/decord

Build failure in docker because of libnvcuvid

trifle opened this issue · 14 comments

Hi,

this is a bug report because some people get stuck at this point, but I'm not quite convinced it's your fault :)

Issue: cmake fails to find libnvcuvid when you try to build decord in an nvidia cuda container with all necessary libs linked in (-e NVIDIA_DRIVER_CAPABILITIES=all - this should provide the container with pretty much everything: CUDA, cuvid, cudnn and so on).

[edit]: Just to add: this is using the official pre-built nvidia cuda dev container, nvidia/cuda:10.1-cudnn7-devel
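To reproduce, roughly (a sketch only; the build steps follow the decord README and the apt package list is from memory, so adjust as needed):

docker run --rm -it --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all nvidia/cuda:10.1-cudnn7-devel bash
# inside the container:
apt-get update && apt-get install -y git cmake build-essential libavcodec-dev libavformat-dev libavfilter-dev libavutil-dev
git clone --recursive https://github.com/dmlc/decord
cd decord && mkdir build && cd build
cmake .. -DUSE_CUDA=ON -DCMAKE_BUILD_TYPE=Release
# -> Found CUDA_NVCUVID_LIBRARY=CUDA_NVCUVID_LIBRARY-NOTFOUND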

That should obviously not happen. I've also built ffmpeg with all the cuvid accelerations, dlib and other software in this same setup - all successfully, using cmake.

So why is this happening? The container has libnvcuvid here:

lrwxrwxrwx 1 root root   20 Oct 16 15:24 /usr/lib/x86_64-linux-gnu/libnvcuvid.so.1 -> libnvcuvid.so.450.57
-rw-r--r-- 1 root root 3.6M Jul  5 15:12 /usr/lib/x86_64-linux-gnu/libnvcuvid.so.450.57

This is kind of strange. We know that libnvcuvid is not part of cuda but part of the driver. The host has 450.57 installed, which is why the lib is versioned and provided via a symlink. But!! Nvidia itself recommends against using *.so.1 to load shared libraries - instead, one apparently should use libnvcuvid.so (no suffix). I guess that's what you guys do.

After linking /usr/lib/x86_64-linux-gnu/libnvcuvid.so.1 to libnvcuvid.so, everything works as expected.
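In case it helps anyone skimming, the fix as concrete commands (paths taken from the listing above; adjust if your driver libs live elsewhere):

ln -s /usr/lib/x86_64-linux-gnu/libnvcuvid.so.1 /usr/lib/x86_64-linux-gnu/libnvcuvid.so
ldconfig                       # refresh the linker cache
ldconfig -p | grep libnvcuvid  # should now list the unversioned name too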

So, maybe you want to add a safety check to your cmake config to catch this behavior, or document it somewhere? (I've at least written up this report so people can find it via the issues.)

BTW, thanks a lot for decord and your hard work!

I have exactly this problem. Could you post your full Dockerfile?

Good to see. Hope my hints above help you.
I'll try to find the time to extract the non-proprietary parts of the dockerfile, but I can't promise anything. What are you especially interested in? You can work around the build issue by simply linking the library:

ln -s /usr/lib/x86_64-linux-gnu/libnvcuvid.so.1 <YOUR_CUDA_DIR>/libnvcuvid.so

Thank you for your response, much appreciated. I have created the symbolic link. However, it seems like my docker container still does not mount the library; it is missing, so building decord fails.

-- Found CUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda
-- Found CUDA_CUDA_LIBRARY=/usr/local/cuda/targets/x86_64-linux/lib/stubs/libcuda.so
-- Found CUDA_CUDART_LIBRARY=/usr/local/cuda/lib64/libcudart.so
-- Found CUDA_NVRTC_LIBRARY=/usr/local/cuda/lib64/libnvrtc.so
-- Found CUDA_CUDNN_LIBRARY=/usr/lib/x86_64-linux-gnu/libcudnn.so
-- Found CUDA_CUBLAS_LIBRARY=/usr/lib/x86_64-linux-gnu/libcublas.so
-- Found CUDA_NVIDIA_ML_LIBRARY=/usr/local/cuda/targets/x86_64-linux/lib/stubs/libnvidia-ml.so
-- Found CUDA_NVCUVID_LIBRARY=CUDA_NVCUVID_LIBRARY-NOTFOUND

I added ENV NVIDIA_DRIVER_CAPABILITIES video,compute,utility to my Dockerfile.

If I enter ldconfig -p | grep libnvcuvid on the host I get the following output:

        libnvcuvid.so.1 (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libnvcuvid.so.1
        libnvcuvid.so.1 (libc6) => /usr/lib/i386-linux-gnu/libnvcuvid.so.1
        libnvcuvid.so (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libnvcuvid.so
        libnvcuvid.so (libc6) => /usr/lib/i386-linux-gnu/libnvcuvid.so

However, I don't get any output in the container...

@bravma Curious. Try this:

sudo docker run --rm -it --gpus all -e NVIDIA_DRIVER_CAPABILITIES=video,compute,utility nvidia/cuda:10.1-cudnn7-devel ldconfig -p | grep libnvcuvid

It should output

	libnvcuvid.so.1 (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libnvcuvid.so.1

If not, then there's something fundamentally broken, perhaps with the nvidia docker runtime?

You're right, I got exactly this output: libnvcuvid.so.1 (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libnvcuvid.so.1

Now if I link the library correctly, cmake finds it. However, the build then fails with the following error:

make[2]: *** No rule to make target '/usr/lib/x86_64-linux-gnu/libnvcuvid.so', needed by 'CMakeFiles/decord.dir/cmake_device_link.o'.  Stop.

This is the full output that I receive:

Submodule path '3rdparty/dlpack': checked out '5c792cef3aee54ad8b7000111c9dc1797f327b59'
Submodule path '3rdparty/dmlc-core': checked out 'd07fb7a443b5db8a89d65a15a024af6a425615a5'
-- The C compiler identification is GNU 7.5.0
-- The CXX compiler identification is GNU 7.5.0
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Could NOT find PkgConfig (missing: PKG_CONFIG_EXECUTABLE)
-- Unable to find libavdevice, device input API will not work!
-- Found FFMPEG or Libav: /usr/lib/x86_64-linux-gnu/libavformat.so;/usr/lib/x86_64-linux-gnu/libavfilter.so;/usr/lib/x86_64-linux-gnu/libavcodec.so;/usr/lib/x86_64-linux-gnu/libavutil.so, /usr/include/x86_64-linux-gnu
-- The CUDA compiler identification is NVIDIA 10.1.243
-- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc
-- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc -- works
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Performing Test SUPPORT_CXX11
-- Performing Test SUPPORT_CXX11 - Success
FFMPEG_INCLUDE_DIR = /usr/include/x86_64-linux-gnu
FFMPEG_LIBRARIES = /usr/lib/x86_64-linux-gnu/libavformat.so;/usr/lib/x86_64-linux-gnu/libavfilter.so;/usr/lib/x86_64-linux-gnu/libavcodec.so;/usr/lib/x86_64-linux-gnu/libavutil.so
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Looking for pthread_create
-- Looking for pthread_create - not found
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE
-- Found CUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda
-- Found CUDA_CUDA_LIBRARY=/usr/local/cuda/targets/x86_64-linux/lib/stubs/libcuda.so
-- Found CUDA_CUDART_LIBRARY=/usr/local/cuda/lib64/libcudart.so
-- Found CUDA_NVRTC_LIBRARY=/usr/local/cuda/lib64/libnvrtc.so
-- Found CUDA_CUDNN_LIBRARY=/usr/lib/x86_64-linux-gnu/libcudnn.so
-- Found CUDA_CUBLAS_LIBRARY=/usr/lib/x86_64-linux-gnu/libcublas.so
-- Found CUDA_NVIDIA_ML_LIBRARY=/usr/local/cuda/targets/x86_64-linux/lib/stubs/libnvidia-ml.so
-- Found CUDA_NVCUVID_LIBRARY=/usr/lib/x86_64-linux-gnu/libnvcuvid.so
-- Build with CUDA support
-- Configuring done
-- Generating done
-- Build files have been written to: /decord/build
Scanning dependencies of target decord
[  2%] Building CXX object CMakeFiles/decord.dir/src/runtime/c_runtime_api.cc.o
[  5%] Building CXX object CMakeFiles/decord.dir/src/runtime/cpu_device_api.cc.o
[  8%] Building CXX object CMakeFiles/decord.dir/src/runtime/dso_module.cc.o
[ 11%] Building CXX object CMakeFiles/decord.dir/src/runtime/file_util.cc.o
[ 13%] Building CXX object CMakeFiles/decord.dir/src/runtime/module.cc.o
[ 16%] Building CXX object CMakeFiles/decord.dir/src/runtime/module_util.cc.o
[ 19%] Building CXX object CMakeFiles/decord.dir/src/runtime/ndarray.cc.o
[ 22%] Building CXX object CMakeFiles/decord.dir/src/runtime/registry.cc.o
[ 25%] Building CXX object CMakeFiles/decord.dir/src/runtime/str_util.cc.o
[ 27%] Building CXX object CMakeFiles/decord.dir/src/runtime/system_lib_module.cc.o
[ 30%] Building CXX object CMakeFiles/decord.dir/src/runtime/thread_pool.cc.o
[ 33%] Building CXX object CMakeFiles/decord.dir/src/runtime/threading_backend.cc.o
[ 36%] Building CXX object CMakeFiles/decord.dir/src/runtime/workspace_pool.cc.o
[ 38%] Building CXX object CMakeFiles/decord.dir/src/video/logging.cc.o
[ 41%] Building CXX object CMakeFiles/decord.dir/src/video/storage_pool.cc.o
[ 44%] Building CXX object CMakeFiles/decord.dir/src/video/video_interface.cc.o
[ 47%] Building CXX object CMakeFiles/decord.dir/src/video/video_loader.cc.o
[ 50%] Building CXX object CMakeFiles/decord.dir/src/video/video_reader.cc.o
[ 52%] Building CXX object CMakeFiles/decord.dir/src/sampler/random_file_order_sampler.cc.o
[ 55%] Building CXX object CMakeFiles/decord.dir/src/sampler/random_sampler.cc.o
[ 58%] Building CXX object CMakeFiles/decord.dir/src/sampler/sequential_sampler.cc.o
[ 61%] Building CXX object CMakeFiles/decord.dir/src/sampler/smart_random_sampler.cc.o
[ 63%] Building CXX object CMakeFiles/decord.dir/src/video/ffmpeg/filter_graph.cc.o
[ 66%] Building CXX object CMakeFiles/decord.dir/src/video/ffmpeg/threaded_decoder.cc.o
[ 69%] Building CXX object CMakeFiles/decord.dir/src/video/nvcodec/cuda_context.cc.o
[ 72%] Building CXX object CMakeFiles/decord.dir/src/video/nvcodec/cuda_decoder_impl.cc.o
[ 75%] Building CXX object CMakeFiles/decord.dir/src/video/nvcodec/cuda_mapped_frame.cc.o
[ 77%] Building CXX object CMakeFiles/decord.dir/src/video/nvcodec/cuda_parser.cc.o
[ 80%] Building CXX object CMakeFiles/decord.dir/src/video/nvcodec/cuda_stream.cc.o
[ 83%] Building CXX object CMakeFiles/decord.dir/src/video/nvcodec/cuda_texture.cc.o
[ 86%] Building CXX object CMakeFiles/decord.dir/src/video/nvcodec/cuda_threaded_decoder.cc.o
[ 88%] Building CXX object CMakeFiles/decord.dir/src/runtime/cuda/cuda_device_api.cc.o
[ 91%] Building CXX object CMakeFiles/decord.dir/src/runtime/cuda/cuda_module.cc.o
[ 94%] Building CUDA object CMakeFiles/decord.dir/src/improc/improc.cu.o
make[2]: *** No rule to make target '/usr/lib/x86_64-linux-gnu/libnvcuvid.so', needed by 'CMakeFiles/decord.dir/cmake_device_link.o'.  Stop.
CMakeFiles/Makefile2:67: recipe for target 'CMakeFiles/decord.dir/all' failed
make[1]: *** [CMakeFiles/decord.dir/all] Error 2
make: *** [all] Error 2
Makefile:129: recipe for target 'all' failed

I was able to build the library by copying libnvcuvid.so.440.100 to libnvcuvid.so. Of course this is more of a hack than a real solution.
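For reference, that hack as a command (the 440.100 suffix matches my host driver; substitute whatever version your ldconfig reports):

cp /usr/lib/x86_64-linux-gnu/libnvcuvid.so.440.100 /usr/lib/x86_64-linux-gnu/libnvcuvid.so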

Now if I load a video using the following code, I get this strange exception.

from decord import VideoReader, gpu

# width, height and fps come from the surrounding loader code
reader = VideoReader(video_path, ctx=gpu(0), width=width, height=height)
frames_to_skip = int(reader.get_avg_fps() / fps)
indices = list(range(0, len(reader), frames_to_skip))
frames = reader.get_batch(indices)
[09:52:48] /usr/lib/x86_64-linux-gnu/decord/src/video/nvcodec/cuda_threaded_decoder.cc:36: Using device: Quadro RTX 8000
[09:52:48] /usr/lib/x86_64-linux-gnu/decord/src/video/nvcodec/cuda_threaded_decoder.cc:56: Kernel module version 440.1, so using our own stream.
Traceback (most recent call last):
  File "decord_loader_test.py", line 67, in <module>
    TikTokVideoGeneratorTest().test_loader()
  File "decord_loader_test.py", line 38, in test_loader
    video = load_video_decord(path, settings.fps, settings.width, settings.height, tf_bridge=False)
  File "../preprocessing/video_loader.py", line 17, in load_video_decord
    frames = reader.get_batch(indices)
  File "/usr/local/lib/python3.6/dist-packages/decord-0.4.1-py3.6.egg/decord/video_reader.py", line 163, in get_batch
    arr = _CAPI_VideoReaderGetBatch(self._handle, indices)
  File "/usr/local/lib/python3.6/dist-packages/decord-0.4.1-py3.6.egg/decord/_ffi/_ctypes/function.py", line 175, in __call__
    ctypes.byref(ret_val), ctypes.byref(ret_tcode)))
  File "/usr/local/lib/python3.6/dist-packages/decord-0.4.1-py3.6.egg/decord/_ffi/base.py", line 63, in check_call
    raise DECORDError(py_str(_LIB.DECORDGetLastError()))
decord._ffi.base.DECORDError: [09:52:48] /usr/lib/x86_64-linux-gnu/decord/src/video/video_reader.cc:559: Error seeking keyframe: 250 with total frames: 451

Stack trace returned 10 entries:
[bt] (0) /usr/local/lib/python3.6/dist-packages/decord-0.4.1-py3.6.egg/decord/libdecord.so(dmlc::StackTrace[abi:cxx11](unsigned long)+0x9d) [0x7f394ac35c9d]
[bt] (1) /usr/local/lib/python3.6/dist-packages/decord-0.4.1-py3.6.egg/decord/libdecord.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x45) [0x7f394ac35fdb]
[bt] (2) /usr/local/lib/python3.6/dist-packages/decord-0.4.1-py3.6.egg/decord/libdecord.so(decord::VideoReader::CheckKeyFrame()+0x1c1) [0x7f394ac8af19]
[bt] (3) /usr/local/lib/python3.6/dist-packages/decord-0.4.1-py3.6.egg/decord/libdecord.so(decord::VideoReader::SeekAccurate(long)+0x120) [0x7f394ac88fc8]
[bt] (4) /usr/local/lib/python3.6/dist-packages/decord-0.4.1-py3.6.egg/decord/libdecord.so(decord::VideoReader::GetBatch(std::vector<long, std::allocator<long> >, decord::runtime::NDArray)+0x7a6) [0x7f394ac8bb44]
[bt] (5) /usr/local/lib/python3.6/dist-packages/decord-0.4.1-py3.6.egg/decord/libdecord.so(+0x17472a) [0x7f394ac7672a]
[bt] (6) /usr/local/lib/python3.6/dist-packages/decord-0.4.1-py3.6.egg/decord/libdecord.so(+0x176f56) [0x7f394ac78f56]
[bt] (7) /usr/local/lib/python3.6/dist-packages/decord-0.4.1-py3.6.egg/decord/libdecord.so(std::function<void (decord::runtime::DECORDArgs, decord::runtime::DECORDRetValue*)>::operator()(decord::runtime::DECORDArgs, decord::runtime::DECORDRetValue*) const+0x5a) [0x7f394ac3a216]
[bt] (8) /usr/local/lib/python3.6/dist-packages/decord-0.4.1-py3.6.egg/decord/libdecord.so(decord::runtime::PackedFunc::CallPacked(decord::runtime::DECORDArgs, decord::runtime::DECORDRetValue*) const+0x30) [0x7f394ac385c0]
[bt] (9) /usr/local/lib/python3.6/dist-packages/decord-0.4.1-py3.6.egg/decord/libdecord.so(DECORDFuncCall+0x95) [0x7f394ac33381]

If I randomly index some frames it works; accessing key frames, on the other hand, throws this exception...

First of all, thanks for digging into the issue.

  • for the build/link error: since there's no authoritative documentation on how libnvcuvid.so is organized across different cuda distributions, the logic used to find the dylib is fundamentally flawed (the listing below illustrates the problem): https://github.com/dmlc/decord/blob/master/cmake/util/FindCUDA.cmake#L103

  • for the runtime error, if you can provide more details, e.g. the video you are using, I can probably locate the problem
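To illustrate the first point with the listing from earlier in this thread: inside the cuda container only the soname-versioned files are present, so find logic that searches solely for the plain .so name comes up empty (exact versions vary with the host driver):

ls -l /usr/lib/x86_64-linux-gnu/libnvcuvid*
# libnvcuvid.so.1 -> libnvcuvid.so.450.57
# libnvcuvid.so.450.57
# (no unversioned libnvcuvid.so)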

@trifle The runtime error should be fixed by #103. I also improved the readme in #104. It would be great if you could contribute a clean, working dockerfile to help others!

@zhreshold Wonderful, many thanks for your help!
I'm pessimistic about my time budget, but yes, a working dockerfile would be great.
If I ever get to making one, I'll let you know!

Thank you all for your help!

Sorry to resurrect this issue, but this might help some people coming here for assistance.

I've run across the issue that @bravma encountered (No rule to make target '/usr/lib/x86_64-linux-gnu/libnvcuvid.so').
The cause: the container has no access to the GPU (and thus to the driver libraries) during the image build phase.

You need to configure the docker daemon to use nvidia's runtime by default; then the issue does not occur.

@trifle - How do you configure the docker daemon to use nvidia's runtime by default? Thanks for your help.

@prithvinambiar
Add the line "default-runtime": "nvidia" to /etc/docker/daemon.json and restart the docker daemon.
There is some documentation out there that you can find easily.
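For example (a sketch based on nvidia-container-runtime's documented setup; note that this overwrites any existing daemon.json, so merge by hand if you already have one):

sudo tee /etc/docker/daemon.json >/dev/null <<'EOF'
{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
EOF
sudo systemctl restart docker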

I used the symlink workaround described above, but I still ran into the problems below:
after cmake: [screenshot]
after make: [screenshot]