occ-ai/obs-backgroundremoval

CPU mode is working but GPU mode errors with: TensorRT input: 389 has no shape specified. Please run shape inference

aanno opened this issue · 9 comments

aanno commented

Describe the bug

CPU mode is working but GPU mode errors with:

error: Exception during initialization: /onnxruntime_src/onnxruntime/core/providers/tensorrt/tensorrt_execution_provider.cc:1387 SubGraphCollection_t onnxruntime::TensorrtExecutionProvider::GetSupportedList(SubGraphCollection_t, int, int, const onnxruntime::GraphViewer&, bool*) const [ONNXRuntimeError] : 1 : FAIL : TensorRT input: 389 has no shape specified. Please run shape inference on the onnx model first. Details can be found in https://onnxruntime.ai/docs/execution-providers/TensorRT-ExecutionProvider.html#shape-inference-for-tensorrt-subgraphs

To Reproduce

Start OBS with:

export LD_LIBRARY_PATH=/opt/cuda/lib64:/usr/local/lib/python3.11/site-packages/tensorrt_libs/
obs

Use obs-backgroundremoval. CPU mode works; GPU mode prints an error like the one above on stdout.
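The same failure can presumably be reproduced outside OBS with a minimal onnxruntime session (a sketch, assuming onnxruntime-gpu with TensorRT support is installed; the model path is just an example):

# Sketch: load one of the plugin's models with the TensorRT EP directly,
# bypassing OBS. The model path is an example; onnxruntime-gpu is assumed.
import onnxruntime as ort

print(ort.get_available_providers())  # should include TensorrtExecutionProvider

sess = ort.InferenceSession(
    "rvm_mobilenetv3_fp32.onnx",
    providers=["TensorrtExecutionProvider", "CPUExecutionProvider"],
)
print(sess.get_providers())  # providers actually applied to the session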

Expected behavior

GPU mode should work as well (no errors).

Desktop (please complete the following information):

  • OS: Fedora 38 x86_64
  • Plugin Version: self-compiled from git main (beyond 1.1.5)
  • OBS Version: 29.1.3 (from distro, not flatpak)

Additional context

Perhaps https://onnxruntime.ai/docs/execution-providers/TensorRT-ExecutionProvider.html#shape-inference-for-tensorrt-subgraphs already describes the solution?
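If so, applying it with ONNX Runtime's Python API would look roughly like this (a sketch; file names are just examples):

# Sketch: run symbolic shape inference on a model before the TensorRT EP
# sees it, as the linked docs suggest. File names are examples.
import onnx
from onnxruntime.tools.symbolic_shape_infer import SymbolicShapeInference

model = onnx.load("rvm_mobilenetv3_fp32.onnx")
inferred = SymbolicShapeInference.infer_shapes(model, auto_merge=True)
onnx.save(inferred, "rvm_mobilenetv3_fp32.inferred.onnx")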

Can you tell us your versions of TensorRT, cuDNN, and CUDA?

aanno commented

The NVIDIA and CUDA packages are from the NVIDIA repo at https://developer.download.nvidia.com/compute/cuda/repos/fedora37/x86_64/ (there is no repo for Fedora 38 yet). TensorRT is installed with pip3 install tensorrt.

  • CUDA 12.2.2-1
  • TensorRT 8.6.1.post1
  • NVIDIA driver 535.104.05
  • Not sure about cuDNN at present (sorry!); a way to check it is sketched below.
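Checking the cuDNN version via ctypes would look something like this (a sketch; the soname may differ on your system, e.g. libcudnn.so vs libcudnn.so.8):

# Sketch: query the loaded cuDNN version. Assumes libcudnn.so.8 is
# findable via LD_LIBRARY_PATH.
import ctypes

cudnn = ctypes.CDLL("libcudnn.so.8")
cudnn.cudnnGetVersion.restype = ctypes.c_size_t
# cuDNN encodes major*1000 + minor*100 + patch, e.g. 8903 for 8.9.3
print(cudnn.cudnnGetVersion())
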
aanno commented

Hm, it seems that CUDA 11 and CUDA 12 can be used on the same machine. I also updated cuDNN to 8.9.3 (using the tarball from https://developer.nvidia.com/rdp/cudnn-archive). Now I run OBS like this:

export LD_LIBRARY_PATH=/usr/local/cuda-11/lib64:/opt/cuda-11/cudnn-linux-x86_64-8.9.3.28_cuda11-archive/lib:/usr/local/lib/python3.11/site-packages/tensorrt_libs
obs

However, the problem is still the same. Hence I also tried what the error message suggests (https://onnxruntime.ai/docs/execution-providers/TensorRT-ExecutionProvider.html#shape-inference-for-tensorrt-subgraphs) with the following script:

#!/bin/bash
mkdir -p out
for i in *.onnx; do
        echo "$i"
        python ~/Downloads/symbolic_shape_infer.py --input "$i" --output "out/$i" --auto_merge --verbose 2
done

Running it on the models provided gives:

$ ./shape.sh 
mediapipe.onnx
pphumanseg_fp32.onnx
rvm_mobilenetv3_fp32.onnx
Traceback (most recent call last):
  File "/home/tpasch/Downloads/symbolic_shape_infer.py", line 1768, in less_equal
    return bool(y - x >= 0)
           ^^^^^^^^^^^^^^^^
  File "/home/tpasch/.local/lib/python3.11/site-packages/sympy/core/relational.py", line 510, in __bool__
    raise TypeError("cannot determine truth value of Relational")
TypeError: cannot determine truth value of Relational

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/tpasch/Downloads/symbolic_shape_infer.py", line 2851, in <module>
    out_mp = SymbolicShapeInference.infer_shapes(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tpasch/Downloads/symbolic_shape_infer.py", line 2783, in infer_shapes
    all_shapes_inferred = symbolic_shape_inference._infer_impl()
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tpasch/Downloads/symbolic_shape_infer.py", line 2540, in _infer_impl
    self.dispatcher_[node.op_type](node)
  File "/home/tpasch/Downloads/symbolic_shape_infer.py", line 1815, in _infer_Slice
    e = handle_negative_index(e, new_sympy_shape[i])  # noqa: PLW2901
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tpasch/Downloads/symbolic_shape_infer.py", line 1776, in handle_negative_index
    if not less_equal(0, index):
           ^^^^^^^^^^^^^^^^^^^^
  File "/home/tpasch/Downloads/symbolic_shape_infer.py", line 1771, in less_equal
    return all(bool(d >= 0) for d in flatten_min(y - x))
                                     ^^^^^^^^^^^^^^^^^^
  File "/home/tpasch/Downloads/symbolic_shape_infer.py", line 1728, in flatten_min
    assert isinstance(expr, sympy.Add), f"Expected a sum of two arguments, got {expr}"
AssertionError: Expected a sum of two arguments, got ceiling(ceiling(ceiling(Resize_3_o0__d2/2)/2)/2)
selfie_segmentation.onnx
semantic_guided_llie_180x324.onnx
SINet_Softmax_simple.onnx
tbefn_fp32.onnx
tcmonodepth_tcsmallnet_192x320.onnx
uretinex_net_180x320.onnx
zero_dce_180x320.onnx

To me this looks like rvm_mobilenetv3_fp32.onnx is the culprit (at least for this error message): it is the only model on which shape inference fails.
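To confirm which intermediate tensors lack shape information (the error mentions a tensor named 389), something like this with the onnx Python API should work (a sketch; it only inspects graph metadata):

# Sketch: report node outputs with no recorded shape/type info, to see
# which tensors (e.g. "389") the TensorRT EP would complain about.
import onnx

m = onnx.load("rvm_mobilenetv3_fp32.onnx")
known = {vi.name for vi in (*m.graph.value_info, *m.graph.input, *m.graph.output)}
known |= {init.name for init in m.graph.initializer}
for node in m.graph.node:
    for out in node.output:
        if out and out not in known:
            print(f"{node.op_type}: output '{out}' has no shape/type info")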

Is there anybody with working GPU support on Linux? What distribution?

Can you try another segmentation model?

Is there anybody with working GPU support on Linux? What distribution?

We officially support Ubuntu 22.04 with the latest OBS. We will never officially support Fedora, so you will have to do some hacks on your own to get it to work.

Hi @aanno, were you able to run it with GPU and Fedora?

Any chance of a build of this for Flatpak and CUDA 12? I'm on Gentoo, and Flatpak is the only way to get those Twitch panels, but the Flatpak won't pick up plugins from the system directories.

@Lucretia We have no plans or resources to support CUDA on Flatpak, but you can use Ubuntu for streaming.

@Lucretia Please do not post any offensive content.