parkchamchi/DepthViewer

Support for other GPUs (Intel Arc, AMD)

dan9070 opened this issue · 18 comments

Hiya, I used to use your software on my GTX 1060, initially through GitHub and then through Steam, but I've since upgraded to an Intel Arc A770, and now loading any model in your application falls back directly to CPU inference.

Is it possible to make it compatible with other GPUs? I noticed the built-in Barracuda model runs on my GPU, but everything else does not.

The NN framework it uses (other than Unity.Barracuda) is OnnxRuntime, which uses CUDA (and cuDNN) when the GPU toggle is on (code here). OnnxRuntime supports execution providers other than CUDA (see here, the factory methods starting with MakeSessionOptionWith), and I can add an option to use these (see the sketch after this list), but there are some problems:

  • The official OnnxRuntime build does not include providers other than CUDA and TensorRT, so it probably has to be built from source to use them.
  • Intel GPUs don't seem to support either CUDA or ROCm; maybe TVM would work?
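As a rough sketch of what provider selection looks like, here is the Python onnxruntime API rather than the C# one the program actually uses; "model.onnx" is a placeholder path, and a provider name only works if the installed onnxruntime build includes that provider:

import onnxruntime as ort

# Ask for DirectML first and fall back to the CPU.
# DmlExecutionProvider is only available in builds that include DirectML.
sess = ort.InferenceSession(
    "model.onnx",
    providers=["DmlExecutionProvider", "CPUExecutionProvider"],
)
print(sess.get_providers())  # the providers the session actually ended up with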

By the way, can you send me the error message the console (backtick `) prints when it fails to load the model?

There isn't an error upon failing to load the model. It loads the model and runs inference on the CPU instead of the GPU I use, which is an Arc A770 LE.

I was mainly hoping for this software to support AMD and Intel arc universally. I appreciate the response though.

It appears that in your previous Nvidia environment, OnnxRuntime would load CUDA when it was detected. That's odd, since in my environment I have to explicitly request CUDA. If you turn on the use cuda toggle, it throws an exception, right?

In the meantime, I'll test some GPU execution providers other than CUDA, such as DirectML (which seems promising for Intel GPUs), OpenVINO, TVM and ROCm.

I never tested that, and I probably should have. However, due to other driver issues from my old hardware setup, I have since reset Windows 11. Either way, I greatly appreciate that you intend to try implementing the other GPU execution providers, as it opens up this software to other hardware.

Using the same method this program uses to load CUDA for OnnxRuntime, I took onnxruntime.dll from Microsoft.ML.OnnxRuntime.DirectML v1.14.1 and the matching version of the managed Microsoft.ML.OnnxRuntime.dll and tried to load it, only to be met with

Microsoft.ML.OnnxRuntime.OnnxRuntimeException: [ErrorCode:RuntimeException] 
  at Microsoft.ML.OnnxRuntime.NativeApiStatus.VerifySuccess (System.IntPtr nativeStatus) [0x0002c] in <1f49930870034b16a79abbd8ce8f4b49>:0 
  at Microsoft.ML.OnnxRuntime.SessionOptions.AppendExecutionProvider_DML (System.Int32 deviceId) [0x0000d] in <1f49930870034b16a79abbd8ce8f4b49>:0

Using DirectML.dll and DirectML.Debug.dll from Microsoft.AI.MachineLearning gave me the same result.

Using the DLLs I built from source:

  • DirectML
    Same error message as above.

  • OpenVINO
    Successfully loaded, and it uses my integrated Intel GPU (which is disabled by default). I assume it can also be used with dedicated Intel GPUs.

=> Only in the editor, though; in the build it gives me

Microsoft.ML.OnnxRuntime.OnnxRuntimeException: [ErrorCode:RuntimeException] D:\codes\onnxruntime\new\onnxruntime\onnxruntime\core\session\provider_bridge_ort.cc:1106 onnxruntime::ProviderLibrary::Get [ONNXRuntimeError] : 1 : FAIL : LoadLibrary failed with error 126 "" when trying to load "D:\codes\DepthViewer\Build\DepthViewer_Data\Plugins\x86_64\onnxruntime_providers_openvino.dll"

  at Microsoft.ML.OnnxRuntime.NativeApiStatus.VerifySuccess (System.IntPtr nativeStatus) [0x0002c] in <56cc2af98561469babdd24bc23371de1>:0 
  at Microsoft.ML.OnnxRuntime.SessionOptions.AppendExecutionProvider_OpenVINO (System.String deviceId) [0x00027] in <56cc2af98561469babdd24bc23371de1>:0 

which is the same error that occurs in the editor when OpenVINO is not in PATH.

  • TVM
OnnxRuntimeException, provider: TVM, gpuid: 0. Microsoft.ML.OnnxRuntime.OnnxRuntimeException: [ErrorCode:RuntimeException] Exception during initialization: D:\codes\onnxruntime\new\onnxruntime\onnxruntime\core\providers\tvm\tvm_api.cc:50 onnxruntime::tvm::TVMCompile compile != nullptr was false. Unable to retrieve 'tvm_onnx_import_and_compile'.

  at Microsoft.ML.OnnxRuntime.NativeApiStatus.VerifySuccess (System.IntPtr nativeStatus) [0x0002c] in <56cc2af98561469babdd24bc23371de1>:0 
  at Microsoft.ML.OnnxRuntime.InferenceSession.Init (System.String modelPath, Microsoft.ML.OnnxRuntime.SessionOptions options, Microsoft.ML.OnnxRuntime.PrePackedWeightsContainer prepackedWeightsContainer) [0x0002f] in <56cc2af98561469babdd24bc23371de1>:0 
  at Microsoft.ML.OnnxRuntime.InferenceSession..ctor (System.String modelPath, Microsoft.ML.OnnxRuntime.SessionOptions options) [0x0002c] in <56cc2af98561469babdd24bc23371de1>:0

Seems to be a build problem. It also requires some tweaks.

  • ROCm
    No Windows version?

  • CUDA & CPU
    Works without error.

I'll keep trying the other ways.

Any updates on this would be greatly appreciated. I'm still here in the background.

I'll try it again when the exam is over. Also, I'm thinking about using message queue communication with Python using this, which would be much faster and more robust than the currently implemented HTTP/Flask method (especially since it's not bound to the Unity3D coroutine restriction).

Thank you.

This is a little workaround using communication with Python. It is not as convenient as running everything on the Unity side, but it avoids the headaches caused by library linking. Instead of running inference on the Unity/C# side, it asks the Python script over ZeroMQ, and I felt no overhead from it.
Unfortunately, PyTorch does not seem to support non-CUDA GPU accelerators, so I added an OnnxRuntime version of the Python script. I believe the DirectML build of OnnxRuntime supports Intel GPUs well.
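A minimal sketch of the round trip, assuming pyzmq and OpenCV; this is not the actual depthmq protocol (which also involves a handshake and a small header before the image bytes, as the logs below show), so see depthpy/depthmq.py for the real thing:

import cv2
import numpy as np
import zmq

def infer_depth(img):
    # Placeholder: the real script runs the MiDaS model here (PyTorch or OnnxRuntime).
    return np.zeros(img.shape[:2], dtype=np.float32)

ctx = zmq.Context()
sock = ctx.socket(zmq.REP)
sock.bind("tcp://*:5555")  # the port the `zmq 5555` console command connects to

while True:
    jpg_bytes = sock.recv()  # the client sends an encoded image
    img = cv2.imdecode(np.frombuffer(jpg_bytes, np.uint8), cv2.IMREAD_COLOR)
    depth = infer_depth(img)
    sock.send(depth.tobytes())  # reply with the raw depth values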

Also see here

  1. Get the dependencies for the python scripts:
pip install numpy opencv-python torch torchvision timm pyzmq
pip install onnxruntime-directml
  2. Run depthpy/depthmq.py (included in the build since v0.8.11-beta.1):
python depthmq.py --ort --ort_ep dml
  3. Open the main program, open the console (`), and enter
zmq 5555
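If inference still silently falls back to the CPU, one thing worth checking (a generic onnxruntime check, not part of the scripts) is whether the DirectML provider is actually visible:

python -c "import onnxruntime as ort; print(ort.get_available_providers())"

It should list DmlExecutionProvider when onnxruntime-directml is installed.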

It's not refined right now; any bug reports would be appreciated.

https://imgur.com/a/FJ0Wk6c

It currently does not seem to give any proper depth output using DML on Intel Arc. No clue why.

(Depthviewer) C:\Users\dbs_5\OneDrive\Desktop\Build\depthpy>python depthmq.py --model dpt_beit_large_512 --ort --ort_ep dml
depthmq: Init.
Initialize
OrtRunner: using provider dml
Trying to load ../onnx\dpt_beit_large_512.onnx...
2023-05-05 17:51:54.3912201 [W:onnxruntime:, inference_session.cc:497 onnxruntime::InferenceSession::RegisterExecutionProvider] Having memory pattern enabled is not supported while using the DML Execution Provider. So disabling it for this session since it uses the DML Execution Provider.
Assuming 512x512...
depthmq: Preparing the model. This may take some time.
depthmq: Done.
depthmq: Binding to tcp://*:5555


{'ptype': 'REQ', 'pname': 'HANDSHAKE_DEPTH', 'pversion': '1', 'client_program': 'DepthViewer', 'client_program_version': 'v0.8.11-beta.1'}
Using handler <function on_req_handshake_depth at 0x00000251C20E45E0>


{'ptype': 'REQ', 'pname': 'DEPTH', 'input_format': 'jpg'}
len(data): 8719
Using handler <function on_req_depth at 0x00000251C20E4670>


{'ptype': 'REQ', 'pname': 'DEPTH', 'input_format': 'jpg'}
len(data): 62704
Using handler <function on_req_depth at 0x00000251C20E4670>


{'ptype': 'REQ', 'pname': 'DEPTH', 'input_format': 'jpg'}
len(data): 246724
Using handler <function on_req_depth at 0x00000251C20E4670>


{'ptype': 'REQ', 'pname': 'DEPTH', 'input_format': 'jpg'}
len(data): 1204589
Using handler <function on_req_depth at 0x00000251C20E4670>


I've noticed that dpt_beit_large_512 does not generate proper output with DML. Do the other models (say dpt_hybrid_384) work?

I tested dpt_hybrid_384 and that gives a garbled depth output.

I modified the entry in MiDaS's model loader script to change OpenVINO's loader from CPU to GPU, and swapped openvino_midas_v21_small_256 for a custom "openvino_dpt_beit_large_512" model I converted from the BEiT-large ONNX model using OpenVINO's model converter. That actually works and gives proper depth output.

In model_loader.py, Line 147

elif model_type == "openvino_midas_dpt_large_512":
    ie = Core()
    uncompiled_model = ie.read_model(model=model_path)
    # "GPU" targets the Intel GPU; the stock openvino_midas_v21_small_256 entry uses "CPU"
    model = ie.compile_model(uncompiled_model, "GPU")
    net_w, net_h = 512, 512
    resize_mode = "upper_bound"
    normalization = NormalizeImage(
        mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]
    )
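For reference, the ONNX-to-OpenVINO IR conversion can be done with the Model Optimizer that ships with the openvino-dev package; the exact command used above isn't shown, so the file names here are placeholders:

mo --input_model dpt_beit_large_512.onnx --output_dir openvino_ir

ie.read_model() then takes the generated .xml file as model_path (the matching .bin weights file is picked up automatically).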

Glad that it works. I don't understand why ORT + DML + dpt_hybrid_384 does not work in your environment, though. I'll test the OpenVINO optimization myself.

Oh great, it works. I've updated the python scripts for loading v3/v3.1 OpenVINO formats (commit 4f67e1a). I don't know why dpt_hybrid_384 can't be converted though. (conversion script)

0.10.0-beta.1: Migrated to Sentis from Barracuda 3.0, and MiDaS v3+ works without ORT. Closing this.