I was working on a pose estimation desktop application, juxt.space.
My tech stack is React/Vite on the frontend, FastAPI on the backend, and Tauri with GitHub Actions (Tauri Action) to auto-publish Windows/Mac/Ubuntu executables whenever a release is created.
I was struggling to integrate the FastAPI server (CPU inference works fine, but GPU doesn't) as a Tauri embedded binary, because there was no existing solution for porting the ONNX GPU runtime to an exe.
I thought it might be useful to share some tips on how to achieve this, since at the time there was no documented solution anywhere.
This covers Windows users with a GPU only for now (I haven't tested Ubuntu with a GPU, but the steps are most likely the same). The goals:
- Run an ONNX GPU inference server without installing any Python dependencies.
- Serve it as a Tauri sidecar/embedded exe to power JavaScript/Rust applications.
- Avoid making end users install the NVIDIA CUDA Toolkit (GPU inference still works).
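For context, Tauri discovers sidecar binaries through the `externalBin` key in `tauri.conf.json`. A minimal sketch (the `binaries/sidecar` path is an example, not from the original project):

```json
{
  "tauri": {
    "bundle": {
      "externalBin": ["binaries/sidecar"]
    }
  }
}
```

When bundling, Tauri appends the platform's target triple to each entry, which is why the PyInstaller output below is named `sidecar-x86_64-pc-windows-msvc`.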
We are using PyInstaller to compile the Python script into an executable binary. Make sure you have your FastAPI server.py ready. You can use this server as a head start.
pip install pyinstaller onnxruntime-gpu
The command below won't work on Windows as-is. For some reason, PyInstaller on Windows can't trace some of the imports, so you will need to build the exe manually, run it, and debug which module is not found.
pyinstaller -c -F --clean --name sidecar-x86_64-pc-windows-msvc --specpath dist --distpath dist server.py
In my case, the command above produced an .exe file, but running that .exe exited with `cv2: Module Not Found`. Manually adding `--hidden-import=cv2` fixed it:
pyinstaller -c -F --clean --hidden-import=cv2 --name sidecar-x86_64-pc-windows-msvc --specpath dist --distpath dist server.py
After all that, you will notice that when you run your server.exe with the ONNX device set to cuda, you will face tons of errors like:
`CUDA_PATH is set but CUDA wasn't able to be loaded` - GitHub Issue
After reading through the GitHub issues and threads, you will realize that there's no existing solution out there.
Maybe I need to ask all of my end users to install the NVIDIA cuDNN toolkit? Hmm, that is not ideal.
I stumbled across some random Chinese blog posts and realized that I needed to bundle some onnxruntime_*.dll files into the executable. Bam, problem solved.
Unzip this: https://github.com/microsoft/onnxruntime/releases/download/v1.17.3/onnxruntime-win-x64-gpu-cuda12-1.17.3.zip, then add these three lines to your PyInstaller command:
--add-binary="./onnxruntime_providers_cuda.dll;./onnxruntime/capi/" \
--add-binary="./onnxruntime_providers_tensorrt.dll;./onnxruntime/capi/" \
--add-binary="./onnxruntime_providers_shared.dll;./onnxruntime/capi/"
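Putting it together, here is a small sketch that assembles the full PyInstaller invocation, CUDA DLLs included. The `build_cmd` helper and the DLL directory default are my own illustration, not part of the original workflow:

```python
# Assemble the final PyInstaller command with the three CUDA provider DLLs.
# build_cmd and its dll_dir default are illustrative, not from the original post.
import shlex


def build_cmd(dll_dir: str = ".") -> list[str]:
    cmd = [
        "pyinstaller", "-c", "-F", "--clean",
        "--hidden-import=cv2",
        "--name", "sidecar-x86_64-pc-windows-msvc",
        "--specpath", "dist", "--distpath", "dist",
    ]
    for dll in (
        "onnxruntime_providers_cuda.dll",
        "onnxruntime_providers_tensorrt.dll",
        "onnxruntime_providers_shared.dll",
    ):
        # On Windows, ';' separates the source path from the in-bundle destination.
        cmd.append(f"--add-binary={dll_dir}/{dll};./onnxruntime/capi/")
    cmd.append("server.py")
    return cmd


print(shlex.join(build_cmd()))
```

Note that `;` is the Windows source/destination separator in `--add-binary`; on Linux/macOS PyInstaller expects `:` instead.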
OR you can find these three DLLs in your local environment: envs\{env_name}\Lib\site-packages\onnxruntime\capi\*.dll.
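If you'd rather locate them programmatically than hard-code an env path, a small stdlib-only sketch (the helper name is my own):

```python
# Locate the provider DLLs inside an installed onnxruntime-gpu package
# without hard-coding the environment name. Stdlib only.
import importlib.util
import pathlib


def find_provider_dlls() -> list[pathlib.Path]:
    spec = importlib.util.find_spec("onnxruntime")
    if spec is None or spec.origin is None:
        return []  # onnxruntime isn't installed in this environment
    capi = pathlib.Path(spec.origin).parent / "capi"
    return sorted(capi.glob("onnxruntime_providers_*.dll"))


print(find_provider_dlls())
```

This returns an empty list when onnxruntime isn't installed, so it degrades gracefully on a clean machine.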
Your FastAPI exe now supports the GPU runtime without any extra configuration.
For the full CI setup, refer to: https://github.com/ziqinyeow/juxtapose/blob/main/.github/workflows/exe.yml