Bring portraits to life in real time! ONNX/TensorRT support! Real-time portrait animation!

FasterLivePortrait: Bring portraits to life in Real Time!

Original repository: LivePortrait. Thanks to the authors for sharing it.

New features:

  • Runs LivePortrait in real time on an RTX 3090 GPU with TensorRT, reaching 30+ FPS. This is the end-to-end speed for rendering a single frame, including pre- and post-processing, not just the model inference speed.
  • Converts the LivePortrait models to ONNX, reaching about 70 ms/frame (~12 FPS) with onnxruntime-gpu on an RTX 3090, which eases cross-platform deployment.
  • Seamless support for the native Gradio app, running several times faster, with simultaneous inference on multiple faces and support for the animal model.
  • Added support for JoyVASA, which can drive videos or images with audio.
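
The speed figures above are related by FPS = 1000 / (milliseconds per frame). Below is a minimal sketch of that conversion; the helper names are hypothetical and not part of this repository. Note that 70 ms per frame alone works out to roughly 14 FPS, so the quoted ~12 FPS presumably includes overhead beyond the 70 ms (an assumption; the source does not say).

```python
def fps_from_latency_ms(latency_ms: float) -> float:
    """Convert a per-frame latency in milliseconds to frames per second."""
    if latency_ms <= 0:
        raise ValueError("latency_ms must be positive")
    return 1000.0 / latency_ms


def latency_ms_from_fps(fps: float) -> float:
    """Convert frames per second to the per-frame time budget in milliseconds."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


if __name__ == "__main__":
    # 70 ms/frame is the ONNX figure quoted above.
    print(f"70 ms/frame -> {fps_from_latency_ms(70):.1f} FPS")
    # 30 FPS is the TensorRT figure quoted above.
    print(f"30 FPS -> {latency_ms_from_fps(30):.1f} ms/frame budget")
```

At 30 FPS, the whole pipeline (pre-processing, inference, post-processing) has to fit in about 33 ms per frame.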

If you find this project useful, please give it a star ✨✨

Changelog

  • 2024/12/22: Added API deployment via python api.py. For more information, please refer to the tutorial.
  • 2024/12/16: Added support for JoyVASA, which can drive videos or images with audio. Very cool!
    • Update the code, then download the models: huggingface-cli download TencentGameMate/chinese-hubert-base --local-dir ./checkpoints/chinese-hubert-base and huggingface-cli download jdh-algo/JoyVASA --local-dir ./checkpoints/JoyVASA
    • After launching the webui, follow the tutorial below. When the source is a video, it is recommended to drive only the mouth movements.

[Video tutorial: 12.16.1.mp4]

  • 2024/12/14: Added pickle and image driving, as well as region driving (animation_region).
    • Please update to the latest code. Windows users can double-click update.bat to update, but note that your local code will be overwritten.
    • Running python run.py now automatically saves the corresponding pickle to the same directory as the driving video, so it can be reused directly.
    • After opening the webui, you can try the new pickle and image driving, as well as the region driving (animation_region) features. For image driving, remember to disable relative motion.
  • 2024/08/11: Optimized paste_back speed and fixed some bugs.
    • Used torchgeometry + cuda to optimize the paste_back function, significantly improving speed. Example: python run.py --src_image assets/examples/source/s39.jpg --dri_video assets/examples/driving/d0.mp4 --cfg configs/trt_infer.yaml --paste_back --animal
    • Fixed issues with Xpose ops causing errors on some GPUs and other bugs. Please use the latest docker image: docker pull shaoguo/faster_liveportrait:v3
  • 2024/08/07: Added support for animal models and MediaPipe models, so you no longer need to worry about copyright issues.
    • Added support for animal models.

      • Download the animal ONNX file: huggingface-cli download warmshao/FasterLivePortrait --local-dir ./checkpoints, then convert it to TRT format.
      • Update the Docker image: docker pull shaoguo/faster_liveportrait:v3. To use the animal model: python run.py --src_image assets/examples/source/s39.jpg --dri_video 0 --cfg configs/trt_infer.yaml --realtime --animal
      • Windows users can download the latest Windows all-in-one package from the release page, then unzip and use it.
      • Simple usage tutorial:

      [Video: 0807-3.mp4]

    • Use the MediaPipe model in place of InsightFace:

      • For web usage: python webui.py --mode trt --mp or python webui.py --mode onnx --mp
      • For local webcam: python run.py --src_image assets/examples/source/s12.jpg --dri_video 0 --cfg configs/trt_mp_infer.yaml
  • 2024/07/24: Released a Windows integration package: no installation required, one-click run, with support for TensorRT and onnxruntime-gpu. Thanks to @zhanghongyong123456 for their contribution in this issue.
    • [Optional] If CUDA and cuDNN are already installed on your Windows computer, skip this step. This has only been verified on CUDA 12.2. If you haven't installed CUDA or you encounter CUDA-related errors, follow these steps:
      • Download CUDA 12.2, double-click the exe and install following the default settings step by step.
      • Download the cuDNN zip file, extract it, and copy the lib, bin, and include folders from the cuDNN folder to the CUDA 12.2 folder (default is C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.2)
    • Download the installation-free Windows integration package from the release page and extract it.
    • Enter FasterLivePortrait-windows and double-click all_onnx2trt.bat to convert onnx files, which will take some time.
    • For web demo: Double-click app.bat, open the webpage: http://localhost:9870/
    • For real-time camera operation, double-click camera.bat and press q to stop. To change the target image, run from the command line: camera.bat assets/examples/source/s9.jpg
  • 2024/07/18: Added macOS support (no need for Docker, Python is enough). M1/M2 chips are faster, but it's still quite slow 😟
    • Install ffmpeg: brew install ffmpeg
    • Set up a Python 3.10 virtual environment. Recommend using miniforge: conda create -n flip python=3.10 && conda activate flip
    • Install requirements: pip install -r requirements_macos.txt
    • Download ONNX files: huggingface-cli download warmshao/FasterLivePortrait --local-dir ./checkpoints
    • Test: python webui.py --mode onnx
  • 2024/07/17: Added support for Docker environment, providing a runnable image.
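
The run.py invocations in the changelog above all combine the same handful of flags (--src_image, --dri_video, --cfg, --realtime, --paste_back, --animal). As a rough sketch, they can be assembled programmatically; the helper build_run_cmd is hypothetical and not part of this repository, and it only builds the argument list without running anything.

```python
def build_run_cmd(src_image, dri_video, cfg,
                  realtime=False, paste_back=False, animal=False):
    """Build the argument list for run.py using only flags shown in this README.

    dri_video may be a video path or a camera index such as 0, as in the
    webcam examples above.
    """
    cmd = [
        "python", "run.py",
        "--src_image", str(src_image),
        "--dri_video", str(dri_video),
        "--cfg", str(cfg),
    ]
    # Optional switches are appended only when requested.
    if realtime:
        cmd.append("--realtime")
    if paste_back:
        cmd.append("--paste_back")
    if animal:
        cmd.append("--animal")
    return cmd


if __name__ == "__main__":
    # Mirrors the animal-model webcam example from the 2024/08/07 entry.
    cmd = build_run_cmd(
        "assets/examples/source/s39.jpg", 0, "configs/trt_infer.yaml",
        realtime=True, animal=True,
    )
    print(" ".join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True) from the repository root.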

Environment Setup

  • Option 1: Docker (recommended). A Docker image is provided, eliminating the need to install onnxruntime-gpu and TensorRT manually.
    • Install Docker according to your system
    • Download the image: docker pull shaoguo/faster_liveportrait:v3
    • Run the following command, replacing $FasterLivePortrait_ROOT with the local directory where you downloaded FasterLivePortrait:
    docker run -it --gpus=all \
    --name faster_liveportrait \
    -v $FasterLivePortrait_ROOT:/root/FasterLivePortrait \
    --restart=always \
    -p 9870:9870 \
    shaoguo/faster_liveportrait:v3 \
    /bin/bash
  • Option 2: Create a new Python virtual environment and install the necessary Python packages manually.
    • First, install ffmpeg
    • Run pip install -r requirements.txt
    • Then follow the tutorials below to install onnxruntime-gpu or TensorRT. Note that this has only been tested on Linux systems.
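
For Option 2, a quick sanity check of the manual setup can save debugging time later. The sketch below is not part of this repository; the function name check_environment is hypothetical. It only reports whether ffmpeg is on PATH and whether the onnxruntime and tensorrt Python packages are importable (onnxruntime-gpu is imported under the name onnxruntime).

```python
import importlib.util
import shutil


def check_environment(modules=("onnxruntime", "tensorrt")):
    """Report whether ffmpeg and the given Python packages are visible."""
    # shutil.which returns None when the executable is not on PATH.
    report = {"ffmpeg": shutil.which("ffmpeg") is not None}
    for name in modules:
        # find_spec returns None when a top-level package cannot be found.
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for name, found in check_environment().items():
        print(f"{name}: {'found' if found else 'missing'}")
```

Only one of the two backends is needed, so a single "missing" entry for onnxruntime or tensorrt is not necessarily a problem.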

Onnxruntime Inference

  • First, download the converted ONNX model files: huggingface-cli download warmshao/FasterLivePortrait --local-dir ./checkpoints
  • (Ignored in Docker) For CPU inference with onnxruntime, simply pip install onnxruntime. However, CPU inference is extremely slow and not recommended. The latest onnxruntime-gpu still doesn't support grid_sample on CUDA, but I found a branch that supports it. Follow these steps to install onnxruntime-gpu from source:
    • git clone https://github.com/microsoft/onnxruntime
    • cd onnxruntime && git checkout liqun/ImageDecoder-cuda. Thanks to liqun for the CUDA implementation of grid_sample!
    • Run the following commands to compile, changing cuda_version and CMAKE_CUDA_ARCHITECTURES according to your machine:
    ./build.sh --parallel \
    --build_shared_lib --use_cuda \
    --cuda_version 11.8 \
    --cuda_home /usr/local/cuda --cudnn_home /usr/local/cuda/ \
    --config Release --build_wheel --skip_tests \
    --cmake_extra_defines CMAKE_CUDA_ARCHITECTURES="60;70;75;80;86" \
    --cmake_extra_defines CMAKE_CUDA_COMPILER=/usr/local/cuda/bin/nvcc \
    --disable_contrib_ops \
    --allow_running_as_root
    • pip install build/Linux/Release/dist/onnxruntime_gpu-1.17.0-cp310-cp310-linux_x86_64.whl
  • Test the pipeline using onnxruntime:
     python run.py \
     --src_image assets/examples/source/s10.jpg \
     --dri_video assets/examples/driving/d14.mp4 \
     --cfg configs/onnx_infer.yaml
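
After installing onnxruntime (CPU) or the source-built onnxruntime-gpu wheel, it can help to confirm which execution providers are actually available before running the pipeline. The sketch below uses onnxruntime.get_available_providers(); the helper pick_providers and its preference order are assumptions for illustration, not how this repository selects providers.

```python
def pick_providers(available):
    """Return the providers to request, preferring CUDA over CPU."""
    preferred = ["CUDAExecutionProvider", "CPUExecutionProvider"]
    chosen = [p for p in preferred if p in available]
    if not chosen:
        raise RuntimeError(f"no usable execution provider in {list(available)}")
    return chosen


if __name__ == "__main__":
    try:
        import onnxruntime as ort
    except ImportError:
        print("onnxruntime is not installed")
    else:
        available = ort.get_available_providers()
        print("available:", available)
        print("would use:", pick_providers(available))
```

If CUDAExecutionProvider is missing from the list, inference falls back to the CPU, which the notes above describe as extremely slow.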
    

TensorRT Inference

  • (Ignored in Docker) Install TensorRT. Remember the installation path of TensorRT.
  • (Ignored in Docker) Install the grid_sample TensorRT plugin. The model uses grid_sample with 5D input, which the native grid_sample operator does not support.
    • git clone https://github.com/SeanWangJS/grid-sample3d-trt-plugin && cd grid-sample3d-trt-plugin
    • Modify line 30 in CMakeLists.txt to: set_target_properties(${PROJECT_NAME} PROPERTIES CUDA_ARCHITECTURES "60;70;75;80;86")
    • export PATH=/usr/local/cuda/bin:$PATH
    • mkdir build && cd build
    • cmake .. -DTensorRT_ROOT=$TENSORRT_HOME (replace $TENSORRT_HOME with your own TensorRT root directory)
    • make. Note the path of the resulting .so file, then replace /opt/grid-sample3d-trt-plugin/build/libgrid_sample_3d_plugin.so in scripts/onnx2trt.py and src/models/predictor.py with your own .so file path.
  • Download the ONNX model files: huggingface-cli download warmshao/FasterLivePortrait --local-dir ./checkpoints. Then convert all ONNX models to TensorRT by running sh scripts/all_onnx2trt.sh and sh scripts/all_onnx2trt_animal.sh
  • Test the pipeline using TensorRT:
     python run.py \
     --src_image assets/examples/source/s10.jpg \
     --dri_video assets/examples/driving/d14.mp4 \
     --cfg configs/trt_infer.yaml
  • To run in real-time using a camera:
     python run.py \
     --src_image assets/examples/source/s10.jpg \
     --dri_video 0 \
     --cfg configs/trt_infer.yaml \
     --realtime
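
Before converting models or running the TensorRT pipeline, it may be worth checking that the compiled grid_sample plugin can be loaded at all. Loading a plugin's shared library with ctypes is a common pattern for TensorRT plugins; whether scripts/onnx2trt.py and src/models/predictor.py load it exactly this way is an assumption. The helper load_plugin is hypothetical, and the default path is the one quoted in the steps above.

```python
import ctypes
from pathlib import Path

# Default path quoted in the steps above; replace it with your own build output.
DEFAULT_PLUGIN = "/opt/grid-sample3d-trt-plugin/build/libgrid_sample_3d_plugin.so"


def load_plugin(path=DEFAULT_PLUGIN):
    """Load the plugin shared library, failing early with a clear message."""
    so_path = Path(path)
    if not so_path.is_file():
        raise FileNotFoundError(f"plugin library not found: {so_path}")
    # RTLD_GLOBAL makes the library's symbols visible to libraries loaded later.
    return ctypes.CDLL(str(so_path), mode=ctypes.RTLD_GLOBAL)


if __name__ == "__main__":
    try:
        load_plugin()
        print("plugin loaded")
    except (FileNotFoundError, OSError) as exc:
        print(f"could not load plugin: {exc}")
```

A FileNotFoundError here usually means the path was not updated after make; an OSError typically points to missing CUDA or TensorRT libraries at load time.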

Gradio App

  • onnxruntime: python webui.py --mode onnx
  • tensorrt: python webui.py --mode trt
  • The default port is 9870. Open the webpage: http://localhost:9870/
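
If the page does not load, a first step is to check whether anything is listening on port 9870. The sketch below is not part of this repository and the helper is_port_open is hypothetical; it only tests TCP reachability and does not verify that the Gradio app itself is healthy.

```python
import socket


def is_port_open(host="localhost", port=9870, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    state = "reachable" if is_port_open() else "not reachable"
    print(f"http://localhost:9870/ is {state}")
```

When running inside Docker, the port is only reachable from the host if it was published, as with -p 9870:9870 in the docker run command above.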

About Me

Follow my WeChat Channels (shipinhao) account for continuous updates on my AIGC content. Feel free to message me about collaboration opportunities.

WeChat Channels