LLMAvatarTalk: An Interactive AI Assistant

Harness the power of NVIDIA technologies and LangChain to create dynamic avatars from live speech, integrating RIVA ASR and TTS with Audio2Face for real-time, expressive digital interactions.

LLMAvatarTalk is an innovative project that combines state-of-the-art AI technologies to create an interactive virtual assistant. By integrating automatic speech recognition (ASR), large language models (LLMs), LangChain, text-to-speech (TTS), audio-driven facial animation (Audio2Face), and Unreal Engine's MetaHuman, LLMAvatarTalk showcases the potential of AI for seamless, engaging human-computer interaction.

English | 中文

Demo

Watch the demo on YouTube: Watch the video

Features

  • Speech Recognition: Converts user speech into text in real-time using NVIDIA RIVA ASR technology.
  • Language Processing: Leverages advanced LLMs (such as llama3-70b-instruct) via NVIDIA NIM APIs for deep semantic understanding and response generation.
  • Text-to-Speech: Transforms generated text responses into natural-sounding speech using NVIDIA RIVA TTS.
  • Facial Animation: Generates realistic facial expressions and animations based on audio output using Audio2Face technology.
  • Unreal Engine Integration: Enhances virtual character expressiveness by linking Audio2Face with Unreal Engine's MetaHuman in real time.
  • LangChain Integration: Simplifies the integration of NVIDIA RIVA and NVIDIA NIM APIs, providing a seamless and efficient workflow for AI development (see the sketch after this list).
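
As a minimal, hedged sketch of the LangChain integration described above, the snippet below calls a NIM-hosted LLM through the langchain-nvidia-ai-endpoints package. It assumes that package is installed and that NVIDIA_API_KEY is set in the environment (see Execution below); the prompt is purely illustrative.

from langchain_nvidia_ai_endpoints import ChatNVIDIA

# ChatNVIDIA reads NVIDIA_API_KEY from the environment and targets
# NVIDIA's hosted NIM endpoints by default.
llm = ChatNVIDIA(model="meta/llama3-70b-instruct")
print(llm.invoke("Introduce yourself in one sentence.").content)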

Architecture

Prerequisites

Installation

Tested Environment: Windows 11 & Python 3.9

git clone https://github.com/yourusername/LLMAvatarTalk.git
cd LLMAvatarTalk
pip install -r requirements.txt

Execution

  1. Ensure you have set up the Riva server and configured Audio2Face and Unreal Engine.
  2. Create a .env file and add your NVIDIA NIM API key. A sample is provided in .env.sample.
    NVIDIA_API_KEY=nvapi-
    
  3. Set the Riva server's address in the URI field of config.py. Riva servers listen on port 50051 by default.
    URI = '192.168.1.205:50051'
    
  4. In the config.py file, you can also specify the language for the application interface and responses. The available options are 'en-US' for English and 'zh-CN' for Chinese. The default language is set to English.
    LANGUAGE = 'en-US'  # Change to 'zh-CN' for Chinese.
    
  5. Run python main.py. A minimal sketch of the pipeline it drives follows this list.
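
The sketch below illustrates, under stated assumptions, the single ASR -> LLM -> TTS turn that main.py orchestrates. It assumes the nvidia-riva-client, python-dotenv, and langchain-nvidia-ai-endpoints packages, imports URI and LANGUAGE from config.py as configured above, and substitutes a placeholder WAV file (utterance.wav) for live microphone capture; the Audio2Face hand-off is left as a comment because its streaming setup lives on the Omniverse side. This is not the project's exact code.

import riva.client                                     # nvidia-riva-client
from dotenv import load_dotenv                         # python-dotenv
from langchain_nvidia_ai_endpoints import ChatNVIDIA
from config import URI, LANGUAGE

load_dotenv()                                          # pulls NVIDIA_API_KEY from .env

auth = riva.client.Auth(uri=URI)                       # gRPC channel to the Riva server
asr = riva.client.ASRService(auth)
tts = riva.client.SpeechSynthesisService(auth)
llm = ChatNVIDIA(model="meta/llama3-70b-instruct")

# 1) Speech -> text: offline recognition of one recorded utterance.
asr_config = riva.client.RecognitionConfig(
    language_code=LANGUAGE,
    max_alternatives=1,
    enable_automatic_punctuation=True,
)
with open("utterance.wav", "rb") as f:                 # placeholder audio input
    result = asr.offline_recognize(f.read(), asr_config)
user_text = result.results[0].alternatives[0].transcript

# 2) Text -> reply via the NIM-hosted LLM.
reply = llm.invoke(user_text).content

# 3) Reply -> speech; the response's .audio field holds raw PCM bytes.
speech = tts.synthesize(reply, language_code=LANGUAGE, sample_rate_hz=44100)
# Stream speech.audio to Audio2Face here (configured on the Omniverse side).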

To-Do List

  • Optimize LLM functionality, including RAG and agent support
  • Improve TTS
  • Implement emotion detection and full-body animation
  • Integrate asynchronous processing
  • Integrate RIVA ASR and TTS via LangChain (a temporary workaround is currently in place)

Acknowledgments

Special thanks to the following projects and documentation:

  • RIVA
  • Audio2Face
  • LangChain