LLMAvatarTalk: An Interactive AI Assistant

Harness the power of NVIDIA technologies and LangChain to create dynamic avatars from live speech, integrating RIVA ASR and TTS with Audio2Face for real-time, expressive digital interactions.

LLMAvatarTalk is an innovative project that combines state-of-the-art AI technologies to create an interactive virtual assistant. By integrating automatic speech recognition (ASR), large language models (LLMs), LangChain, text-to-speech (TTS), audio-driven facial animation (Audio2Face), and Unreal Engine's MetaHuman, LLMAvatarTalk showcases the potential of AI for seamless, engaging human-computer interaction.

English | 中文

Demo

Watch the demo on YouTube: Watch the video

Features

  • Speech Recognition: Converts user speech into text in real-time using NVIDIA RIVA ASR technology.
  • Language Processing: Leverages advanced LLMs (such as llama3-70b-instruct) via NVIDIA NIM APIs for deep semantic understanding and response generation.
  • Text-to-Speech: Transforms generated text responses into natural-sounding speech using NVIDIA RIVA TTS.
  • Facial Animation: Generates realistic facial expressions and animations based on audio output using Audio2Face technology.
  • Unreal Engine Integration: Enhances virtual character expressiveness by linking Audio2Face with Unreal Engine's MetaHuman in real time.
  • LangChain Integration: Simplifies the integration of NVIDIA RIVA and NVIDIA NIM APIs, providing a seamless and efficient workflow for AI development (see the sketch after this list).
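
As a minimal, hedged sketch of the LangChain integration described above, the snippet below calls a NIM-hosted LLM through the langchain-nvidia-ai-endpoints package. It assumes that package is installed and that NVIDIA_API_KEY is set in the environment (see Execution below); the prompt is purely illustrative.

from langchain_nvidia_ai_endpoints import ChatNVIDIA

# ChatNVIDIA reads NVIDIA_API_KEY from the environment and targets
# NVIDIA's hosted NIM endpoints by default.
llm = ChatNVIDIA(model="meta/llama3-70b-instruct")
print(llm.invoke("Introduce yourself in one sentence.").content)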

Architecture

Prerequisites

Installation

Tested Environment: Windows 11 & Python 3.9

git clone https://github.com/yourusername/LLMAvatarTalk.git
cd LLMAvatarTalk
pip install -r requirements.txt

Execution

  1. Ensure you have set up the Riva server and configured Audio2Face and Unreal Engine.
  2. Create a .env file and add your NVIDIA NIM API key. A sample is provided in .env.sample.
    NVIDIA_API_KEY=nvapi-
    
  3. Set the Riva server's address in the URI field of config.py. Riva servers listen on port 50051 by default.
    URI = '192.168.1.205:50051'
    
  4. In the config.py file, you can also specify the language for the application interface and responses. The available options are 'en-US' for English and 'zh-CN' for Chinese. The default language is set to English.
    LANGUAGE = 'en-US'  # Change to 'zh-CN' for Chinese.
    
  5. Run python main.py. A minimal sketch of the pipeline it drives follows this list.
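
The sketch below illustrates, under stated assumptions, the single ASR -> LLM -> TTS turn that main.py orchestrates. It assumes the nvidia-riva-client, python-dotenv, and langchain-nvidia-ai-endpoints packages, imports URI and LANGUAGE from config.py as configured above, and substitutes a placeholder WAV file (utterance.wav) for live microphone capture; the Audio2Face hand-off is left as a comment because its streaming setup lives on the Omniverse side. This is not the project's exact code.

import riva.client                                     # nvidia-riva-client
from dotenv import load_dotenv                         # python-dotenv
from langchain_nvidia_ai_endpoints import ChatNVIDIA
from config import URI, LANGUAGE

load_dotenv()                                          # pulls NVIDIA_API_KEY from .env

auth = riva.client.Auth(uri=URI)                       # gRPC channel to the Riva server
asr = riva.client.ASRService(auth)
tts = riva.client.SpeechSynthesisService(auth)
llm = ChatNVIDIA(model="meta/llama3-70b-instruct")

# 1) Speech -> text: offline recognition of one recorded utterance.
asr_config = riva.client.RecognitionConfig(
    language_code=LANGUAGE,
    max_alternatives=1,
    enable_automatic_punctuation=True,
)
with open("utterance.wav", "rb") as f:                 # placeholder audio input
    result = asr.offline_recognize(f.read(), asr_config)
user_text = result.results[0].alternatives[0].transcript

# 2) Text -> reply via the NIM-hosted LLM.
reply = llm.invoke(user_text).content

# 3) Reply -> speech; the response's .audio field holds raw PCM bytes.
speech = tts.synthesize(reply, language_code=LANGUAGE, sample_rate_hz=44100)
# Stream speech.audio to Audio2Face here (configured on the Omniverse side).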

To-Do List

  • Optimize LLM functionality, including RAG and agent support
  • Improve TTS
  • Implement emotion detection and full-body animation
  • Integrate asynchronous processing
  • Integrate RIVA ASR and TTS via LangChain (a temporary workaround is currently in place)

Acknowledgments

Special thanks to the following projects and documentation:

  • RIVA
  • Audio2Face
  • LangChain