
An AI-driven WebSocket interpreter for Dalamud's TextToTalk plugin that uses Coqui Ai TTS for lifelike audio and j-hartmann's emotion transformer model to infer emotion from text.

Primary language: Python. License: GNU General Public License v3.0 (GPL-3.0).

ChocoTTS

ChocoTTS is a WebSocket-based interpreter for the TextToTalk plugin in Dalamud, enabling lifelike text-to-speech (TTS) and emotion inference from text. It uses the 🐸Coqui Ai TTS model for generating speech and j-hartmann's emotion transformer model for detecting emotions in text.

Features

  • Real-time TTS generation using Coqui Ai models, all generated locally
  • Emotion inference using j-hartmann's emotion transformer model
  • Caching of generated speech for faster repeat access
  • Adjustable audio playback volume
  • Support for multiple NPCs with different voice samples
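The caching feature can be pictured as a lookup keyed on everything that changes the audio: the spoken text, the voice sample, and the inferred emotion. The sketch below is illustrative only and is not taken from the ChocoTTS source. The names `cache_path`, `get_or_synthesize`, the `cache` directory, and the `synthesize` callback are all hypothetical; the callback stands in for whatever function actually produces WAV bytes.

```python
import hashlib
from pathlib import Path

# Hypothetical cache location; the real project may store audio elsewhere.
CACHE_DIR = Path("cache")


def cache_path(text, voice, emotion, cache_dir=CACHE_DIR):
    """Map (voice, emotion, text) to a stable file name inside cache_dir."""
    # A unit separator keeps ("ab", "c") and ("a", "bc") from colliding.
    raw = "\x1f".join((voice, emotion, text)).encode("utf-8")
    return Path(cache_dir) / (hashlib.sha256(raw).hexdigest() + ".wav")


def get_or_synthesize(text, voice, emotion, synthesize, cache_dir=CACHE_DIR):
    """Return the cached audio file, calling synthesize() only on a miss.

    synthesize is an assumed callback: (text, voice, emotion) -> WAV bytes.
    """
    path = cache_path(text, voice, emotion, cache_dir)
    if not path.exists():
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(synthesize(text, voice, emotion))
    return path
```

With a layout like this, a repeated NPC line spoken with the same voice and emotion is read from disk instead of being generated again, while the same line with a different emotion gets its own file.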

Installation

The application is still under development. Once a stable version 1.0 is ready, an installer will be published.

Prerequisites

  • XIVLauncher (for Dalamud)
  • TextToTalk (Dalamud plugin that provides the WebSocket server ChocoTTS reads text from)
  • Python 3.10 or higher
  • ffmpeg (for audio processing)
  • An NVIDIA GPU is highly recommended
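Two of the prerequisites above can be verified from Python before anything else is installed. The following is a minimal sketch, not part of ChocoTTS: `check_prerequisites` is a hypothetical helper that checks only the interpreter version and whether an `ffmpeg` executable is on the `PATH`. It does not check for XIVLauncher, TextToTalk, or a GPU.

```python
import shutil
import sys


def check_prerequisites(min_python=(3, 10)):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if sys.version_info < min_python:
        found = ".".join(str(part) for part in sys.version_info[:3])
        wanted = ".".join(str(part) for part in min_python)
        problems.append(f"Python {wanted} or higher is required, found {found}")
    # shutil.which returns None when the executable is not on PATH.
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg was not found on PATH")
    return problems


if __name__ == "__main__":
    issues = check_prerequisites()
    for issue in issues:
        print("Missing prerequisite:", issue)
    if not issues:
        print("Python and ffmpeg look fine.")
```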

License

This project is licensed under the GNU General Public License v3.0. See the LICENSE file for more details.