ChocoTTS is a WebSocket-based interpreter for the TextToTalk plugin in Dalamud that provides lifelike text-to-speech (TTS) and infers emotion from text. It uses the 🐸 Coqui AI TTS model to generate speech and j-hartmann's emotion transformer model to detect emotions in text.
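Here is a rough sketch of the emotion-inference step. It assumes the Hugging Face `transformers` pipeline API and the `j-hartmann/emotion-english-distilroberta-base` checkpoint. The README only says "j-hartmann's emotion transformer model", so the exact checkpoint is an assumption. `load_classifier` and `top_emotion` are hypothetical helper names, not functions from ChocoTTS.

```python
from typing import Any, Dict, List, Tuple, Union

Score = Dict[str, Any]


def load_classifier():
    """Hypothetical helper: build an emotion classifier.

    Assumes `transformers` is installed and that the checkpoint below is the
    j-hartmann model the README refers to.
    """
    from transformers import pipeline  # imported lazily so the rest runs without it

    # top_k=None asks the pipeline to return a score for every emotion label
    return pipeline(
        "text-classification",
        model="j-hartmann/emotion-english-distilroberta-base",
        top_k=None,
    )


def top_emotion(scores: Union[List[Score], List[List[Score]]]) -> Tuple[str, float]:
    """Return the highest-scoring (label, score) pair from the pipeline output."""
    # Depending on the input type, results may be nested one level deeper.
    if scores and isinstance(scores[0], list):
        scores = scores[0]
    best = max(scores, key=lambda s: s["score"])
    return best["label"], float(best["score"])
```

In use, this would look something like `top_emotion(load_classifier()("I can't believe you did that!"))`. The detected label could then be used to pick a speaking style for the TTS step.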
- Real-time TTS generation using Coqui AI models, with all audio generated locally
- Emotion inference using j-hartmann's emotion transformer model
- Caching of generated speech for faster repeat access
- Adjustable audio playback volume
- Support for multiple NPCs with different voice samples
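The caching and per-NPC voice features could be combined roughly as follows. This is a minimal sketch, not ChocoTTS's actual design. It assumes that a cache entry is keyed on the voice sample, the emotion, and the text, and that audio is stored as WAV files on disk. `VOICE_SAMPLES`, `DEFAULT_VOICE`, `voice_for`, `cache_path`, `get_or_generate`, and the `synthesize` callback (a stand-in for the Coqui TTS call) are all hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Callable, Dict, Optional

# Hypothetical mapping from NPC name to a reference voice sample for the TTS model.
VOICE_SAMPLES: Dict[str, str] = {
    "Alphinaud": "voices/alphinaud.wav",
    "Alisaie": "voices/alisaie.wav",
}
DEFAULT_VOICE = "voices/default.wav"  # fallback for NPCs without a dedicated sample


def voice_for(npc: Optional[str]) -> str:
    """Pick the voice sample for an NPC, falling back to the default."""
    return VOICE_SAMPLES.get(npc or "", DEFAULT_VOICE)


def cache_path(cache_dir: Path, npc: Optional[str], text: str, emotion: str) -> Path:
    """Derive a stable file name from everything that affects the generated audio."""
    # The unit separator keeps e.g. ("a", "bc") and ("ab", "c") from colliding.
    key = "\x1f".join([voice_for(npc), emotion, text])
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return cache_dir / f"{digest}.wav"


def get_or_generate(
    cache_dir: Path,
    npc: Optional[str],
    text: str,
    emotion: str,
    synthesize: Callable[[str, str, str], bytes],
) -> Path:
    """Return cached audio if present; otherwise synthesize it once and store it."""
    path = cache_path(cache_dir, npc, text, emotion)
    if not path.exists():
        cache_dir.mkdir(parents=True, exist_ok=True)
        # `synthesize` stands in for the local Coqui TTS call and must return WAV bytes.
        path.write_bytes(synthesize(text, voice_for(npc), emotion))
    return path
```

Hashing the inputs gives every repeated line of dialogue the same file name. Because of this, a second request skips synthesis and only needs a file-existence check.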
The application is still under development; once a stable version 1.0 is ready, an installer will be published.
- XIVLauncher (for Dalamud)
- TextToTalk (a Dalamud plugin that provides the WebSocket server ChocoTTS reads text from)
- Python 3.10 or higher
- ffmpeg (for audio processing)
- An NVIDIA GPU is highly recommended
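Some of these requirements can be checked from Python before launch, as in the minimal sketch below. It only covers the Python version and whether `ffmpeg` is on `PATH`; the assumption that `ffmpeg` must be reachable via `PATH` is ours. It does not check XIVLauncher, TextToTalk, or the GPU. `missing_requirements` is a hypothetical helper, not part of ChocoTTS.

```python
import shutil
import sys
from typing import List


def missing_requirements() -> List[str]:
    """List the README prerequisites this script can detect that are not met."""
    problems: List[str] = []
    # The README asks for Python 3.10 or higher.
    if sys.version_info < (3, 10):
        problems.append(f"Python 3.10+ required, found {sys.version.split()[0]}")
    # Assumes ffmpeg must be callable from PATH for audio processing.
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg not found on PATH")
    return problems


if __name__ == "__main__":
    issues = missing_requirements()
    print("\n".join(issues) if issues else "All detectable requirements are satisfied.")
```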
This project is licensed under the GNU General Public License. See the LICENSE file for more details.