🗣️ Open TTS Tracker

A one-stop shop to track all open-access and open-source TTS models as they come out. Feel free to open a PR for any that aren't listed here.

This resource aims to raise awareness of these models and to make it easier for researchers, developers, and enthusiasts to stay informed about the latest advancements in the field.

Note

This repo only tracks TTS models with an open-source or open-access codebase. More motivation for everyone to open-source! 🤗

Name	GitHub	Weights	License	Fine-tune	Languages	Paper	Demo	Issues
Amphion	Repo	🤗 Hub	MIT	No	Multilingual	Paper	🤗 Space
AI4Bharat	Repo	🤗 Hub	MIT	Yes	Indic	Paper	Demo
Bark	Repo	🤗 Hub	MIT	No	Multilingual	Paper	🤗 Space
EmotiVoice	Repo	GDrive	Apache 2.0	Yes	ZH + EN	Not Available	Not Available	Separate GUI agreement
Glow-TTS	Repo	GDrive	MIT	Yes	English	Paper	GH Pages
GPT-SoVITS	Repo	🤗 Hub	MIT	Yes	Multilingual	Not Available	Not Available
HierSpeech++	Repo	GDrive	MIT	No	KR + EN	Paper	🤗 Space
IMS-Toucan	Repo	GH release	Apache 2.0	Yes	Multilingual	Paper	🤗 Space
MahaTTS	Repo	🤗 Hub	Apache 2.0	No	English + Indic	Not Available	Recordings, Colab
Matcha-TTS	Repo	GDrive	MIT	Yes	English	Paper	🤗 Space	GPL-licensed phonemizer
MetaVoice-1B	Repo	🤗 Hub	Apache 2.0	Yes	Multilingual	Not Available	🤗 Space
Neural-HMM TTS	Repo	GitHub	MIT	Yes	English	Paper	GH Pages
OpenVoice	Repo	🤗 Hub	CC-BY-NC 4.0	No	ZH + EN	Paper	🤗 Space	Non Commercial
OverFlow TTS	Repo	GitHub	MIT	Yes	English	Paper	GH Pages
Parler TTS	Repo	🤗 Hub	Apache 2.0	Yes	English	Not Available	Not Available
pflowTTS	Unofficial Repo	GDrive	MIT	Yes	English	Paper	Not Available	GPL-licensed phonemizer
Piper	Repo	🤗 Hub	MIT	Yes	Multilingual	Not Available	Not Available	GPL-licensed phonemizer
Pheme	Repo	🤗 Hub	CC-BY	Yes	English	Paper	🤗 Space
RAD-MMM	Repo	GDrive	MIT	Yes	Multilingual	Paper	Jupyter Notebook, Webpage
RAD-TTS	Repo	GDrive	MIT	Yes	English	Paper	GH Pages
Silero	Repo	GH links	CC BY-NC-SA	No	EN + DE + ES + EA	Not Available	Not Available	Non Commercial
StyleTTS 2	Repo	🤗 Hub	MIT	Yes	English	Paper	🤗 Space	GPL-licensed phonemizer
Tacotron 2	Unofficial Repo	GDrive	BSD-3	Yes	English	Paper	Webpage
TorToiSe TTS	Repo	🤗 Hub	Apache 2.0	Yes	English	Technical report	🤗 Space
TTTS	Repo	🤗 Hub	MPL 2.0	No	ZH	Not Available	Colab, 🤗 Space
VALL-E	Unofficial Repo	Not Available	MIT	Yes	NA	Paper	Not Available
VITS/ MMS-TTS	Repo	🤗 Hub / MMS	Apache 2.0	Yes	English	Paper	🤗 Space	GPL-licensed phonemizer
WhisperSpeech	Repo	🤗 Hub	MIT	No	English, Polish	Not Available	🤗 Space, Recordings, Colab
XTTS	Repo	🤗 Hub	CPML	Yes	Multilingual	Paper	🤗 Space	Non Commercial
xVASynth	Repo	🤗 Hub	GPL-3.0	Yes	Multilingual	Paper	🤗 Space	Copyrighted materials used for training.

Capability specifics


Name	Processor ⚡	Phonetic alphabet 🔤	Insta-clone 👥	Emotional control 🎭	Prompting 📖	Speech control 🎚	Streaming support 🌊	S2S support 🦜	Longform synthesis
Amphion	CUDA		👥	🎭👥	❌
Bark	CUDA		❌	🎭 tags	❌
EmotiVoice
Glow-TTS
GPT-SoVITS
HierSpeech++		❌	👥	🎭👥	❌	speed / stability 🎚		🦜
IMS-Toucan	CUDA	❌	❌	❌	❌
MahaTTS
Matcha-TTS		IPA	❌	❌	❌	speed / stability 🎚
MetaVoice-1B	CUDA		👥	🎭👥	❌	stability / similarity 🎚			Yes
Neural-HMM TTS
OpenVoice	CUDA	❌	👥	6-type 🎭 😡😃😭😯🤫😊	❌
OverFlow TTS
pflowTTS
Piper
Pheme	CUDA	❌	👥	🎭👥	❌	stability 🎚
RAD-TTS
Silero
StyleTTS 2	CPU / CUDA	IPA	👥	🎭👥	❌		🌊		Yes
Tacotron 2
TorToiSe TTS		❌	❌	❌	📖		🌊
TTTS	CPU/CUDA	❌	👥
VALL-E
VITS/ MMS-TTS	CUDA	❌	❌	❌	❌	speed 🎚
WhisperSpeech	CUDA	❌	👥	🎭👥	❌	speed 🎚
XTTS	CUDA	❌	👥	🎭👥	❌	speed / stability 🎚	🌊	❌
xVASynth	CPU / CUDA	ARPAbet+	❌	4-type 🎭 😡😃😭😯 per‑phoneme	❌	speed / pitch / energy / 🎭 🎚 per‑phoneme	❌	🦜

Processor - CPU/CUDA/ROCm (single or multi, as used for inference; the real-time factor should be below 2.0 to qualify as CPU, though some leeway can be given if the model supports audio streaming)
Phonetic alphabet - None/IPA/ARPAbet (phonetic transcription that lets you control the pronunciation of certain words during inference)
Insta-clone - Yes/No (zero-shot model for quick voice cloning)
Emotional control - Yes 🎭 / Strict (Strict means the model cannot move between emotional states; the emotion is switched through the insta-clone reference, shown as 🎭👥)
Prompting - Yes/No (a side effect of narrator-based datasets and a way to affect the emotional state; see the ElevenLabs docs)
Streaming support - Yes/No (whether audio can be played back while it is still being generated)
Speech control - speed/pitch/etc. (ability to change the pitch, duration, energy, and/or emotion of generated speech)
S2S support (speech-to-speech) - Yes/No (streaming support implies real-time S2S; a speech-to-text followed by text-to-speech pipeline does not count)
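
The Processor criterion above depends on the real-time factor (RTF). The list does not define it, so the minimal sketch below assumes the common convention: RTF is the time spent synthesizing divided by the duration of the audio produced. The function names are hypothetical, and the "leeway" for streaming models is deliberately not modeled, since no number is given for it.

```python
def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """Assumed definition: wall-clock synthesis time / duration of generated audio."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


def qualifies_for_cpu(rtf: float, threshold: float = 2.0) -> bool:
    """Strict check against the 2.0 threshold; streaming leeway is not modeled."""
    return rtf < threshold


# Example: 3 s of CPU time to generate 2 s of audio
rtf = real_time_factor(synthesis_seconds=3.0, audio_seconds=2.0)
cpu_ok = qualifies_for_cpu(rtf)
```

To measure `synthesis_seconds` for a given model, one option is to wrap its inference call with `time.perf_counter()` and derive `audio_seconds` from the number of output samples divided by the sample rate.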

How can you help?

Help make this list more complete. Create demos on the Hugging Face Hub and link them here :) Got any questions? Drop me a DM on Twitter @reach_vb.

Vaibhavs10/open-tts-tracker
