Georgehappy1's Stars
mlabonne/llm-course
Course to get into Large Language Models (LLMs) with roadmaps and Colab notebooks.
RVC-Boss/GPT-SoVITS
As little as 1 minute of voice data can be used to train a good TTS model! (few-shot voice cloning)
Plachtaa/VALL-E-X
An open-source implementation of Microsoft's VALL-E X zero-shot TTS model. A demo is available at https://plachtaa.github.io/vallex/
jasonppy/VoiceCraft
Zero-Shot Speech Editing and Text-to-Speech in the Wild
huggingface/parler-tts
Inference and training library for high-quality TTS models.
metavoiceio/metavoice-src
Foundational model for human-like, expressive TTS
lucidrains/self-rewarding-lm-pytorch
Implementation of the training framework proposed in Self-Rewarding Language Model, from MetaAI
PolyAI-LDN/conversational-datasets
Large datasets for conversational AI
0nutation/SpeechGPT
SpeechGPT Series: Speech Large Language Models
Vaibhavs10/open-tts-tracker
k2-fsa/icefall
ga642381/speech-trident
Awesome speech/audio LLMs, representation learning, and codec models
ZhangXInFD/SpeechTokenizer
This is the code for the SpeechTokenizer presented in the paper "SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models". Samples are presented on
KdaiP/StableTTS
Next-generation TTS model using flow-matching and DiT, inspired by Stable Diffusion 3
metame-ai/awesome-audio-plaza
Daily tracking of awesome audio papers, including music generation, zero-shot TTS, ASR, and audio generation
Rongjiehuang/GenerSpeech
PyTorch implementation of GenerSpeech (NeurIPS'22): a text-to-speech model for zero-shot style transfer of out-of-domain (OOD) custom voices.
huggingface/dataspeech
dubverse-ai/MahaTTS
PolyAI-LDN/pheme
fishaudio/audio-preprocess
Preprocess Audio for training
CODEJIN/NaturalSpeech2
neonbjb/DL-Art-School
DLAS - A configuration-driven trainer for generative models
0nutation/USLM
Unified Speech Language Model for the paper "SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models" (ICLR 2024)
AudiogenAI/agc
Audiogen Codec
uniaudio666/UniAudio
The official source code of UniAudio
NVIDIA/RAD-MMM
A TTS model that makes a speaker speak new languages
ex3ndr/supervoice-gpt
GPT-style network for phonemizing text with durations
scutcsq/Neural-Transducers-for-Two-Stage-Text-to-Speech-via-Semantic-Token-Prediction
Unofficial PyTorch reproduction of the paper "Utilizing Neural Transducers for Two-Stage Text-to-Speech via Semantic Token Prediction" (arXiv:2401.01498)
innnky/descript-audio-vae
A variant of Descript Audio Codec that replaces the RVQ with a VAE
jaehyeongAN/RedisAI-demo