Colinsnow1's Stars
RVC-Boss/GPT-SoVITS
1 min voice data can also be used to train a good TTS model! (few shot voice cloning)
coqui-ai/TTS
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
facebookresearch/fairseq
Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
facebookresearch/audiocraft
Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.
harry0703/MoneyPrinterTurbo
利用AI大模型,一键生成高清短视频 Generate short videos with one click using AI LLM.
fishaudio/fish-speech
Brand new TTS solution
KwaiVGI/LivePortrait
Bring portraits to life!
fishaudio/Bert-VITS2
vits2 backbone with multilingual-bert
Plachtaa/VALL-E-X
An open source implementation of Microsoft's VALL-E X zero-shot TTS model. Demo is available in https://plachtaa.github.io/vallex/
jasonppy/VoiceCraft
Zero-Shot Speech Editing and Text-to-Speech in the Wild
open-mmlab/Amphion
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
pyannote/pyannote-audio
Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
THUDM/CogVLM
a state-of-the-art-level open visual language model | 多模态预训练模型
yl4579/StyleTTS2
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
myshell-ai/MeloTTS
High-quality multi-lingual text-to-speech library by MyShell.ai. Support English, Spanish, French, Chinese, Japanese and Korean.
fudan-generative-vision/champ
Champ: Controllable and Consistent Human Image Animation with 3D Parametric Guidance
tyxsspa/AnyText
Official implementation code of the paper <AnyText: Multilingual Visual Text Generation And Editing>
metavoiceio/metavoice-src
Foundational model for human-like, expressive TTS
Kwai-Kolors/Kolors
Kolors Team
enhuiz/vall-e
An unofficial PyTorch implementation of the audio LM VALL-E
KevinWang676/Bark-Voice-Cloning
Bark Voice Cloning and Voice Cloning for Chinese Speech
Camb-ai/MARS5-TTS
MARS5 speech model (TTS) from CAMB.AI
lifeiteng/vall-e
PyTorch implementation of VALL-E(Zero-Shot Text-To-Speech), Reproduced Demo https://lifeiteng.github.io/valle/index.html
rsxdalv/tts-generation-webui
TTS Generation Web UI (Bark, MusicGen + AudioGen, Tortoise, RVC, Vocos, Demucs, SeamlessM4T, MAGNet, StyleTTS2, MMS, Stable Audio, Mars5, F5-TTS, ParlerTTS)
ddlBoJack/emotion2vec
[ACL 2024] Official PyTorch code for extracting features and training downstream models with emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation
cronrpc/SubFix
SubFix: Efficient Web-Based Audio Subtitle Editing and Multilingual Automatic Annotation Tool.
Zuntan03/EasyBertVits2
文章から感情豊かな音声を生成する Bert-VITS2 を簡単に使えます。
nii-yamagishilab/ZMM-TTS
ZMM-TTS: Zero-shot Multilingual and Multispeaker Speech Synthesis Conditioned on Self-supervised Discrete Speech Representations
0913ktg/vits_korean_multispeaker
tan-xu/paper-reading
深度学习经典、新论文逐段精读