Colinsnow1

Colinsnow1's Stars

RVC-Boss/GPT-SoVITS
1 min voice data can also be used to train a good TTS model! (few shot voice cloning)
Language:Python35.6k 210 1.3k4k
coqui-ai/TTS
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
Language:Python35.4k 293 1.1k4.3k
facebookresearch/fairseq
Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
Language:Python30.5k 425 4.2k6.4k
facebookresearch/audiocraft
Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.
Language:Python21k 206 3852.1k
harry0703/MoneyPrinterTurbo
利用AI大模型，一键生成高清短视频 Generate short videos with one click using AI LLM.
Language:Python18k 145 3952.8k
fishaudio/fish-speech
Brand new TTS solution
Language:Python14.4k 97 4071.1k
KwaiVGI/LivePortrait
Bring portraits to life!
Language:Python12.9k 114 3711.4k
fishaudio/Bert-VITS2
vits2 backbone with multilingual-bert
Language:Python8k 50 01.1k
Plachtaa/VALL-E-X
An open source implementation of Microsoft's VALL-E X zero-shot TTS model. Demo is available in https://plachtaa.github.io/vallex/
Language:Python7.7k 82 153761
jasonppy/VoiceCraft
Zero-Shot Speech Editing and Text-to-Speech in the Wild
Language:Jupyter Notebook7.6k 89 128746
open-mmlab/Amphion
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
Language:Jupyter Notebook7.6k 78 189562
pyannote/pyannote-audio
Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
Language:Jupyter Notebook6.3k 71 993780
THUDM/CogVLM
a state-of-the-art-level open visual language model | 多模态预训练模型
Language:Python6.1k 66 425415
yl4579/StyleTTS2
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
Language:Python4.9k 77 197416
myshell-ai/MeloTTS
High-quality multi-lingual text-to-speech library by MyShell.ai. Support English, Spanish, French, Chinese, Japanese and Korean.
Language:Python4.8k 40 183619
fudan-generative-vision/champ
Champ: Controllable and Consistent Human Image Animation with 3D Parametric Guidance
Language:Python4.7k 313 124598
tyxsspa/AnyText
Official implementation code of the paper <AnyText: Multilingual Visual Text Generation And Editing>
Language:Python4.3k 53 130283
metavoiceio/metavoice-src
Foundational model for human-like, expressive TTS
Language:Python3.9k 79 127658
Kwai-Kolors/Kolors
Kolors Team
Language:Python3.8k 39 134268
enhuiz/vall-e
An unofficial PyTorch implementation of the audio LM VALL-E
Language:Python3k 87 98419
KevinWang676/Bark-Voice-Cloning
Bark Voice Cloning and Voice Cloning for Chinese Speech
Language:Jupyter Notebook2.8k 32 100400
Camb-ai/MARS5-TTS
MARS5 speech model (TTS) from CAMB.AI
Language:Jupyter Notebook2.5k 32 48206
lifeiteng/vall-e
PyTorch implementation of VALL-E(Zero-Shot Text-To-Speech), Reproduced Demo https://lifeiteng.github.io/valle/index.html
Language:Python2k 49 126319
rsxdalv/tts-generation-webui
TTS Generation Web UI (Bark, MusicGen + AudioGen, Tortoise, RVC, Vocos, Demucs, SeamlessM4T, MAGNet, StyleTTS2, MMS, Stable Audio, Mars5, F5-TTS, ParlerTTS)
Language:TypeScript1.8k 34 234197
ddlBoJack/emotion2vec
[ACL 2024] Official PyTorch code for extracting features and training downstream models with emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation
Language:Python636 15 4446
cronrpc/SubFix
SubFix: Efficient Web-Based Audio Subtitle Editing and Multilingual Automatic Annotation Tool.
Language:Python192 4 1621
Zuntan03/EasyBertVits2
文章から感情豊かな音声を生成する Bert-VITS2 を簡単に使えます。
Language:Batchfile137 4 413
nii-yamagishilab/ZMM-TTS
ZMM-TTS: Zero-shot Multilingual and Multispeaker Speech Synthesis Conditioned on Self-supervised Discrete Speech Representations
Language:C121 5 69
0913ktg/vits_korean_multispeaker
Language:Python91
tan-xu/paper-reading
深度学习经典、新论文逐段精读
5 0 00