yuan1615's Stars
mlabonne/llm-course
Course to get into Large Language Models (LLMs) with roadmaps and Colab notebooks.
RVC-Boss/GPT-SoVITS
As little as 1 minute of voice data can be used to train a good TTS model! (few-shot voice cloning)
myshell-ai/OpenVoice
Instant voice cloning by MIT and MyShell.
xiaolai/everyone-can-use-english
Everyone can use English
bleedline/aimoneyhunter
The Ultimate Guide to Making Money with AI Side Hustles: a collection that teaches you how to leverage AI for side projects and earn extra income. Check out the English version for more insights.
fishaudio/fish-speech
Brand new TTS solution
InstantID/InstantID
InstantID : Zero-shot Identity-Preserving Generation in Seconds 🔥
JoeanAmier/TikTokDownloader
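Data collection tool for TikTok profiles/collections/livestreams/videos/photo sets/original audio, and for Douyin profiles/videos/photo sets/favorites/livestreams/original audio/collections/comments/accounts/search/trending lists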
Vaibhavs10/insanely-fast-whisper
pytorch-labs/gpt-fast
Simple and efficient pytorch-native transformer text generation in <1000 LOC of python.
yl4579/StyleTTS2
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
ai-forever/Kandinsky-2
Kandinsky 2 — multilingual text2image latent diffusion model
facebookresearch/audio2photoreal
Code and dataset for photorealistic Codec Avatars driven from audio
baaivision/Emu
Emu Series: Generative Multimodal Models from BAAI
csteinmetz1/ai-audio-startups
Community list of startups working with AI in audio and music technology
resemble-ai/resemble-enhance
AI powered speech denoising and enhancement
willisma/SiT
Official PyTorch Implementation of "SiT: Exploring Flow and Diffusion-based Generative Models with Scalable Interpolant Transformers"
ddlBoJack/emotion2vec
[ACL 2024] Official PyTorch code for extracting features and training downstream models with emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation
daniilrobnikov/vits2
VITS2: Improving Quality and Efficiency of Single-Stage Text-to-Speech with Adversarial Learning and Architecture Design
shansongliu/M2UGen
This is the official repository for M2UGen
haoheliu/voicefixer_main
General Speech Restoration
hayeong0/Diff-HierVC
Official Pytorch Implementation of "Diff-HierVC: Diffusion-based Hierarchical Voice Conversion with Robust Pitch Generation and Masked Prior for Zero-shot Speaker Adaptation"
zhenye234/CoMoSpeech
CoMoSpeech: One-Step Speech and Singing Voice Synthesis via Consistency Model
X-LANCE/StoryTTS
[ICASSP 2024] StoryTTS: A Highly Expressive Text-to-Speech Dataset with Rich Textual Expressiveness Annotations
Grace9994/CoMoSVC
CoMoSVC: One-Step Consistency Model Based Singing Voice Conversion & Singing Voice Clone
thu-ml/Bridge-TTS
Official codebase for "Schrodinger Bridges Beat Diffusion Models on Text-to-Speech Synthesis" (https://arxiv.org/abs/2312.03491).
DavidMChan/Anim400K
Anim-400K: A dataset designed from the ground up for automated dubbing of video
0417keito/JEN-1-pytorch
Unofficial implementation of JEN-1: Text-Guided Universal Music Generation with Omnidirectional Diffusion Models (https://arxiv.org/abs/2308.04729)
neonbjb/pyfastmp3decoder
A fast MP3 decoder for python, using minimp3
google/df-conformer
Audio samples accompanying publications related to DF-Conformer, a speech enhancement model.