wen0320's Stars
wenet-e2e/wespeaker
Research and Production Oriented Speaker Verification, Recognition and Diarization Toolkit
RVC-Project/Retrieval-based-Voice-Conversion-WebUI
Easily train a good VC model with voice data <= 10 mins!
astral-sh/uv
An extremely fast Python package and project manager, written in Rust.
THUDM/CogVideo
Text- and image-to-video generation: CogVideoX (2024) and CogVideo (ICLR 2023)
VITA-MLLM/Freeze-Omni
✨✨Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM
haidog-yaqub/EzAudio
High-quality Text-to-Audio Generation with Efficient Diffusion Transformer
opendilab/CleanS2S
High-quality and streaming Speech-to-Speech interactive agent in a single file. A streaming, full-duplex voice-interaction prototype agent implemented in just one file!
open-mmlab/Amphion
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
THUDM/GLM-4-Voice
GLM-4-Voice | End-to-end Chinese-English spoken dialogue model
haoheliu/AudioLDM2
Text-to-Audio/Music Generation
haoheliu/AudioLDM
AudioLDM: Generate speech, sound effects, music and beyond, with text.
facebookresearch/audiocraft
Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.
suno-ai/bark
🔊 Text-Prompted Generative Audio Model
Audio-AGI/WavJourney
WavJourney: Compositional Audio Creation with LLMs
Bai-YT/ConsistencyTTA
ConsistencyTTA: Accelerating Diffusion-Based Text-to-Audio Generation with Consistency Distillation
open-mmlab/FoleyCrafter
FoleyCrafter: Bring Silent Videos to Life with Lifelike and Synchronized Sounds. An AI Foley master that adds vivid, synchronized sound effects to your silent videos 😝
Stability-AI/stable-audio-tools
Generative models for conditional audio generation
liutaocode/TTS-arxiv-daily
Automatically Update Text-to-speech (TTS) Papers Daily using GitHub Actions (Update Every 12 Hours)
ivcylc/qa-mdt
OpenMusic: SOTA Text-to-music (TTM) Generation
SWivid/F5-TTS
Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
2noise/ChatTTS
A generative speech model for daily dialogue.
yangdongchao/RSTnet
Real-time Speech-Text Foundation Model Toolkit (wip)
yeyupiaoling/SpeechEmotionRecognition-Pytorch
Speech emotion recognition implemented in PyTorch
xinchen-ai/Westlake-Omni
wdndev/llm_interview_note
Mainly records knowledge and interview questions relevant to large language model (LLM) algorithm (application) engineers
LetterLiGo/SafeEar
SafeEar: Content Privacy-Preserving Audio Deepfake Detection (Accepted by CCS 2024)
CARNIVAL-IITP/Packet_loss_concealment
Crystalsound/FRN
breizhn/tPLCnet
This repository contains the trained models and some audio samples for the tPLCnet.
xiph/LPCNet
Efficient neural speech synthesis