shangqwe123's Stars
2noise/ChatTTS
A generative speech model for daily dialogue.
RUCAIBox/LLMSurvey
The official GitHub page for the survey paper "A Survey of Large Language Models".
fishaudio/fish-speech
Brand new TTS solution
FunAudioLLM/CosyVoice
Multi-lingual large voice generation model, providing full-stack inference, training, and deployment capabilities.
open-mmlab/Amphion
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
hahahumble/speechgpt
💬 SpeechGPT is a web application that enables you to converse with ChatGPT.
lifeiteng/vall-e
PyTorch implementation of VALL-E (Zero-Shot Text-To-Speech). Reproduced demo: https://lifeiteng.github.io/valle/index.html
facebookresearch/svoice
We provide a PyTorch implementation of the paper "Voice Separation with an Unknown Number of Multiple Speakers", in which we present a new method for separating a mixed audio sequence where multiple voices speak simultaneously. The new method employs gated neural networks that are trained to separate the voices at multiple processing steps, while maintaining the speaker in each output channel fixed. A different model is trained for every number of possible speakers, and the model with the largest number of speakers is employed to select the actual number of speakers in a given sample. Our method greatly outperforms the current state of the art, which, as we show, is not competitive for more than two speakers.
0nutation/SpeechGPT
SpeechGPT Series: Speech Large Language Models
FoundationVision/LlamaGen
Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation
maum-ai/voicefilter
Unofficial PyTorch implementation of Google AI's VoiceFilter system
livekit/agents
Build real-time multimodal AI applications 🤖🎙️📹
LTH14/mar
PyTorch implementation of MAR+DiffLoss https://arxiv.org/abs/2406.11838
princeton-nlp/SimPO
SimPO: Simple Preference Optimization with a Reference-Free Reward
microsoft/Pengi
An Audio Language model for Audio Tasks
lucidrains/autoregressive-diffusion-pytorch
Implementation of Autoregressive Diffusion in Pytorch
gudgud96/frechet-audio-distance
A lightweight library for Frechet Audio Distance calculation.
Edresson/VoiceSplit
VoiceSplit: Targeted Voice Separation by Speaker-Conditioned Spectrogram
okio-ai/nendo
The Nendo AI Audio Tool Suite
leafduo/chatgpt-telegram-bot
Telegram bot for ChatGPT
thuhcsi/SECap
BUTSpeechFIT/speakerbeam
CNChTu/FCPE
OSVAI/KernelWarehouse
The official project website of "KernelWarehouse: Rethinking the Design of Dynamic Convolution" (KW for short, accepted to ICML 2024)
bshall/hifigan
A 16kHz implementation of HiFi-GAN for soft-vc.
RickyL-2000/ROSVOT
Robust Singing Voice Transcription and MIDI Extraction
seanghay/uvr-mdx-infer
Ultimate Vocal Remover Inference CLI
deeplyinc/Nonverbal-Vocalization-Dataset
MTG/tape
TAPE: An End-to-End Timbre-Aware Pitch Estimator
wyw97/DENSE
ICASSP 2025: Dynamic Embedding Causal Target Speech Extraction