shangqwe123's Stars

2noise/ChatTTS
A generative speech model for daily dialogue.
Language: Python | 30.6k 173 5033.3k
RUCAIBox/LLMSurvey
The official GitHub page for the survey paper "A Survey of Large Language Models".
Language: Python | 10k 154 59787
fishaudio/fish-speech
Brand new TTS solution
Language: Python | 8.7k 62 333675
FunAudioLLM/CosyVoice
Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.
Language: Python | 4.7k 50 341473
open-mmlab/Amphion
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
Language: Python | 4.4k 58 151379
hahahumble/speechgpt
💬 SpeechGPT is a web application that enables you to converse with ChatGPT.
Language: TypeScript | 2.7k 20 47404
lifeiteng/vall-e
PyTorch implementation of VALL-E (Zero-Shot Text-To-Speech). Reproduced demo: https://lifeiteng.github.io/valle/index.html
Language: Python | 2k 49 126320
facebookresearch/svoice
We provide a PyTorch implementation of the paper "Voice Separation with an Unknown Number of Multiple Speakers", in which we present a new method for separating a mixed audio sequence where multiple voices speak simultaneously. The new method employs gated neural networks that are trained to separate the voices at multiple processing steps, while keeping the speaker in each output channel fixed. A different model is trained for every number of possible speakers, and the model with the largest number of speakers is employed to select the actual number of speakers in a given sample. Our method greatly outperforms the current state of the art, which, as we show, is not competitive for more than two speakers.
Language: Python | 1.2k 25 94176
0nutation/SpeechGPT
SpeechGPT Series: Speech Large Language Models
Language: Python | 1.2k 45 4280
FoundationVision/LlamaGen
Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation
Language: Python | 1.2k 21 5346
maum-ai/voicefilter
Unofficial PyTorch implementation of Google AI's VoiceFilter system
Language: Python | 1.1k 35 26227
livekit/agents
Build real-time multimodal AI applications 🤖🎙️📹
Language: Python | 1k 27 123198
LTH14/mar
PyTorch implementation of MAR+DiffLoss https://arxiv.org/abs/2406.11838
Language: Python | 745 14 3139
princeton-nlp/SimPO
SimPO: Simple Preference Optimization with a Reference-Free Reward
Language: Python | 636 8 6337
microsoft/Pengi
An Audio Language model for Audio Tasks
Language: Python | 281 14 1315
lucidrains/autoregressive-diffusion-pytorch
Implementation of Autoregressive Diffusion in Pytorch
Language: Python | 243 12 33
gudgud96/frechet-audio-distance
A lightweight library for Frechet Audio Distance calculation.
Language: Python | 229 2 1323
Edresson/VoiceSplit
VoiceSplit: Targeted Voice Separation by Speaker-Conditioned Spectrogram
Language: Python | 214 7 1133
okio-ai/nendo
The Nendo AI Audio Tool Suite
Language: Python | 207 7 812
leafduo/chatgpt-telegram-bot
Telegram bot for ChatGPT
Language: Go | 165 3 726
thuhcsi/SECap
Language: Python | 128 3 811
BUTSpeechFIT/speakerbeam
Language: Jupyter Notebook | 93 6 318
CNChTu/FCPE
Language: Python | 91 5 518
OSVAI/KernelWarehouse
The official project website of "KernelWarehouse: Rethinking the Design of Dynamic Convolution" (KW for short, accepted to ICML 2024)
Language: Python | 88 4 34
bshall/hifigan
A 16kHz implementation of HiFi-GAN for soft-vc.
Language: Python | 85 5 623
RickyL-2000/ROSVOT
Robust Singing Voice Transcription and MIDI Extraction
Language: Python | 46 1 31
seanghay/uvr-mdx-infer
Ultimate Vocal Remover Inference CLI
Language: Python | 39 1 66
deeplyinc/Nonverbal-Vocalization-Dataset
Language: Jupyter Notebook | 264
MTG/tape
TAPE: An End-to-End Timbre-Aware Pitch Estimator
Language: Jupyter Notebook | 19 6 10
wyw97/DENSE
ICASSP 2025: Dynamic Embedding Causal Target Speech Extraction
Language: Python | 61