SzczesnyS's Stars
RVC-Boss/GPT-SoVITS
1 min voice data can also be used to train a good TTS model! (few shot voice cloning)
lllyasviel/ControlNet
Let us control diffusion models!
deezer/spleeter
Deezer source separation library including pretrained models.
SWivid/F5-TTS
Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
miss-mumu/developer2gwy
Civil service exam from beginner to passing: the best practical guide to the civil service exam for programmers
THUDM/GLM-4
GLM-4 series: Open Multilingual Multimodal Chat LMs
myshell-ai/MeloTTS
High-quality multi-lingual text-to-speech library by MyShell.ai. Supports English, Spanish, French, Chinese, Japanese and Korean.
huggingface/parler-tts
Inference and training library for high-quality TTS models.
lixin4ever/Conference-Acceptance-Rate
Acceptance rates for the major AI conferences
riffusion/riffusion-hobby
Stable diffusion for real-time music generation
THUDM/GLM-4-Voice
GLM-4-Voice | End-to-end Chinese-English spoken dialogue model
PlayVoice/whisper-vits-svc
Core Engine of Singing Voice Conversion & Singing Voice Clone
xunhuang1995/AdaIN-style
Arbitrary Style Transfer in Real-time with Adaptive Instance Normalization
SUC-DriverOld/so-vits-svc-Deployment-Documents
So-VITS-SVC local deployment and usage documentation, with a Colab notebook
Tele-AI/TeleSpeech-ASR
facebookresearch/textlesslib
Library for Textless Spoken Language Processing
ZhangXInFD/SpeechTokenizer
This is the code for the SpeechTokenizer presented in "SpeechTokenizer: Unified Speech Tokenizer for Speech Language Models". Samples are presented on
lucidrains/e2-tts-pytorch
Implementation of E2-TTS, "Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS", in Pytorch
CNChTu/Diffusion-SVC
hche11/VGGSound
VGGSound: A Large-scale Audio-Visual Dataset
jrgillick/laughter-detection
EmilianPostolache/stable-audio-controlnet
Fine-tune Stable Audio Open with DiT ControlNet.
xinchen-ai/Westlake-Omni
KunZhou9646/Mixed_Emotions
ejhumphrey/minst-dataset
Music INSTrument dataset
iiscleap/ZEST
Zero-Shot Emotion Style Transfer
zachary-shah/riff-cnet
Controlled audio inpainting using the SD fine-tuned model Riffusion in a ControlNet architecture
ilpoviertola/V-AURA
The official implementation of V-AURA: Temporally Aligned Audio for Video with Autoregression (ICASSP 2025)
d3n7/riffusionPrepper
Prepare spectrograms from audio for training a Riffusion model
dhivyasreedhar/Music-Instrument-Recognition
A Convolutional Neural Network and a K nearest neighbour based classifier to detect the musical instrument present in a given audio file. It can be used for monophonic files. Both classifiers performed well with accuracy above 90%