WangGewu's Stars
mudler/LocalAI
🤖 The free, Open Source alternative to OpenAI, Claude and others. Self-hosted and local-first. Drop-in replacement for OpenAI, running on consumer-grade hardware. No GPU required. Runs gguf, transformers, diffusers and many more model architectures. Features: Generate Text, Audio, Video, Images, Voice Cloning, Distributed, P2P inference
facebookresearch/audiocraft
Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.
ScottishFold007/TTSAudioNormalizer
TTSAudioNormalizer is a specialized tool for TTS data production, featuring descriptive statistical analysis of audio loudness and loudness normalization operations.
fishaudio/vocoder
modelscope/ClearerVoice-Studio
An AI-Powered Speech Processing Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Enhancement, Separation, and Target Speaker Extraction, etc.
lucadellalib/discrete-wavlm-codec
A neural speech codec based on discrete WavLM representations
francislata/unicats
An unofficial implementation of "UniCATS: A Unified Context-Aware Text-to-Speech Framework with Contextual VQ-Diffusion and Vocoding".
Jackiexiao/tts-frontend-dataset
TTS FrontEnd DataSet: Polyphone / Prosody / TextNormalization
Aria-K-Alethia/BigCodec
Official implementation of the paper "BigCodec: Pushing the Limits of Low-Bitrate Neural Speech Codec"
ZhangXInFD/SpeechTokenizer
This is the code for SpeechTokenizer, presented in "SpeechTokenizer: Unified Speech Tokenizer for Speech Language Models". Samples are presented on
Plachtaa/seed-vc
zero-shot voice conversion & singing voice conversion, with real-time support
opendilab/CleanS2S
High-quality and streaming Speech-to-Speech interactive agent in a single file. A streaming, full-duplex prototype voice interaction agent implemented in just one file!
Hoper-J/AI-Guide-and-Demos-zh_CN
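This is a step-by-step guide for getting started with AI and large language models (LLMs), with tutorials and demo code that take you from using APIs to deploying and fine-tuning models locally. Code files come with online Kaggle or Colab versions, so you can follow along even without a GPU. The project also includes a small code playground 🎡 where you can experiment with some interesting AI scripts. It also contains a complete Chinese mirror of the assignments for Hung-yi Lee's (HUNG-YI LEE) 2024 Introduction to Generative AI course.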
liutaocode/TTS-arxiv-daily
Automatically updates the list of Text-to-Speech (TTS) papers daily using GitHub Actions (updated every 12 hours)
yoosif0/arabic-tacotron-tts
End to end Arabic TTS system based on tacotron
TaoRuijie/ECAPA-TDNN
Unofficial reimplementation of ECAPA-TDNN for speaker recognition (EER=0.86 on Vox1_O when trained only on Vox2)
karpathy/nanoGPT
The simplest, fastest repository for training/finetuning medium-sized GPTs.
THUDM/GLM-4-Voice
GLM-4-Voice | End-to-end Chinese-English spoken dialogue model
BytedanceSpeech/seed-tts-eval
NVIDIA/BigVGAN
Official PyTorch implementation of BigVGAN (ICLR 2023)
svc-develop-team/so-vits-svc
SoftVC VITS Singing Voice Conversion
vivian556123/NeurIPS2024-CoVoMix
Official repo for CoVoMix: Advancing Zero-Shot Speech Generation for Human-like Multi-talker Conversations
XinhaoMei/WavCaps
This repository contains metadata of the WavCaps dataset and code for downstream tasks.
Rikorose/DeepFilterNet
Noise suppression using deep filtering
SWivid/F5-TTS
Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
RVC-Boss/GPT-SoVITS
Just 1 minute of voice data is enough to train a good TTS model! (few-shot voice cloning)
FireRedTeam/FireRedTTS
An Open-Sourced LLM-empowered Foundation TTS System
haoheliu/versatile_audio_super_resolution
Versatile audio super resolution (any -> 48kHz) with AudioSR.
kyutai-labs/moshi
thuhcsi/SECap