bokesyo's Stars
openai/whisper
Robust Speech Recognition via Large-Scale Weak Supervision
2noise/ChatTTS
A generative speech model for daily dialogue.
fishaudio/fish-speech
Brand new TTS solution
m-bain/whisperX
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
pyannote/pyannote-audio
Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
huggingface/parler-tts
Inference and training library for high-quality TTS models.
philz1337x/clarity-upscaler
Clarity AI | AI Image Upscaler & Enhancer - free and open-source Magnific Alternative
facebookresearch/encodec
State-of-the-art deep learning based audio codec supporting both mono 24 kHz audio and stereo 48 kHz audio.
gpt-omni/mini-omni
open-source multimodal large language model that can hear, talk while thinking. Featuring real-time end-to-end speech input and streaming audio output conversational capabilities.
lucidrains/vector-quantize-pytorch
Vector (and Scalar) Quantization, in Pytorch
lucidrains/audiolm-pytorch
Implementation of AudioLM, a SOTA Language Modeling Approach to Audio Generation out of Google Research, in Pytorch
NVlabs/VILA
VILA - a multi-image visual language model with training, inference and evaluation recipe, deployable from cloud to edge (Jetson Orin and laptops)
QwenLM/Qwen-Audio
The official repo of Qwen-Audio (通义千问-Audio) chat & pretrained large audio language model proposed by Alibaba Cloud.
open-compass/VLMEvalKit
Open-source evaluation toolkit of large vision-language models (LVLMs), support 160+ VLMs, 50+ benchmarks
0nutation/SpeechGPT
SpeechGPT Series: Speech Large Language Models
lizhe2004/Awesome-LLM-RAG-Application
the resources about the application based on LLM with RAG pattern
gemelo-ai/vocos
Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis
ga642381/speech-trident
Awesome speech/audio LLMs, representation learning, and codec models
lenML/ChatTTS-Forge
🍦 ChatTTS-Forge is a project developed around TTS generation model, implementing an API Server and a Gradio-based WebUI.
yangdongchao/UniAudio
The Open Source Code of UniAudio
ZhangXInFD/SpeechTokenizer
This is the code for the SpeechTokenizer presented in the SpeechTokenizer: Unified Speech Tokenizer for Speech Language Models. Samples are presented on
liutaocode/TTS-arxiv-daily
Automatically Update Text-to-speech (TTS) Papers Daily using Github Actions (Update Every 12th hours)
haoheliu/SemantiCodec-inference
Ultra-low bitrate neural audio codec (0.31~1.40 kbps) with a better semantic in the latent space.
fumiama/Retrieval-based-Voice-Conversion-WebUI
Easily train a good VC model with voice data <= 10 mins!
MagicHub-io/MagicData-RAMC
MagicData-RAMC Dataset and Baseline
RicherMans/Dasheng
Source for the Interspeech 2024 Paper "Scaling up masked audio encoder learning for general audio classification"
ParadoxZW/LLaVA-UHD-Better
A bug-free and improved implementation of LLaVA-UHD, based on the code from the official repo
RhapsodyAILab/MiniCPM-V-Embedding
RhapsodyAILab/Awesome-MiniCPMV-Projects
fncokg/UntPlot
UntPhesoca调值格局可视化图的Python绘图包