0nutation's Stars
mendableai/firecrawl
🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API.
Anjok07/ultimatevocalremovergui
GUI for a Vocal Remover that uses Deep Neural Networks.
ml-explore/mlx
MLX: An array framework for Apple silicon
SWivid/F5-TTS
Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
pyannote/pyannote-audio
Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
facebookresearch/lingua
Meta Lingua: a lean, efficient, and easy-to-hack codebase to research LLMs.
linkedin/Liger-Kernel
Efficient Triton Kernels for LLM Training
jy0205/Pyramid-Flow
Code of Pyramidal Flow Matching for Efficient Video Generative Modeling
usefulsensors/moonshine
Fast and accurate automatic speech recognition (ASR) for edge devices
openai/openai-realtime-console
React app for inspecting, building and debugging with the Realtime API
alaskasquirrel/Chinese-Podcasts
Podcasts 🎧 programming, design, vlogs, music, interviews, blogs...
homebrewltd/ichigo
Local realtime voice AI
baaivision/Emu3
Next-Token Prediction is All You Need
qiuqiangkong/audioset_tagging_cnn
hendrycks/test
Measuring Massive Multitask Language Understanding | ICLR 2021
haoheliu/voicefixer
General Speech Restoration
SmartFlowAI/EmoLLM
Large language model for mental health, LLM, The Big Model of Mental Health, Finetune, InternLM2, InternLM2.5, Qwen, ChatGLM, Baichuan, DeepSeek, Mixtral, LLama3, GLM4, Qwen2, LLama3.1
facebookresearch/spiritlm
Inference code for the paper "Spirit-LM Interleaved Spoken and Written Language Model".
lifeiteng/OmniSenseVoice
Omni SenseVoice: High-Speed Speech Recognition with words timestamps 🗣️🎯
JusperLee/Speech-Separation-Paper-Tutorial
A must-read paper for speech separation based on neural networks
lhl/voicechat2
Local SRT/LLM/TTS Voicechat
facebookresearch/voxpopuli
A large-scale multilingual speech corpus for representation learning, semi-supervised learning and interpretation
FireRedTeam/FireRedTTS
An Open-Sourced LLM-empowered Foundation TTS System
yeyupiaoling/AudioClassification-Pytorch
A PyTorch implementation of sound classification that supports EcapaTdnn, PANNS, TDNN, Res2Net, ResNetSE and other models, as well as a variety of preprocessing methods.
xinchen-ai/Westlake-Omni
wenet-e2e/wesep
Target Speaker Extraction Toolkit
alibabasglab/MossFormer2
This is the audio sample repository for speech separation model "MossFormer2".
mlcommons/peoples-speech
The People’s Speech Dataset
thuhcsi/SpeechCraft
The official repository of SpeechCraft dataset, a large-scale expressive bilingual speech dataset with natural language descriptions.
LAION-AI/emotional-speech-annotations
This repository contains prompts & best practices to annotate audio clips with a very high degree of detail using Audio-Language-Models