0nutation's Stars
Vaibhavs10/insanely-fast-whisper
kyutai-labs/moshi
Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audio codec.
pyannote/pyannote-audio
Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
OpenRLHF/OpenRLHF
An Easy-to-use, Scalable and High-performance RLHF Framework (70B+ PPO Full Tuning & Iterative DPO & LoRA & RingAttention & RFT)
linkedin/Liger-Kernel
Efficient Triton Kernels for LLM Training
Rikorose/DeepFilterNet
Noise suppression using deep filtering
ictnlp/LLaMA-Omni
LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.
ufal/whisper_streaming
Whisper realtime streaming for long speech-to-text transcription and translation
linto-ai/whisper-timestamped
Multilingual Automatic Speech Recognition with word-level timestamps and confidence
baaivision/Emu3
Next-Token Prediction is All You Need
kan-bayashi/ParallelWaveGAN
Unofficial Parallel WaveGAN (+ MelGAN & Multi-band MelGAN & HiFi-GAN & StyleMelGAN) with PyTorch
microsoft/DNS-Challenge
This repo contains the scripts, models, and required files for the Deep Noise Suppression (DNS) Challenge.
haoheliu/voicefixer
General Speech Restoration
wenet-e2e/wespeaker
Research and Production Oriented Speaker Verification, Recognition and Diarization Toolkit
lifeiteng/OmniSenseVoice
Omni SenseVoice: High-Speed Speech Recognition with word timestamps 🗣️🎯
JusperLee/Speech-Separation-Paper-Tutorial
Must-read papers on neural-network-based speech separation
facebookresearch/libri-light
Dataset for lightly supervised training using LibriVox audiobook recordings (https://librivox.org/).
showlab/videollm-online
VideoLLM-online: Online Video Large Language Model for Streaming Video (CVPR 2024)
davidmartinrius/speech-dataset-generator
🔊 Create labeled datasets, enhance audio quality, identify speakers, support diverse dataset types. 🎧👥📊 Advanced audio processing.
yangdongchao/RSTnet
Real-time Speech-Text Foundation Model Toolkit (wip)
aoifemcdonagh/audioset-processing
Toolkit for downloading and processing Google's AudioSet dataset.
shinjiwlab/versa
Versatile Evaluation of Speech and Audio
Aria-K-Alethia/BigCodec
Official implementation of the paper "BigCodec: Pushing the Limits of Low-Bitrate Neural Speech Codec"
SpeechColab/GigaSpeech2
An evolving, large-scale and multi-domain ASR corpus for low-resource languages with automated crawling, transcription and refinement
YuanGongND/vocalsound
Dataset and baseline code for the VocalSound dataset (ICASSP2022).
hhguo/SoCodec
Ultra-low-bitrate Speech Codec for Speech Language Modeling Applications
AbrahamSanders/codec-bpe
Implementation of Acoustic BPE (Shen et al., 2024), extended for RVQ-based Neural Audio Codecs
slp-rl/salmon
The official code for the SALMon🍣 benchmark (ICASSP 2025 - Oral)
tincans-ai/gazelle-inference
Proof-of-concept conversation orchestrator with a speech-language model
hedeshy/CNVVE
Dataset and Benchmark for Classifying Non-verbal Voice Expressions