0nutation

Fudan University, Shanghai, China

0nutation's Stars

Vaibhavs10/insanely-fast-whisper
Language: Jupyter Notebook 8.3k 69 201594
kyutai-labs/moshi
Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audio codec.
Language: Python 8k 91 115657
pyannote/pyannote-audio
Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
Language: Jupyter Notebook 7.2k 78 1k862
OpenRLHF/OpenRLHF
An Easy-to-use, Scalable and High-performance RLHF Framework (70B+ PPO Full Tuning & Iterative DPO & LoRA & RingAttention & RFT)
Language: Python 6.1k 35 579602
linkedin/Liger-Kernel
Efficient Triton Kernels for LLM Training
Language: Python 4.8k 46 185297
Rikorose/DeepFilterNet
Noise suppression using deep filtering
Language: Python 3k 34 301271
ictnlp/LLaMA-Omni
LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.
Language: Python 2.9k 31 58194
ufal/whisper_streaming
Whisper realtime streaming for long speech-to-text transcription and translation
Language: Python 2.7k 40 127328
linto-ai/whisper-timestamped
Multilingual Automatic Speech Recognition with word-level timestamps and confidence
Language: Python 2.3k 34 161179
baaivision/Emu3
Next-Token Prediction is All You Need
Language: Python 2.1k 31 6479
kan-bayashi/ParallelWaveGAN
Unofficial Parallel WaveGAN (+ MelGAN & Multi-band MelGAN & HiFi-GAN & StyleMelGAN) with PyTorch
Language: Jupyter Notebook 1.6k 47 256345
microsoft/DNS-Challenge
This repo contains the scripts, models, and required files for the Deep Noise Suppression (DNS) Challenge.
Language: Python 1.2k 47 151421
haoheliu/voicefixer
General Speech Restoration
Language: Python 1.1k 17 59133
wenet-e2e/wespeaker
Research and Production Oriented Speaker Verification, Recognition and Diarization Toolkit
Language: Python 863 18 146133
lifeiteng/OmniSenseVoice
Omni SenseVoice: High-Speed Speech Recognition with word timestamps 🗣️🎯
Language: Python 830 10 1433
JusperLee/Speech-Separation-Paper-Tutorial
Must-read papers for speech separation based on neural networks
773 25 2137
facebookresearch/libri-light
Dataset for lightly supervised training using the LibriVox audiobook recordings (https://librivox.org/).
Language: Python 492 18 1679
showlab/videollm-online
VideoLLM-online: Online Video Large Language Model for Streaming Video (CVPR 2024)
Language: Python 426 8 5241
davidmartinrius/speech-dataset-generator
🔊 Create labeled datasets, enhance audio quality, identify speakers, support diverse dataset types. 🎧👥📊 Advanced audio processing.
Language: Python 243 16 1423
yangdongchao/RSTnet
Real-time Speech-Text Foundation Model Toolkit (work in progress)
Language: Python 220 15 519
aoifemcdonagh/audioset-processing
Toolkit for downloading and processing Google's AudioSet dataset.
Language: Jupyter Notebook 168 3 642
shinjiwlab/versa
Versatile Evaluation of Speech and Audio
Language: Python 164 9 414
Aria-K-Alethia/BigCodec
Official implementation of the paper "BigCodec: Pushing the Limits of Low-Bitrate Neural Speech Codec"
Language: Python 151 5 169
SpeechColab/GigaSpeech2
An evolving, large-scale and multi-domain ASR corpus for low-resource languages with automated crawling, transcription and refinement
Language: Python 151 6 126
YuanGongND/vocalsound
Dataset and baseline code for the VocalSound dataset (ICASSP 2022).
Language: Jupyter Notebook 133 2 710
hhguo/SoCodec
Ultra-low-bitrate Speech Codec for Speech Language Modeling Applications
Language: Python 79 8 85
AbrahamSanders/codec-bpe
Implementation of Acoustic BPE (Shen et al., 2024), extended for RVQ-based Neural Audio Codecs
Language: Python 54 4 27
slp-rl/salmon
The official code for the SALMon 🍣 benchmark (ICASSP 2025, Oral)
Language: Python 45 1 00
tincans-ai/gazelle-inference
Proof-of-concept conversation orchestrator with a speech-language model
Language: Go 19 2 01
hedeshy/CNVVE
Dataset and Benchmark for Classifying Non-verbal Voice Expressions
Language: Python 9 3 00
