voice-activity-detection

There are 184 repositories under the voice-activity-detection topic.

modelscope/FunASR
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
Language: Python 12.6k 93 1.5k 1.3k
noisetorch/NoiseTorch
Real-time microphone noise suppression on Linux.
Language: Go 9.8k 68 324243
pyannote/pyannote-audio
Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
Language: Jupyter Notebook 8.3k 78 1k 938
smacke/ffsubsync
Automagically synchronize subtitles with video.
Language: Python 7.3k 75 169300
snakers4/silero-vad
Silero VAD: pre-trained enterprise-grade Voice Activity Detector
Language: Python 6.8k 58 292631
jim-schwoebel/voice_datasets
🔊 A comprehensive list of open-source datasets for voice and sound computing (95+ datasets).
2k 41 28249
BingLingGroup/autosub
Command-line utility to transcribe/translate from video/audio/subtitles to subtitles
Language: Python 2k 33 196245
ricky0123/vad
Voice activity detector (VAD) for the browser with a simple API
Language: TypeScript 1.6k 15 147223
k2-fsa/sherpa-ncnn
Real-time speech recognition and voice activity detection (VAD) using next-gen Kaldi with ncnn without Internet connection. Support iOS, Android, Linux, macOS, Windows, Raspberry Pi, VisionFive2, LicheePi4A etc.
Language: C++ 1.5k 34 173196
juanmc2005/diart
A python package to build AI-powered real-time audio applications
Language: Python 1.5k 22 166106
TEN-framework/ten-vad
Voice Activity Detection (VAD) : low-latency, high-performance and lightweight
Language: C 1.4k 20 33116
coqui-ai/open-speech-corpora
💎 A list of accessible speech corpora for ASR, TTS, and other Speech Technologies
1.4k 56 199148
ggeop/Python-ai-assistant
Python AI assistant 🧠
Language: Python 991 44 55247
jtkim-kaist/VAD
Voice activity detection (VAD) toolkit including DNN, bDNN, LSTM and ACAM based VAD. We also provide our directly recorded dataset.
Language: MATLAB 863 45 40235
ina-foss/inaSpeechSegmenter
CNN-based audio segmentation toolkit. Allows to detect speech, music, noise and speaker gender. Has been designed for large scale gender equality studies based on speech time per gender.
Language: Python 836 23 76141
amsehili/auditok
An audio/acoustic activity detection and audio segmentation tool
Language: Python 803 25 3798
iamsrikanthnani/pluely
The Open Source Alternative to Cluely - A lightning-fast, privacy-first AI assistant that works seamlessly during meetings, interviews, and conversations without anyone knowing. Built with Tauri for native performance, just 10MB. Completely undetectable in video calls, screen shares, and recordings.
Language: TypeScript 70092
FluidInference/FluidAudio
Native Swift and CoreML SDK for local speaker diarization, VAD, and speech-to-text for real-time workloads. Works on iOS and macOS.
Language: Swift 626
baxtree/subaligner
Automatically synchronize and translate subtitles, or create new ones by transcribing, using pre-trained DNNs, Forced Alignments and Transformers. https://subaligner.readthedocs.io/
Language: Python 484 15 3720
shashikg/WhisperS2T
An Optimized Speech-to-Text Pipeline for the Whisper Model Supporting Multiple Inference Engine
Language: Jupyter Notebook 468 18 7161
gkonovalov/android-vad
Android Voice Activity Detection (VAD) library. Supports WebRTC VAD GMM, Silero VAD DNN, Yamnet VAD DNN models.
Language: C 408 9 3069
gtreshchev/RuntimeAudioImporter
Runtime Audio Importer plugin for Unreal Engine. Importing audio of various formats at runtime.
Language: C++ 393 9 7483
jim-schwoebel/voicebook
🗣️ A book and repo to get you started programming voice computing applications in Python (10 chapters and 200+ scripts).
Language: Python 386 25 2587
filippogiruzzi/voice_activity_detection
Voice Activity Detection based on Deep Learning & TensorFlow
Language: Python 369 13 1568
Picovoice/cobra
On-device voice activity detection (VAD) powered by deep learning
Language: Python 228 11 2614
tomchang25/whisper-auto-transcribe
Auto transcribe tool based on whisper
Language: Python 227 5 4816
nicklashansen/voice-activity-detection
Voice Activity Detection (VAD) using deep learning.
Language: Jupyter Notebook 199 4 332
eesungkim/Voice_Activity_Detector
A statistical model-based Voice Activity Detection
Language: Jupyter Notebook 192 6 840
pmbstyle/Alice
Alice is a smart desktop AI assistant application built with Vue.js, Vite, and Electron. Advanced memory system, function calling, MCP support, optional fully local use, and more.
Language: TypeScript 1690
voithru/voice-activity-detection
Pytorch implementation of SELF-ATTENTIVE VAD, ICASSP 2021
Language: Python 157 4 627
zhenghuatan/rVADfast
This is the Python library for an unsupervised, fast method for robust voice activity detection (rVAD), as in the paper rVAD: An Unsupervised Segment-Based Robust Voice Activity Detection Method.
Language: Python 146 8 224
RicherMans/GPV
Repository for our Interspeech2020 general-purpose voice activity detection (GPVAD) paper
Language: Python 142 4 929
zhenghuatan/rVAD
Matlab and Python libraries for an unsupervised method for robust voice activity detection (rVAD), as in the paper rVAD: An Unsupervised Segment-Based Robust Voice Activity Detection Method.
Language: MATLAB 137 7 730
Speech-Interaction-Technology-Aalto-U/itsp
Introduction to Speech Processing
Language: Jupyter Notebook 104 3 715
Ankit-Kumar-Saini/Coursera_Deep_Learning_Specialization
Implementation of Logistic Regression, MLP, CNN, RNN & LSTM from scratch in python. Training of deep learning models for image classification, object detection, and sequence processing (including transformers implementation) in TensorFlow.
Language: Jupyter Notebook 95 2 559
RicherMans/Datadriven-GPVAD
The codebase for Data-driven general-purpose voice activity detection.
Language: Python 94 7 1623
