voice-activity-detection

There are 184 repositories under the voice-activity-detection topic.

  • modelscope/FunASR

    A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.

    Language: Python | Stars: 12.6k
  • noisetorch/NoiseTorch

    Real-time microphone noise suppression on Linux.

    Language: Go | Stars: 9.8k
  • pyannote/pyannote-audio

    Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding

    Language: Jupyter Notebook | Stars: 8.3k
  • smacke/ffsubsync

    Automagically synchronize subtitles with video.

    Language: Python | Stars: 7.3k
  • snakers4/silero-vad

    Silero VAD: pre-trained enterprise-grade Voice Activity Detector

    Language: Python | Stars: 6.8k
  • jim-schwoebel/voice_datasets

    🔊 A comprehensive list of open-source datasets for voice and sound computing (95+ datasets).

  • BingLingGroup/autosub

    Command-line utility to transcribe/translate from video/audio/subtitles to subtitles

    Language: Python | Stars: 2k
  • ricky0123/vad

    Voice activity detector (VAD) for the browser with a simple API

    Language: TypeScript | Stars: 1.6k
  • k2-fsa/sherpa-ncnn

    Real-time speech recognition and voice activity detection (VAD) using next-gen Kaldi with ncnn, without an Internet connection. Supports iOS, Android, Linux, macOS, Windows, Raspberry Pi, VisionFive2, LicheePi4A, etc.

    Language: C++ | Stars: 1.5k
  • juanmc2005/diart

    A Python package to build AI-powered real-time audio applications

    Language: Python | Stars: 1.5k
  • TEN-framework/ten-vad

    Voice Activity Detection (VAD): low-latency, high-performance, and lightweight

    Language: C | Stars: 1.4k
  • coqui-ai/open-speech-corpora

    💎 A list of accessible speech corpora for ASR, TTS, and other Speech Technologies

  • ggeop/Python-ai-assistant

    Python AI assistant 🧠

    Language: Python | Stars: 991
  • jtkim-kaist/VAD

    Voice activity detection (VAD) toolkit including DNN-, bDNN-, LSTM-, and ACAM-based VAD. We also provide our directly recorded dataset.

    Language: MATLAB | Stars: 863
  • ina-foss/inaSpeechSegmenter

    CNN-based audio segmentation toolkit. Detects speech, music, noise, and speaker gender. Designed for large-scale gender equality studies based on speech time per gender.

    Language: Python | Stars: 836
  • amsehili/auditok

    An audio/acoustic activity detection and audio segmentation tool

    Language: Python | Stars: 803
  • iamsrikanthnani/pluely

    The Open Source Alternative to Cluely - A lightning-fast, privacy-first AI assistant that works seamlessly during meetings, interviews, and conversations without anyone knowing. Built with Tauri for native performance, just 10MB. Completely undetectable in video calls, screen shares, and recordings.

    Language: TypeScript | Stars: 700
  • FluidInference/FluidAudio

    Native Swift and CoreML SDK for local speaker diarization, VAD, and speech-to-text for real-time workloads. Works on iOS and macOS.

    Language: Swift | Stars: 626
  • baxtree/subaligner

    Automatically synchronize and translate subtitles, or create new ones by transcribing, using pre-trained DNNs, Forced Alignments and Transformers. https://subaligner.readthedocs.io/

    Language: Python | Stars: 484
  • shashikg/WhisperS2T

    An Optimized Speech-to-Text Pipeline for the Whisper Model Supporting Multiple Inference Engines

    Language: Jupyter Notebook | Stars: 468
  • gkonovalov/android-vad

    Android Voice Activity Detection (VAD) library. Supports WebRTC VAD GMM, Silero VAD DNN, Yamnet VAD DNN models.

    Language: C | Stars: 408
  • gtreshchev/RuntimeAudioImporter

    Runtime Audio Importer plugin for Unreal Engine. Imports audio in various formats at runtime.

    Language: C++ | Stars: 393
  • jim-schwoebel/voicebook

    🗣️ A book and repo to get you started programming voice computing applications in Python (10 chapters and 200+ scripts).

    Language: Python | Stars: 386
  • filippogiruzzi/voice_activity_detection

    Voice Activity Detection based on Deep Learning & TensorFlow

    Language: Python | Stars: 369
  • Picovoice/cobra

    On-device voice activity detection (VAD) powered by deep learning

    Language: Python | Stars: 228
  • tomchang25/whisper-auto-transcribe

    Automatic transcription tool based on Whisper

    Language: Python | Stars: 227
  • nicklashansen/voice-activity-detection

    Voice Activity Detection (VAD) using deep learning.

    Language: Jupyter Notebook | Stars: 199
  • eesungkim/Voice_Activity_Detector

    A statistical model-based voice activity detector

    Language: Jupyter Notebook | Stars: 192
  • pmbstyle/Alice

    Alice is a smart desktop AI assistant application built with Vue.js, Vite, and Electron. Advanced memory system, function calling, MCP support, optional fully local use, and more.

    Language: TypeScript | Stars: 169
  • voithru/voice-activity-detection

    PyTorch implementation of Self-Attentive VAD, ICASSP 2021

    Language: Python | Stars: 157
  • zhenghuatan/rVADfast

    This is the Python library for an unsupervised, fast method for robust voice activity detection (rVAD), as in the paper rVAD: An Unsupervised Segment-Based Robust Voice Activity Detection Method.

    Language: Python | Stars: 146
  • RicherMans/GPV

    Repository for our Interspeech 2020 general-purpose voice activity detection (GPVAD) paper

    Language: Python | Stars: 142
  • zhenghuatan/rVAD

    Matlab and Python libraries for an unsupervised method for robust voice activity detection (rVAD), as in the paper rVAD: An Unsupervised Segment-Based Robust Voice Activity Detection Method.

    Language: MATLAB | Stars: 137
  • Speech-Interaction-Technology-Aalto-U/itsp

    Introduction to Speech Processing

    Language: Jupyter Notebook | Stars: 104
  • Ankit-Kumar-Saini/Coursera_Deep_Learning_Specialization

    Implementation of Logistic Regression, MLP, CNN, RNN, and LSTM from scratch in Python. Training of deep learning models for image classification, object detection, and sequence processing (including a Transformer implementation) in TensorFlow.

    Language: Jupyter Notebook | Stars: 95
  • RicherMans/Datadriven-GPVAD

    The codebase for Data-driven general-purpose voice activity detection.

    Language: Python | Stars: 94