audio-visual
There are 77 repositories under the audio-visual topic.
krantiparida/awesome-audio-visual
A curated list of different papers and datasets in various areas of audio-visual processing
TaoRuijie/TalkNet-ASD
ACM MM 2021: 'Is Someone Speaking? Exploring Long-term Temporal Features for Audio-visual Active Speaker Detection'
HumanAIGC/omnitalker
[NeurIPS 2025] OmniTalker: Real-Time Text-Driven Talking Head Generation with In-Context Audio-Visual Style Replication
samhirtarif/react-audio-visualize
An audio visualizer for React. Provides separate components to visualize both live audio and audio blobs.
guyyariv/TempoTokens
This repo contains the official PyTorch implementation of: Diverse and Aligned Audio-to-Video Generation via Text-to-Video Model Adaptation
jerosoler/waveform-path
🎙 Generate waveform paths for SVG 🎶
ekazakos/temporal-binding-network
Implementation of "EPIC-Fusion: Audio-Visual Temporal Binding for Egocentric Action Recognition, ICCV, 2019" in PyTorch
ankurbhatia24/MULTIMODAL-EMOTION-RECOGNITION
Human Emotion Understanding using multimodal dataset.
Libvisual/libvisual
Libvisual Audio Visualization
v-iashin/Synchformer
Source code for "Synchformer: Efficient Synchronization from Sparse Cues" (ICASSP 2024)
heypoom/patchies
Patchies is a creative coding tool for making audio-visual patches in your browser.
v-iashin/SparseSync
Source code for "Sparse in Space and Time: Audio-visual Synchronisation with Trainable Selectors" (Spotlight at BMVC 2022)
satelllte/remotion-audio-visualizer
Programmatic minimalistic audio visualizations.
MengyuanChen21/CVPR2023-CMPAE
[CVPR 2023] Collecting Cross-Modal Presence-Absence Evidence for Weakly-Supervised Audio-Visual Event Perception
joannahong/AV-RelScore
Audio-visual corruption modeling from our CVPR 2023 paper "Watch or Listen: Robust Audio-Visual Speech Recognition with Visual Corruption Modeling and Reliability Scoring"
ruohaoguo/ovavss
Official Implementation of "Open-Vocabulary Audio-Visual Semantic Segmentation" [ACM MM 2024 Oral].
jinxiang-liu/anno-free-AVS
Official code for WACV 2024 paper, "Annotation-free Audio-Visual Segmentation"
dialogtekgeek/AVSD-DSTC10_Official
Audio Visual Scene-Aware Dialog (AVSD) Challenge at the 10th Dialog System Technology Challenge (DSTC)
MCG-NJU/JoMoLD
[ECCV 2022] Joint-Modal Label Denoising for Weakly-Supervised Audio-Visual Video Parsing
georgesterpu/Taris
Transformer-based online speech recognition system with TensorFlow 2
Yu-Wu/Modaily-Aware-Audio-Visual-Video-Parsing
Code for CVPR 2021 paper Exploring Heterogeneous Clues for Weakly-Supervised Audio-Visual Video Parsing
JaesungHuh/av-diarization
Audio-visual diarization pipeline used for creating VoxConverse dataset
dkurzend/ClipClap-GZSL
Audio-Visual Generalized Zero-Shot Learning using Large Pre-Trained Models
hmartelb/avlit
Official source code of the INTERSPEECH 2023 paper: "Audio-Visual Speech Separation in Noisy Environments with a Lightweight Iterative Model" (AVLIT)
WikiChao/DAVIS
[🏆 IJCV 2025 & ACCV 2024 Best Paper Honorable Mention] Official PyTorch implementation of the paper "High-Quality Visually-Guided Sound Separation from Diverse Categories"
FannyChao/AVS360_audiovisual_saliency_360
Towards Audio-Visual Saliency Prediction for Omnidirectional Video with Spatial Audio
Overcautious/ADENet
Accepted by TMM 2022
Sreyan88/LipGER
Code for the INTERSPEECH 2024 paper: LipGER: Visually-Conditioned Generative Error Correction for Robust Automatic Speech Recognition
yzyouzhang/Awesome-Multimedia-Deepfake-Detection
Materials for "Multimedia Deepfake Detection" Tutorial @ ICME 2024
SAGNIKMJR/move2hear-active-AV-separation
Code and datasets for 'Move2Hear: Active Audio-Visual Source Separation' (ICCV 2021)
cogmhear/Intelligibility-Oriented-Audio-Visual-Speech-Enhancement
Towards Intelligibility-Oriented Audio-Visual Speech Enhancement
OpenGVLab/perception_test_iccv2023
Champion solutions repository for the Perception Test challenges at the ICCV 2023 workshop.
tutaru99/Internet-Radio-Player-Vue
Internet Radio Player with an audio visualizer, built with Vue.js, Vuetify, and Howler.js. The player includes a selection of radio stations.
jasongief/TGS-Agent
[arXiv 2025] Think Before You Segment: An Object-aware Reasoning Agent for Referring Audio-Visual Segmentation
ruohaoguo/pavsodr
Official Implementation of "Instance-Level Panoramic Audio-Visual Saliency Detection and Ranking" [ACM MM 2024].