audio-visual-learning
There are 42 repositories under the audio-visual-learning topic.
ali-vilab/dreamtalk
Official implementation of the paper "DreamTalk: When Expressive Talking Head Generation Meets Diffusion Probabilistic Models"
tanshuai0219/EDTalk
[ECCV 2024 Oral] EDTalk - Official PyTorch Implementation
OpenNLPLab/AVSBench
[ECCV 2022] & [IJCV 2024] Official implementation of the paper: Audio-Visual Segmentation (with Semantics)
xid32/NAACL_2025_TWM
We introduce temporal working memory (TWM), which aims to enhance the temporal modeling capabilities of multimodal foundation models (MFMs). This plug-and-play module can be easily integrated into existing MFMs. With our TWM, nine state-of-the-art models exhibit significant performance improvements across QA, captioning, and retrieval tasks.
YapengTian/AVE-ECCV18
Audio-Visual Event Localization in Unconstrained Videos, ECCV 2018
alvinliu0/HA2G
[CVPR 2022] Code for "Learning Hierarchical Cross-Modal Association for Co-Speech Gesture Generation"
rhgao/co-separation
Co-Separating Sounds of Visual Objects (ICCV 2019)
yanbeic/CCL
[CVPR 2021] PyTorch implementation of the paper "Distilling Audio-Visual Knowledge by Compositional Contrastive Learning"
ttgeng233/UnAV
Dense-Localizing Audio-Visual Events in Untrimmed Videos: A Large-Scale Benchmark and Baseline (CVPR 2023)
roger-tseng/av-superb
A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models (ICASSP 2024)
praveena2j/JointCrossAttentional-AV-Fusion
ABAW3 (CVPRW): A Joint Cross-Attention Model for Audio-Visual Fusion in Dimensional Emotion Recognition
jasongief/PSP_CVPR_2021
[2021 CVPR] Positive Sample Propagation along the Audio-Visual Event Line
praveena2j/Joint-Cross-Attention-for-Audio-Visual-Fusion
IEEE T-BIOM: "Audio-Visual Fusion for Emotion Recognition in the Valence-Arousal Space Using Joint Cross-Attention"
MengyuanChen21/CVPR2023-CMPAE
[CVPR 2023] Collecting Cross-Modal Presence-Absence Evidence for Weakly-Supervised Audio-Visual Event Perception
stoneMo/AVGN
Official implementation for AVGN
stoneMo/DeepAVFusion
Official codebase for "Unveiling the Power of Audio-Visual Early Fusion Transformers with Dense Interactions through Masked Modeling".
stoneMo/EZ-VSL
Official Codebase of "Localizing Visual Sounds the Easy Way" (ECCV 2022)
kyuyeonpooh/objects-that-sound
An unofficial implementation of the paper "Objects that Sound" (ECCV 2018).
jasongief/CPSP
[2023 TPAMI] Contrastive Positive Sample Propagation along the Audio-Visual Event Line
praveena2j/Cross-Attentional-AV-Fusion
FG2021: Cross Attentional AV Fusion for Dimensional Emotion Recognition
ttgeng233/UniAV
Unified Audio-Visual Perception for Multi-Task Video Localization
stoneMo/MGN
Official implementation for MGN
stoneMo/SLAVC
Official Codebase of "A Closer Look at Weakly-Supervised Audio-Visual Source Localization" (NeurIPS 2022)
dkurzend/ClipClap-GZSL
Audio-Visual Generalized Zero-Shot Learning using Large Pre-Trained Models
stoneMo/CIGN
Official implementation for CIGN
Tinglok/avstyle
Codebase for the Paper: Learning Visual Styles from Audio-Visual Associations (ECCV 2022, in PyTorch)
AmandineBtto/Batvision-Dataset
A large-scale real-world audio-visual dataset for research on 3D scene understanding and echolocation.
praveena2j/RecurrentJointAttentionwithLSTMs
ICASSP 2023: "Recursive Joint Attention for Audio-Visual Fusion in Regression Based Emotion Recognition"
stoneMo/OneAVM
Official Codebase of "A Unified Audio-Visual Learning Framework for Localization, Separation, and Recognition" (ICML 2023)
aromanusc/SoundQ
Enhanced sound event localization and detection in real 360-degree audio-visual soundscapes (DCASE Task 3 format)
JaesungHuh/look-listen-recognise
Dataset page for "Look, Listen and Recognise: character-aware audio-visual subtitling" (ICASSP 2024)
ly-zhu/Leveraging-Category-Information-for-Single-Frame-Visual-Sound-Source-Separation
PyTorch implementation of "Leveraging Category Information for Single-Frame Visual Sound Source Separation"
stoneMo/EZ-AVGZL
Official Codebase of "Audio-visual Generalized Zero-shot Learning the Easy Way" (ECCV 2024)
ly-zhu/cof-net
Code for the paper: Visually Guided Sound Source Separation using Cascaded Opponent Filter Network
praveena2j/RJCAforSpeakerVerification
[FG 2024] "Audio-Visual Person Verification based on Recursive Fusion of Joint Cross-Attention"
Huntersxsx/AVVP-Learning-List
Related papers about Weakly-supervised Audio-Visual Video Parsing (AVVP) & Audio-Visual Event Localization (AVE)