audio-visual-learning

There are 42 repositories under the audio-visual-learning topic.

ali-vilab/dreamtalk
Official implementation of the paper: DreamTalk: When Expressive Talking Head Generation Meets Diffusion Probabilistic Models
Language: Python 1.8k 32 49215
tanshuai0219/EDTalk
[ECCV 2024 Oral] EDTalk - Official PyTorch Implementation
Language: Python 434 14 4637
OpenNLPLab/AVSBench
[ECCV 2022] & [IJCV 2024] Official implementation of the paper: Audio-Visual Segmentation (with Semantics)
Language: Python 402 14 2936
xid32/NAACL_2025_TWM
We introduce temporal working memory (TWM), which aims to enhance the temporal modeling capabilities of multimodal foundation models (MFMs). This plug-and-play module can be easily integrated into existing MFMs. With our TWM, nine state-of-the-art models exhibit significant performance improvements across QA, captioning, and retrieval tasks.
Language: Python 309 21 030
YapengTian/AVE-ECCV18
Audio-Visual Event Localization in Unconstrained Videos, ECCV 2018
Language: Python 191 5 3532
alvinliu0/HA2G
[CVPR 2022] Code for "Learning Hierarchical Cross-Modal Association for Co-Speech Gesture Generation"
Language: Python 143 4 229
rhgao/co-separation
Co-Separating Sounds of Visual Objects (ICCV 2019)
Language: Python 97 3 1622
yanbeic/CCL
PyTorch implementation of the paper [CVPR 2021] Distilling Audio-Visual Knowledge by Compositional Contrastive Learning
Language: Python 89 5 811
ttgeng233/UnAV
Dense-Localizing Audio-Visual Events in Untrimmed Videos: A Large-Scale Benchmark and Baseline (CVPR 2023)
Language: Python 67 3 146
roger-tseng/av-superb
A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models (ICASSP 2024)
Language: Python 56 2 44
praveena2j/JointCrossAttentional-AV-Fusion
ABAW3 (CVPRW): A Joint Cross-Attention Model for Audio-Visual Fusion in Dimensional Emotion Recognition
Language: Python 43 2 59
jasongief/PSP_CVPR_2021
[2021 CVPR] Positive Sample Propagation along the Audio-Visual Event Line
Language: Python 41 3 1312
praveena2j/Joint-Cross-Attention-for-Audio-Visual-Fusion
IEEE T-BIOM: "Audio-Visual Fusion for Emotion Recognition in the Valence-Arousal Space Using Joint Cross-Attention"
Language: Python 38 1 511
MengyuanChen21/CVPR2023-CMPAE
[CVPR 2023] Collecting Cross-Modal Presence-Absence Evidence for Weakly-Supervised Audio-Visual Event Perception
Language: Python 35 1 03
stoneMo/AVGN
Official implementation for AVGN
Language: Python 34 1 53
stoneMo/DeepAVFusion
Official codebase for "Unveiling the Power of Audio-Visual Early Fusion Transformers with Dense Interactions through Masked Modeling".
Language: Python 33 6 71
stoneMo/EZ-VSL
Official Codebase of "Localizing Visual Sounds the Easy Way" (ECCV 2022)
Language: Python 33 2 69
kyuyeonpooh/objects-that-sound
An unofficial implementation of the paper "Objects that Sound" (ECCV 2018).
Language: Python 32 4 34
jasongief/CPSP
[2023 TPAMI] Contrastive Positive Sample Propagation along the Audio-Visual Event Line
Language: Python 29 2 45
praveena2j/Cross-Attentional-AV-Fusion
FG2021: Cross Attentional AV Fusion for Dimensional Emotion Recognition
Language: Python 28 1 25
ttgeng233/UniAV
Unified Audio-Visual Perception for Multi-Task Video Localization
Language: Python 24 3 31
stoneMo/MGN
Official implementation for MGN
Language: Python 20 1 10
stoneMo/SLAVC
Official Codebase of "A Closer Look at Weakly-Supervised Audio-Visual Source Localization" (NeurIPS 2022)
Language: Python 20 3 66
dkurzend/ClipClap-GZSL
Audio-Visual Generalized Zero-Shot Learning using Large Pre-Trained Models
Language: Python 19 1 24
stoneMo/CIGN
Official implementation for CIGN
Language: Python 16 1 11
Tinglok/avstyle
Codebase for the Paper: Learning Visual Styles from Audio-Visual Associations (ECCV 2022, in PyTorch)
Language: Python 15 2 42
AmandineBtto/Batvision-Dataset
A large-scale real-world audio-visual dataset for research on 3D scene understanding and echolocation.
Language: Python 13 1 31
praveena2j/RecurrentJointAttentionwithLSTMs
ICASSP 2023: "Recursive Joint Attention for Audio-Visual Fusion in Regression Based Emotion Recognition"
Language: Python 12 1 10
stoneMo/OneAVM
Official Codebase of "A Unified Audio-Visual Learning Framework for Localization, Separation, and Recognition" (ICML 2023)
12 5 11
aromanusc/SoundQ
Enhanced sound event localization and detection in real 360-degree audio-visual soundscapes (DCASE task3 format)
Language: Python 11 2 22
JaesungHuh/look-listen-recognise
Dataset page for Look, Listen and Recognise: character-aware audio-visual subtitling (ICASSP 2024)
Language: Python 7 1 00
ly-zhu/Leveraging-Category-Information-for-Single-Frame-Visual-Sound-Source-Separation
PyTorch implementation of "Leveraging Category Information for Single-Frame Visual Sound Source Separation"
Language: Python 6 1 22
stoneMo/EZ-AVGZL
Official Codebase of "Audio-visual Generalized Zero-shot Learning the Easy Way" (ECCV 2024)
5 2 1
ly-zhu/cof-net
Code for the paper: Visually Guided Sound Source Separation using Cascaded Opponent Filter Network
Language: Python 4 1 10
praveena2j/RJCAforSpeakerVerification
[FG 2024] "Audio-Visual Person Verification based on Recursive Fusion of Joint Cross-Attention"
Language: Python 4 1 00
Huntersxsx/AVVP-Learning-List
Related papers about Weakly-supervised Audio-Visual Video Parsing (AVVP) & Audio-Visual Event Localization (AVE)
3 1 00
