Official code for "Unraveling Instance Associations: A Closer Look for Audio-Visual Segmentation"
Unraveling Instance Associations: A Closer Look for Audio-Visual Segmentation,
Yuanhong Chen*, Yuyuan Liu*, Hu Wang, Fengbei Liu, Chong Wang, Helen Frazer, Gustavo Carneiro.
CVPR 2024 (arXiv 2304.02970)
VPO datasets are available here
VGGSound audio files are available here
*Visual comparison between datasets. We show four audio-visual classes: “female”, “cat”, “dog”, and “car”. AVSBench (SS) (1st frame) provides pixel-level multi-class annotations for images containing a single sounding object. The proposed VPO benchmarks (2nd to 4th frames) pair a subset of the segmented objects in an image with relevant audio files to produce pixel-level multi-class annotations.*
```shell
git clone git@github.com:cyh-0/CAVP.git
cd CAVP
pip install -r requirements.txt
ln -s /path/to/datasets ../audio_visual
ln -s /path/to/ckpts ./ckpts
```
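A broken symlink is easy to miss until a run fails, so it can help to check that both links resolve before going further. The sketch below is not part of this repository. The helper name `check_links` is hypothetical, and it assumes you run it from inside the `CAVP` directory so that the relative paths match the `ln -s` commands above.

```python
from pathlib import Path


def check_links(paths):
    """Map each path to True if it resolves to an existing directory.

    Path.is_dir() follows symlinks, so a dangling link reports False.
    """
    return {str(p): Path(p).is_dir() for p in paths}


if __name__ == "__main__":
    # Paths taken from the setup commands; adjust them if your layout differs.
    for path, ok in check_links(["../audio_visual", "./ckpts"]).items():
        print(f"{path}: {'ok' if ok else 'missing or broken link'}")
```

If either path is reported as missing, re-create the link with the correct absolute target directory.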
```bibtex
@misc{chen2024unraveling,
      title={Unraveling Instance Associations: A Closer Look for Audio-Visual Segmentation},
      author={Yuanhong Chen and Yuyuan Liu and Hu Wang and Fengbei Liu and Chong Wang and Helen Frazer and Gustavo Carneiro},
      year={2024},
      eprint={2304.02970},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```