Unraveling Instance Associations: A Closer Look for Audio-Visual Segmentation

Official code for "Unraveling Instance Associations: A Closer Look for Audio-Visual Segmentation"

Unraveling Instance Associations: A Closer Look for Audio-Visual Segmentation,
Yuanhong Chen*, Yuyuan Liu*, Hu Wang, Fengbei Liu, Chong Wang, Helen Frazer, Gustavo Carneiro.
CVPR 2024 (arXiv 2304.02970)

Dataset

VPO datasets are available here

VGGSound audio files are available here

*Visual comparison between datasets. We show four audio-visual classes: “female”, “cat”, “dog”, and “car”. AVSBench (SS) (1st frame) provides pixel-level multi-class annotations for images containing a single sounding object. The proposed VPO benchmarks (2nd to 4th frames) pair a subset of the segmented objects in an image with relevant audio files to produce pixel-level multi-class annotations.*

Demo

Performance

Checkpoints

Usage

Requirements

git clone git@github.com:cyh-0/CAVP.git
cd CAVP
pip install -r requirements.txt

Path

ln -s /path/to/datasets ../audio_visual
ln -s /path/to/ckpts ./ckpts
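
A dangling symlink (one whose target does not exist) is created without any error, so it may be worth confirming that both links resolve before going further. The following is a minimal sketch of such a check; `check_link` is a hypothetical helper and not part of the CAVP repository, and it assumes both targets are directories.

```shell
# check_link: hypothetical helper, not shipped with CAVP.
# Succeeds only if the given path resolves to an existing directory
# ([ -d ] follows symlinks, so a dangling link fails the test).
check_link() {
    if [ -d "$1" ]; then
        echo "ok: $1"
    else
        echo "missing or broken: $1" >&2
        return 1
    fi
}
```

With the function defined in the current shell, running `check_link ../audio_visual` and `check_link ./ckpts` from inside the CAVP directory should print an `ok:` line for each link that resolves.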

Training

Citation

@misc{chen2024unraveling,
      title={Unraveling Instance Associations: A Closer Look for Audio-Visual Segmentation}, 
      author={Yuanhong Chen and Yuyuan Liu and Hu Wang and Fengbei Liu and Chong Wang and Helen Frazer and Gustavo Carneiro},
      year={2024},
      eprint={2304.02970},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}