Sound-VECaps

This is the repo for Sound-VECaps, a large-scale audio-caption dataset.

Data downloading

Sound-VECaps (1.66M)
- We provide two versions of Sound-VECaps:
  - A full version that includes both visual and audio information (Sound-VECaps_full.csv).
  - An audio-only version that excludes all visual-only content (Sound-VECaps_audio.csv).
AudioCaps-Enhanced (4430)
- We also provide enhanced captions for the AudioCaps test set, which covers 886 audio clips with 5 captions each (4430 captions in total):
  - A full version that includes both visual and audio information (AudioCaps_Enhanced_full.csv).
  - An audio-only version that excludes all visual-only content (AudioCaps_Enhanced_audio.csv).

All datasets can be downloaded from Zenodo.
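
Once the files are downloaded, a quick sanity check is to look at the header and row count of each CSV. The sketch below is only an illustration: it assumes the files sit in the current directory and that the first row of each file is a header. The column names are not documented here, so the code reads them from the file rather than assuming any. `summarize_csv` is a hypothetical helper, not part of this repo.

```python
import csv
from pathlib import Path


def summarize_csv(path):
    """Return (column_names, number_of_data_rows) for a CSV file."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader, [])  # assumption: the first row is the header
        n_rows = sum(1 for _ in reader)  # csv.reader handles quoted newlines
    return header, n_rows


# File names as listed above; the download directory is an assumption.
data_dir = Path(".")
for name in (
    "Sound-VECaps_full.csv",
    "Sound-VECaps_audio.csv",
    "AudioCaps_Enhanced_full.csv",
    "AudioCaps_Enhanced_audio.csv",
):
    path = data_dir / name
    if path.exists():
        columns, n_rows = summarize_csv(path)
        print(f"{name}: {n_rows} rows, columns = {columns}")
    else:
        print(f"{name}: not found in {data_dir.resolve()}")
```

Using `csv.reader` instead of counting lines keeps the row count correct when a caption contains commas or line breaks inside quotes.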

Coming soon