Sound-VECaps

This is the repo for Sound-VECaps, a large-scale audio-caption dataset.

For more details, please refer to the paper "Improving Audio Generation with Visual Enhanced Caption".


Data downloading

  • Sound-VECaps (1.66M)

    • We provide two versions of Sound-VECaps:
      • A full version including visual and audio information (Sound-VECaps_full.csv).
      • A version that excludes all the visual-only contents (Sound-VECaps_audio.csv).
  • AudioCaps-Enhanced (4430)

    • We also provide enhanced captions for the AudioCaps test set, which contains 886 audio clips with 5 captions each (4430 captions in total):
      • A full version including visual and audio information (AudioCaps_Enhanced_full.csv).
      • A version that excludes all the visual-only contents (AudioCaps_Enhanced_audio.csv).

All datasets can be downloaded from Zenodo.
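Once a CSV has been downloaded, a quick way to inspect it is sketched below. This is a minimal example using only the Python standard library. It assumes the file sits in the working directory and makes no assumption about column names: it reads whatever header the file ships with. The helpers `load_rows` and `captions_per_clip`, and the `id_column` argument, are hypothetical names introduced for illustration and are not part of this repo.

```python
import csv
from collections import Counter
from pathlib import Path


def load_rows(csv_path):
    """Read a caption CSV into a list of dicts keyed by the file's own header."""
    with Path(csv_path).open(newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def captions_per_clip(rows, id_column):
    """Count how many caption rows share each value of id_column.

    id_column is whichever header field identifies the audio clip;
    check the printed column list first, since the name is not assumed here.
    """
    return Counter(row[id_column] for row in rows)


if __name__ == "__main__":
    # Assumed location: the downloaded file in the current directory.
    path = Path("AudioCaps_Enhanced_audio.csv")
    if path.exists():
        rows = load_rows(path)
        print("columns:", list(rows[0].keys()) if rows else [])
        print("rows:", len(rows))
    else:
        print(f"{path} not found; download it from Zenodo first.")
```

For the AudioCaps-Enhanced files, calling `captions_per_clip` with the clip identifier column is a simple sanity check that each clip carries the 5 captions described above.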

Audio generation system

Coming soon

Audio retrieval system

Coming soon