This is the repo for Sound-VECaps, a large-scale audio-caption dataset.
For more details, please refer to the paper "Improving Audio Generation with Visual Enhanced Caption".
- Sound-VECaps (1.66M)
  - We provide two versions of Sound-VECaps:
    - A full version including visual and audio information (Sound-VECaps_full.csv).
    - A version that excludes all the visual-only contents (Sound-VECaps_audio.csv).
- AudioCaps-Enhanced (4430)
  - We also provide enhanced captions for the AudioCaps test set, which contains 886 audio clips with 5 captions each (4430 captions in total):
    - A full version including visual and audio information (AudioCaps_Enhanced_full.csv).
    - A version that excludes all the visual-only contents (AudioCaps_Enhanced_audio.csv).
All datasets can be downloaded from Zenodo.
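
Once a CSV file has been downloaded, it can be inspected with the Python standard library. The sketch below is only an illustration: the column layout of the files is not documented here, so the helper names `peek_csv` and `captions_per_clip` and the `id_column` argument are assumptions, and the real header should be checked with `peek_csv` first.

```python
import csv
from collections import Counter


def peek_csv(path, n=3):
    """Return the header row and the first n data rows of a CSV file."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader, [])  # empty list if the file has no rows
        rows = [row for _, row in zip(range(n), reader)]
    return header, rows


def captions_per_clip(path, id_column):
    """Count rows per audio clip, assuming one caption per row.

    id_column is the name of the column that identifies the audio clip.
    It is supplied by the caller because the real column name is not
    documented here (hypothetical until checked against the header).
    """
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        if id_column not in (reader.fieldnames or []):
            raise KeyError(f"column {id_column!r} not found in {reader.fieldnames}")
        return Counter(row[id_column] for row in reader)
```

For example, `header, rows = peek_csv("AudioCaps_Enhanced_audio.csv")` shows which columns exist. If the file turns out to store one caption per row, `captions_per_clip` with the appropriate ID column should report 886 clips and 4430 captions in total, which is a quick way to confirm that the download is complete.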
Coming soon