question about the metadata
holehole5566 opened this issue · 2 comments
Hi there! Firstly, thank you for providing the code for training AudioLDM.
I have a question regarding the AudioCaps dataset. In the *_label.json files, each data entry contains a "seg_label" key. The README mentions that pre-segmentation of audio files isn't necessary, but I'm curious about the purpose of this "seg_label" key.
Could you clarify whether the "seg_label" field is simply a path for saving preprocessed .npy files during training, or does it contain preprocessed data that requires specific steps before use? If the latter is true, could you guide me on how to process the WAV files into .npy format?
Thank you very much for your help!
@holehole5566 Sorry for the confusion, and thank you for bringing that up. The "seg_label" key is not used in the code, so please ignore it. I'll update the dataset tar file to remove this key in the future.
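If anyone would rather strip the unused key from a local copy of the metadata in the meantime, something like the following might work. This is only a rough sketch: the exact layout of the *_label.json files isn't shown in this thread, so the helper (`drop_key`, a made-up name) just walks whatever nested dicts and lists it finds, and the sample entry with its "wav" and "caption" fields is hypothetical.

```python
import json


def drop_key(node, key="seg_label"):
    """Return a copy of `node` with every occurrence of `key` removed.

    Works on arbitrarily nested dicts and lists, so it does not depend on
    the exact layout of the metadata file.
    """
    if isinstance(node, dict):
        return {k: drop_key(v, key) for k, v in node.items() if k != key}
    if isinstance(node, list):
        return [drop_key(item, key) for item in node]
    return node


# Hypothetical entry; the real field names and nesting may differ.
sample = {
    "data": [
        {"wav": "audio/example.wav", "seg_label": "", "caption": "a dog barks"}
    ]
}

cleaned = drop_key(sample)
print(json.dumps(cleaned, indent=2))
```

For a real file, the same function could be applied to the result of `json.load` and written back with `json.dump`. Since the training code ignores the key anyway, this is purely cosmetic.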
OK! Thanks for your reply.