question about the metadata

Question

holehole5566 opened this issue a year ago · 2 comments

Hi there! Firstly, thank you for providing the code for training AudioLDM.

I have a question regarding the AudioCaps dataset. In the *_label.json files, each data entry contains a "seg_label" key. The README mentions that pre-segmentation of audio files isn't necessary, but I'm curious about the purpose of this "seg_label" key.

Could you clarify whether the "seg_label" field is simply a path for saving preprocessed .npy files during training, or does it contain preprocessed data that requires specific steps before use? If the latter is true, could you guide me on how to process the WAV files into .npy format?

Thank you very much for your help!

Answer 1 · 2023-11-24T15:00:58.000Z

@holehole5566 Sorry for the confusion, and thank you for bringing that up. The "seg_label" key is not used anywhere in the code, so please ignore it. I'll update the dataset tar file to remove this key in the future.
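Since the training code ignores "seg_label", nothing needs to change to train. If you still want your local *_label.json files to match what the code actually reads before the updated tar is published, here is a minimal sketch. It is not part of the AudioLDM repo. It assumes the files are plain JSON, and it walks the whole structure rather than assuming a particular layout. The path `data/dataset/metadata` and the helper names (`drop_key`, `strip_file`, `strip_all`) are hypothetical placeholders. The script rewrites files in place, so back them up first.

```python
import json
from pathlib import Path
from typing import Any, List

UNUSED_KEY = "seg_label"  # per the maintainer, the training code never reads this key


def drop_key(obj: Any, key: str = UNUSED_KEY) -> Any:
    """Return a copy of obj with `key` removed from every dict, at any nesting depth.

    Walking the whole structure avoids assuming a specific layout for the
    *_label.json files (e.g. a top-level list vs. a {"data": [...]} wrapper).
    """
    if isinstance(obj, dict):
        return {k: drop_key(v, key) for k, v in obj.items() if k != key}
    if isinstance(obj, list):
        return [drop_key(item, key) for item in obj]
    return obj


def strip_file(json_path: Path, key: str = UNUSED_KEY) -> None:
    """Rewrite one JSON file in place without `key` (indentation may differ from the original)."""
    data = json.loads(json_path.read_text(encoding="utf-8"))
    json_path.write_text(
        json.dumps(drop_key(data, key), indent=1, ensure_ascii=False),
        encoding="utf-8",
    )


def strip_all(root: Path, key: str = UNUSED_KEY) -> List[Path]:
    """Clean every *_label.json under `root` and return the files that were rewritten."""
    if not root.is_dir():
        return []
    cleaned = []
    for path in sorted(root.rglob("*_label.json")):
        strip_file(path, key)
        cleaned.append(path)
    return cleaned


if __name__ == "__main__":
    # Hypothetical location: point this at wherever you extracted the dataset tar.
    for p in strip_all(Path("data/dataset/metadata")):
        print(f"removed '{UNUSED_KEY}' from {p}")
```

`drop_key` builds new dicts and lists instead of mutating the loaded data, so it is easy to check on a small sample before touching any files.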

Answer 2 · 2023-11-25T07:28:35.000Z

OK! Thanks for your reply.