Train / Validation / Test splits for million song dataset

Question

codyhesse opened this issue 2 years ago · 9 comments

Hi!
Thank you for releasing this repo :)

I was wondering where I can find the train/validation/test splits you used for MSD. My team and I are trying to reproduce this study but, unfortunately, we can't find the 201 680 / 11 774 / 28 435 splits or the corresponding tags from Last.FM. Any help with this would be much appreciated!

Kind regards,
Cody

Answer 1 · 2023-04-24T07:10:15.000Z

I can add that we have been able to access the audio data itself, so it's really just the splits that were used that we're looking for. Curiously, the total doesn't add up to one million, so I guess some filtering has been done as well.
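For reference, here is a quick check of the numbers quoted in the question. This is only a minimal sketch; the variable names are illustrative and nothing here comes from the CLMR code itself.

```python
# Split sizes as quoted in the question (train / validation / test).
split_sizes = {"train": 201_680, "validation": 11_774, "test": 28_435}

# Total number of tracks covered by the three splits.
total = sum(split_sizes.values())

# Share of each split relative to that total.
fractions = {name: size / total for name, size in split_sizes.items()}

print(f"tracks in splits: {total}")
for name, frac in fractions.items():
    print(f"{name}: {frac:.1%}")
```

The three splits sum to 241,889 tracks, which is roughly a quarter of the nominal one million, so the splits cover only a subset of the dataset.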

Answer 2 · 2023-08-14T08:12:55.000Z

Hello, have you found the 'processed_annotations' data, such as output_labels_msd.txt, index_msd.tsv, and train_gt_msd.tsv? @codyhesse @carlthome @Spijkervet Thanks for your comments.

Answer 3 · 2023-08-14T08:16:25.000Z

@yiyiyi0817 don't know about those files specifically (@codyhesse and @SebastianLoef might know more), but we believe the splits used in CLMR were the ones by @keunwoochoi over in https://github.com/keunwoochoi/MSD_split_for_tagging at least.

Answer 4 · 2023-08-14T10:34:26.000Z

Thank you very much. @carlthome

Answer 5 · 2024-03-25T01:06:33.000Z

Where can I get the npy files?

Answer 6 · 2024-03-25T04:33:16.000Z

@lix4 it's in the repo mentioned above: https://github.com/keunwoochoi/MSD_split_for_tagging

Answer 7 · 2024-03-25T20:24:19.000Z

I mean the original data files, like "3/6/36122424.npy". Is there a place I can download them?
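To make the file layout concrete, here is a minimal sketch of how such a path might be built from a track ID. It assumes, based only on the single example path above, that the two directory levels are the first two characters of the ID; `guess_npy_path` is a hypothetical helper, not something from CLMR or the split repo.

```python
from pathlib import PurePosixPath


def guess_npy_path(track_id: str) -> PurePosixPath:
    """Build a relative path like 3/6/36122424.npy from a track ID.

    ASSUMPTION: the directory levels are the first and second characters
    of the ID. This is inferred from one example path and is not confirmed.
    """
    if len(track_id) < 2:
        raise ValueError("track_id must have at least two characters")
    return PurePosixPath(track_id[0], track_id[1], f"{track_id}.npy")


print(guess_npy_path("36122424"))
```

If the real layout differs, only the two directory components in the helper would need to change.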

Answer 8 · 2024-03-25T22:53:01.000Z

You meant the audio files. Sorry, you'll have to ask around for people who might have them, as the crawling API doesn't work anymore. It's so problematic that I even wrote a short paper about it:
https://arxiv.org/abs/2308.16389

Answer 9 · 2024-03-26T02:10:09.000Z

All right, thank you for your explanation. I have been searching for them for a while.