Spijkervet/CLMR

Train / Validation / Test splits for the Million Song Dataset

codyhesse opened this issue · 9 comments

Hi!
Thank you for releasing this repo :)

I was wondering where I can find the train/validation/test splits you used for MSD. My team and I are trying to reproduce this study but, unfortunately, we can't find the 201 680 / 11 774 / 28 435 splits or the corresponding tags from Last.FM. Any assistance with this would be very helpful!

Kind regards,
Cody

I can add that we have been able to access the audio data itself, so it's really just the splits used that we're looking for. Curiously, the total number doesn't add up to one million, so I guess some filtering/concatenation has been done as well.
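For reference, a quick sanity check of the numbers quoted above. This is only a sketch: the split sizes are the ones from the question, the train/validation/test labels follow the order in the issue title, and the one-million figure is the nominal size of the dataset.

```python
# Split sizes quoted in the question (labels assumed to follow the
# train / validation / test order used in the issue title).
splits = {"train": 201_680, "validation": 11_774, "test": 28_435}

total = sum(splits.values())
msd_size = 1_000_000  # nominal size of the Million Song Dataset

print(f"tracks covered by the splits: {total}")
print(f"share of the full dataset:    {total / msd_size:.1%}")

# Relative size of each split within the covered subset.
for name, count in splits.items():
    print(f"{name:>10}: {count / total:.1%}")
```

The three splits sum to 241 889 tracks, i.e. roughly a quarter of the full dataset, which is consistent with some filtering having been applied.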

Hello, have you found the 'processed_annotations' data, such as output_labels_msd.txt, index_msd.tsv, and train_gt_msd.tsv? @codyhesse @carlthome @Spijkervet Thanks for your comment.

@yiyiyi0817 don't know about those files specifically (@codyhesse and @SebastianLoef might know more), but we believe the splits used in CLMR were the ones by @keunwoochoi over in https://github.com/keunwoochoi/MSD_split_for_tagging at least.

Thank you very much. @carlthome

lix4 commented

Where can I get the npy files?

lix4 commented

@lix4 it's on the mentioned repo - https://github.com/keunwoochoi/MSD_split_for_tagging

I mean the original data files like "3/6/36122424.npy". Is there a place I can download it?
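To make the path format concrete: judging only from the single example "3/6/36122424.npy", the files appear to be nested under the first two characters of their ID. The helper below is hypothetical (`sharded_npy_path` is not part of CLMR or the split repo) and the layout is an assumption inferred from that one path, so treat it as a sketch rather than the actual loader.

```python
from pathlib import PurePosixPath


def sharded_npy_path(track_id: str) -> PurePosixPath:
    """Hypothetical helper: build '<c0>/<c1>/<id>.npy' from an ID string.

    Assumes the directory layout implied by '3/6/36122424.npy', i.e. the
    first two characters of the ID are used as nested directory names.
    """
    if len(track_id) < 2:
        raise ValueError("ID needs at least two characters to build the directories")
    return PurePosixPath(track_id[0], track_id[1], f"{track_id}.npy")


# Reproduces the path mentioned above.
print(sharded_npy_path("36122424"))
```

This only shows where a file would be expected on disk; the .npy files themselves still have to be produced from the audio.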

You mean the audio files. Sorry, you'll have to ask around for people who might have them, as the crawling API doesn't work anymore. It's so problematic that I even wrote a short paper about it:
https://arxiv.org/abs/2308.16389

lix4 commented

All right, thank you for your explanation. I have been searching for them for a while.