
metadata for SHS100K




This repository contains the metadata for the Second Hand Songs 100K (SHS100K) dataset. The dataset contains about 10,000 songs with 100,000 tracks. It is split into three sets: SHS100K-TRAIN, SHS100K-VAL and SHS100K-TEST. The metadata is provided in the file "list". The raw audio can be crawled with youtube-dl using the provided URLs.
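As a rough illustration of the crawling step, the sketch below wraps the youtube-dl command line for a single URL. The function names, the output naming scheme and the mp3 target format are assumptions made for this example; the repository does not prescribe them.

```python
import subprocess


def build_download_command(url, out_stem):
    """Return the youtube-dl argument list that extracts the audio of one URL.

    out_stem is a hypothetical output path without extension, e.g. "audio/1_0".
    """
    return [
        "youtube-dl",
        "--extract-audio",                  # keep only the audio stream
        "--audio-format", "mp3",            # assumed target format
        "--output", out_stem + ".%(ext)s",  # youtube-dl fills in the extension
        url,
    ]


def download(url, out_stem):
    """Run youtube-dl for one track; return True if it exited successfully.

    Raises FileNotFoundError if youtube-dl is not installed.
    """
    result = subprocess.run(build_download_command(url, out_stem))
    return result.returncode == 0
```

Some videos may no longer be available, so it is worth recording the return value of download() rather than stopping at the first failure.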

Metadata for Second Hand Songs 100K Dataset

The file "list" contains the metadata of this dataset, with the following fields: set_id, ver_id, title, performer, url and status.
set_id: Index of the song
ver_id: Index of the version of that song
title: Song's name
performer: Performer's name
url: YouTube URL of the track
status: Whether the track could be downloaded by youtube-dl
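The fields above could be read into records roughly as follows. This is a minimal sketch that assumes one track per line, tab-separated fields in the order listed, and integer set_id and ver_id; none of this is stated by the repository, so check it against the actual file. The names Track and parse_list are made up for this example, and status is kept as a raw string because its encoding is not documented.

```python
from collections import namedtuple

# One record per track, with the six fields described above.
Track = namedtuple(
    "Track", ["set_id", "ver_id", "title", "performer", "url", "status"]
)


def parse_list(lines, sep="\t"):
    """Parse an iterable of text lines into Track records.

    Assumes six sep-separated fields per line (an assumption about the
    file layout). Blank lines are skipped.
    """
    tracks = []
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if not line.strip():
            continue
        fields = line.split(sep)
        if len(fields) != 6:
            raise ValueError(
                "line %d: expected 6 fields, got %d" % (number, len(fields))
            )
        set_id, ver_id, title, performer, url, status = fields
        tracks.append(
            Track(int(set_id), int(ver_id), title, performer, url, status)
        )
    return tracks
```

A typical call would be `with open("list", encoding="utf-8") as f: tracks = parse_list(f)`.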

Training Set

SHS100K-TRAIN: contains only set_id and ver_id

Validation Set

SHS100K-VAL: contains only set_id and ver_id

Test Set

SHS100K-TEST: contains only set_id and ver_id
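Because the split files carry only set_id and ver_id, they have to be matched against the metadata in "list" to recover titles and URLs. The sketch below shows one way to do that, assuming each split line holds two whitespace-separated integers and that metadata rows are tuples starting with (set_id, ver_id); the function names are hypothetical.

```python
from collections import defaultdict


def read_split(lines):
    """Return the set of (set_id, ver_id) pairs found in a split file.

    Assumes two whitespace-separated integers per line; blank lines
    are skipped.
    """
    pairs = set()
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        pairs.add((int(fields[0]), int(fields[1])))
    return pairs


def select_rows(rows, pairs):
    """Keep the metadata rows whose (set_id, ver_id) belongs to the split."""
    return [row for row in rows if (row[0], row[1]) in pairs]


def group_versions(pairs):
    """Map each set_id to the sorted list of its ver_id values."""
    groups = defaultdict(list)
    for set_id, ver_id in sorted(pairs):
        groups[set_id].append(ver_id)
    return dict(groups)
```

Grouping by set_id gives, for each song, the versions that belong to the chosen split.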