/MUSE

Machine learning project for a course at UC Davis

Primary LanguageMatlab

MUSE

For an explanation of the project please view the report

##Data These will be the features we use:

  • artist_term
    • Echo Nest tags
  • danceability (REMOVED, datasheets come back as all 0s)
    • between 0 and 1
  • duration
    • in seconds
  • end_of_fade_in
    • end time in seconds
  • energy (REMOVED, datasheets come back as all 0s)
    • between 0 and 1
  • key (estimation)
    • key signiatures start at 0 (C) and ascend the chomatic scale 1 (D-flat)
  • loudness
    • measured in dB
  • mode (estimation)
    • major or minor
  • song_hotness (some are NaN)
    • algorithmic estimation
  • start_of_fade_out
    • start time in seconds
  • tempo
    • in BPM
  • time_signature (estimation)
    • beats per bar
  • track_id
  • year
    • year of release, 0 if not known

http://labrosa.ee.columbia.edu/millionsong/pages/field-list

To build the csv file for use with MATLAB first download the MillionSongSubset files. Then cd into the directory with getCsvFromData.py and run the command python getCsvFromData.py <path to MillionSongSubset/data/>

This data folder contains a file merged_song_database.db which is a combination of subset_artist_similarity.db, subset_artist_term.db, and subset_track_metadata.db from the original subset data. To get the song_data.csv I ran this query against the database

select 
songs.*,
group_concat(artist_term.term, ';')
from songs
inner join artist_term
on artist_term.artist_id = songs.artist_id
group by track_id;

which just appends a list of the artist tems separated by a ';' to each track. That should give us a decent start.

You can download any sqlite db browser like [sqlite browser]: http://sqlitebrowser.org/ to run your own SQL queries against it.