Machine Learning DJ

1. Introduction

This repository applies machine learning to data from the Spotify API. The data consists of a number of playlists compiled for 3 distinct genres:

Jazz
House
RnB

We also have a list of Ryan's top 20 favourite songs, to which we applied NLP and sentiment analysis.

The results are outlined below.

See the instructions for using this repository at the end of the README.

2. NLP

a. Sentiment Analysis

We ran a sentiment analysis on 20 of Ryan's favourite songs, covering both song titles and song lyrics.

The mean sentiment was neutral for both song titles (89.1%) and song lyrics (71.6%).
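
How a mean sentiment like the one above might be computed can be sketched as follows. The source does not name the sentiment tool, so this assumes a scorer that returns negative/neutral/positive proportions per text (the shape VADER's `polarity_scores` returns). The function name and the example scores are hypothetical and are not the repository's data.

```python
def mean_sentiment(scores):
    """Average neg/neu/pos proportions over all texts.

    Returns the label with the highest mean and the dict of means.
    """
    labels = ("neg", "neu", "pos")
    means = {k: sum(s[k] for s in scores) / len(scores) for k in labels}
    dominant = max(means, key=means.get)
    return dominant, means


# Made-up scores for three hypothetical titles, NOT the repository's results.
example_scores = [
    {"neg": 0.0, "neu": 1.0, "pos": 0.0},
    {"neg": 0.0, "neu": 0.5, "pos": 0.5},
    {"neg": 0.25, "neu": 0.75, "pos": 0.0},
]

label, means = mean_sentiment(example_scores)
print(label, {k: round(v, 3) for k, v in means.items()})
```

Under this reading, "neutral (89.1%)" would be the average neutral proportion across the 20 titles.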

b. NLP

The top 10 words for Ryan's favourite song titles were as follows:

'afire'
'angels'
'mama'
'acoustic'
'poet'
'pendulum'
'love'
'given'
'us'
'take'

The top 10 words for Ryan's favourite song lyrics were as follows:

'la'
'oh'
'im'
'like'
'love'
'one'
'na'
'back'
'wa'
'dont'
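
A word count of this kind can be sketched with the standard library alone. This is an illustration, not the notebook's actual code: the stop word list is a tiny hypothetical one, and apostrophes are dropped before tokenizing, which is one way tokens such as 'im' and 'dont' can arise.

```python
import re
from collections import Counter

# Tiny illustrative stop word list (an assumption; the real list is unknown).
STOPWORDS = {"the", "a", "and", "i", "you", "me", "in", "to", "of", "it"}


def tokenize(text):
    """Lowercase, drop apostrophes, keep alphabetic tokens, remove stop words."""
    text = text.lower().replace("'", "").replace("\u2019", "")
    return [w for w in re.findall(r"[a-z]+", text) if w not in STOPWORDS]


def top_words(texts, n=10):
    """Return the n most frequent tokens across all texts as (word, count)."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return counts.most_common(n)


# Hypothetical input, not the repository's lyrics.
print(top_words(["Love, love me do", "I'm in love"], n=3))
```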

The top 10 four-word phrases from the lyrics of Ryan's favourite songs were as follows:

'wan', 'na', 'wake', 'one'
'na', 'wake', 'one', 'day'
'mama', 'mama', 'heywe', 'aint'
'mama', 'heywe', 'aint', 'coming'
'heywe', 'aint', 'coming', 'home'
'mama', 'dont', 'stress', 'mindwe'
'dont', 'stress', 'mindwe', 'aint'
'stress', 'mindwe', 'aint', 'coming'
'mindwe', 'aint', 'coming', 'home'
'mama', 'gon', 'na', 'alrightdry'
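
The overlapping phrases above are what a sliding window of four tokens produces. A minimal sketch, assuming the lyrics are already tokenized (function names are hypothetical):

```python
from collections import Counter


def ngrams(tokens, n=4):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def top_ngrams(token_lists, n=4, k=10):
    """Count n-grams per song (never across song boundaries), return top k."""
    counts = Counter()
    for tokens in token_lists:
        counts.update(ngrams(tokens, n))
    return counts.most_common(k)


# Hypothetical tokens, not the repository's lyrics.
song = ["mama", "dont", "stress", "mind", "we", "aint", "coming", "home"]
print(top_ngrams([song, song], n=4, k=3))
```

Tokens such as 'heywe' and 'mindwe' look like two lyric lines joined without a space, so replacing line breaks with spaces before tokenizing would probably be worth trying.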

c. Word Cloud

A word cloud based on the lyrics of Ryan's top 20 favourite songs is shown below:

3. Machine Learning

a. Neural Network Model

This section demonstrates how we use a neural network to predict the genre of the 1,085 songs in our Spotify playlist. The playlist contains only 3 genres, which serve as our "target":

R&B
Jazz
Rock

b. Data Analytics

We ran a feature importance analysis and found that these 3 audio features played a big role in our machine learning model's predictions:

Instrumentalness - Predicts whether a track contains no vocals. The closer the instrumentalness value is to 1.0, the greater the likelihood that the track contains no vocal content.
Tempo - The speed or pace of a piece of music, which derives directly from the average beat duration.
Energy - A measure from 0.0 to 1.0 that represents perceived intensity and activity. Typically, energetic tracks feel fast, loud, and noisy.
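
The source does not say how feature importance was measured. One common approach, sketched here as an assumption, is to read `feature_importances_` from a scikit-learn random forest. The data below is synthetic and is built so that only the first column determines the label, so the ranking it prints says nothing about real songs.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the audio-feature table (NOT the repository's data).
rng = np.random.default_rng(0)
feature_names = ["instrumentalness", "tempo", "energy", "valence"]
X = rng.random((300, len(feature_names)))

# Hypothetical labels: three classes that depend only on the first column.
y = np.digitize(X[:, 0], [0.33, 0.66])

forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X, y)

# Pair each feature with its importance and sort from most to least important.
ranked = sorted(
    zip(feature_names, forest.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, importance in ranked:
    print(f"{name}: {importance:.3f}")
```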

c. Process Map

Split the data using train_test_split
Scaled the features using StandardScaler
Used OneHotEncoder, fitted on y_train, to encode the genre labels
Initialised the neural network model
Added 2 neural layers
Compiled and fitted the model
Held out 10% of our training data as validation data (x_val & y_val) to obtain the validation accuracy
Compared the validation accuracy with the test data accuracy

We further optimised the accuracy of the neural network model by altering the number of neural layers and the number of epochs.
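
The preprocessing steps at the top of this list can be sketched as below. The data is synthetic, the genre names are placeholders, and the split ratio and random seeds are assumptions; the model-building steps are left out because the source does not name the neural network library.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in for the audio features and genre labels.
X, y_idx = make_classification(
    n_samples=300, n_features=6, n_informative=3, n_classes=3, random_state=0
)
y = np.array(["genre_a", "genre_b", "genre_c"])[y_idx]  # placeholder names

# 1. Split first, so nothing about the test set leaks into preprocessing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# 2. Fit the scaler on the training features only, then apply it to both sets.
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

# 3. Fit the encoder on y_train, then one-hot encode both label vectors.
encoder = OneHotEncoder(handle_unknown="ignore").fit(y_train.reshape(-1, 1))
y_train_encoded = encoder.transform(y_train.reshape(-1, 1)).toarray()
y_test_encoded = encoder.transform(y_test.reshape(-1, 1)).toarray()

print(X_train_scaled.shape, y_train_encoded.shape)
```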

The model came to a steady state after 60 epochs with the following results:

val_accuracy = 0.9136
val_loss = 0.2365
test_accuracy = 0.8376
test_loss = 0.6086

Validation accuracy (0.9136) is higher than test accuracy (0.8376), which suggests that the hyperparameters in this model have been tuned to fit this particular validation set. A bigger sample size might be needed to improve this.
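
The validation-versus-test comparison can be reproduced in miniature. Since the source does not name its neural network library, scikit-learn's `MLPClassifier` is swapped in here as a stand-in: two hidden layers, 60 epochs, and 10% of the training data held out for validation. The layer sizes, seeds, and synthetic data are assumptions, so the printed accuracies will not match the figures above.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

# Synthetic 3-class data standing in for the playlist features.
X, y = make_classification(
    n_samples=600, n_features=8, n_informative=4, n_classes=3, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
scaler = StandardScaler().fit(X_train)

clf = MLPClassifier(
    hidden_layer_sizes=(32, 16),  # two hidden layers (sizes are assumptions)
    max_iter=60,                  # at most 60 epochs
    early_stopping=True,          # enables the internal validation split
    validation_fraction=0.1,      # 10% of the training data
    n_iter_no_change=60,          # effectively disables stopping early
    random_state=42,
)
clf.fit(scaler.transform(X_train), y_train)

val_accuracy = clf.best_validation_score_
test_accuracy = clf.score(scaler.transform(X_test), y_test)
print(f"val_accuracy={val_accuracy:.4f} test_accuracy={test_accuracy:.4f}")
```

Because the reported validation score is the best one seen during training, it tends to be optimistic, which is the same kind of gap discussed above. Hitting the epoch limit may print a convergence warning; that is expected in this sketch.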

4. How to Use This Repository

Raw data was collected with the spotipy package in "01 playlist_export_function.ipyb". Enter your username and Spotify URI, and it will export that playlist together with its audio features.
"03 final_neural" is where the neural network model is set up.

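A playlist export of this kind might look like the sketch below. It is not the notebook's code: `export_playlist` and `make_client` are hypothetical names, and the sketch assumes spotipy's `playlist_items`, `audio_features`, and `next` methods plus client credentials supplied through the `SPOTIPY_CLIENT_ID` and `SPOTIPY_CLIENT_SECRET` environment variables.

```python
def make_client():
    """Build a spotipy client from credentials in the environment."""
    import spotipy
    from spotipy.oauth2 import SpotifyClientCredentials

    return spotipy.Spotify(auth_manager=SpotifyClientCredentials())


def export_playlist(sp, playlist_uri):
    """Return one dict per track with its name, artist and audio features."""
    rows = []
    page = sp.playlist_items(playlist_uri)
    while page:
        # Skip empty entries (for example, tracks that were removed).
        tracks = [item["track"] for item in page["items"] if item.get("track")]
        ids = [t["id"] for t in tracks if t.get("id")]
        features = sp.audio_features(ids) if ids else []
        by_id = {f["id"]: f for f in features if f}
        for track in tracks:
            feats = by_id.get(track.get("id"))
            if feats is None:
                continue  # no audio features available for this track
            rows.append({
                "name": track["name"],
                "artist": track["artists"][0]["name"],
                "instrumentalness": feats["instrumentalness"],
                "tempo": feats["tempo"],
                "energy": feats["energy"],
            })
        # Follow pagination until there are no more pages.
        page = sp.next(page) if page.get("next") else None
    return rows
```

With credentials set, `export_playlist(make_client(), playlist_uri)` would return rows that can be passed to `pandas.DataFrame` for the later notebooks.
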
hdoa369/machine-learning-spotify