This repository applies machine learning to data from the Spotify API. The data consists of a number of playlists compiled around 3 distinct genres:
- Jazz
- House
- RnB
We also have a list of Ryan's top 20 favourite songs, to which we applied NLP and sentiment analysis.
The results are outlined below.
See the end of this README for instructions on how to use this repository.
We ran a sentiment analysis on 20 of Ryan's favourite songs, covering both song titles and song lyrics.
The mean sentiment for titles was neutral (89.1%), as was the mean sentiment for lyrics (71.6%).
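
As a rough illustration of how a mean sentiment like this can be computed, the sketch below averages per-song scores and reports the dominant label. It assumes each song has already been scored with negative/neutral/positive proportions that sum to 1 (the format VADER-style analysers produce). The scores and the names `mean_sentiment` and `sample_scores` are hypothetical and are not taken from this repository.

```python
from statistics import mean


def mean_sentiment(scores):
    """Average neg/neu/pos proportions across songs.

    Returns the dominant label and the per-label averages.
    """
    labels = ("neg", "neu", "pos")
    averages = {label: mean(s[label] for s in scores) for label in labels}
    dominant = max(averages, key=averages.get)
    return dominant, averages


# Hypothetical per-title scores, for illustration only.
sample_scores = [
    {"neg": 0.0, "neu": 1.0, "pos": 0.0},
    {"neg": 0.0, "neu": 0.5, "pos": 0.5},
    {"neg": 0.25, "neu": 0.75, "pos": 0.0},
]

label, averages = mean_sentiment(sample_scores)
print(label, round(averages["neu"], 3))
```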
The top 10 words for Ryan's favourite song titles were as follows:
- 'afire'
- 'angels'
- 'mama'
- 'acoustic'
- 'poet'
- 'pendulum'
- 'love'
- 'given'
- 'us'
- 'take'
The top 10 words for Ryan's favourite song lyrics were as follows:
- 'la'
- 'oh'
- 'im'
- 'like'
- 'love'
- 'one'
- 'na'
- 'back'
- 'wa'
- 'dont'
The top 10 four-word phrases from the lyrics of Ryan's favourite songs were as follows:
- 'wan', 'na', 'wake', 'one'
- 'na', 'wake', 'one', 'day'
- 'mama', 'mama', 'heywe', 'aint'
- 'mama', 'heywe', 'aint', 'coming'
- 'heywe', 'aint', 'coming', 'home'
- 'mama', 'dont', 'stress', 'mindwe'
- 'dont', 'stress', 'mindwe', 'aint'
- 'stress', 'mindwe', 'aint', 'coming'
- 'mindwe', 'aint', 'coming', 'home'
- 'mama', 'gon', 'na', 'alrightdry'
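
Word and phrase counts like the ones above can be produced with a simple frequency count. The sketch below is a minimal version using only the standard library. It assumes the lyrics have already been lower-cased, stripped of punctuation and stop words, and split into tokens. The token list and the helper names `top_words` and `top_ngrams` are hypothetical.

```python
from collections import Counter


def top_words(tokens, n=10):
    """Return the n most frequent tokens with their counts."""
    return Counter(tokens).most_common(n)


def top_ngrams(tokens, size=4, n=10):
    """Return the n most frequent runs of `size` consecutive tokens."""
    # zip over shifted copies of the token list to form sliding windows.
    ngrams = zip(*(tokens[i:] for i in range(size)))
    return Counter(ngrams).most_common(n)


# Hypothetical, already-cleaned tokens for illustration only.
tokens = "we aint coming home we aint coming home mama".split()

print(top_words(tokens, n=3))
print(top_ngrams(tokens, size=4, n=1))
```

Because the windows slide one token at a time, overlapping phrases from the same lyric line all appear in the ranking, which is why the four-word list above contains several shifted versions of the same line.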
The word cloud below is based on the lyrics of Ryan's top 20 favourite songs:
This section demonstrates how we use a neural network to predict the genre of the 1,085 songs in our Spotify playlist. The playlist consists of only 3 genres (which serve as our "target"):
- R&B
- Jazz
- Rock
We ran a feature importance analysis and found that these 3 audio features played a big role in our model's predictions:
- Instrumentalness - Indicates how likely a track is to contain no vocals. The closer the instrumentalness value is to 1.0, the greater the likelihood that the track contains no vocal content.
- Tempo - The speed or pace of a piece of music, derived directly from the average beat duration.
- Energy - A measure from 0.0 to 1.0 that represents perceived intensity and activity. Typically, energetic tracks feel fast, loud, and noisy.
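
The README does not say which feature importance method was used. As one possible approach, the sketch below ranks features by permutation importance with scikit-learn. The random forest, the synthetic data, and the column names are all assumptions made for illustration. In practice the columns would be the Spotify audio features and the labels would be the genres.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)

# Synthetic stand-in data: the label depends only on the first column.
feature_names = ["informative", "noise_1", "noise_2"]
X = rng.random((300, 3))
y = (X[:, 0] > 0.5).astype(int)

model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# Shuffle each column in turn and measure how much accuracy drops.
result = permutation_importance(model, X, y, n_repeats=5, random_state=0)

ranking = sorted(
    zip(feature_names, result.importances_mean),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```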
- Split the data using train_test_split
- Scaled the features using StandardScaler
- One-hot encoded y_train using OneHotEncoder
- Initialised the neural network model
- Added 2 neural layers
- Compiled and fitted the model
- Held out 10% of the training data as validation data (x_val and y_val) to obtain validation accuracy
- Compared validation accuracy with accuracy on the test data
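
The steps above can be sketched roughly as follows. This is a minimal outline rather than the notebook's actual code: the synthetic data, the 25% test split, the layer sizes, the optimiser, and the helper names `prepare_data` and `build_model` are assumptions. The Keras import is kept inside `build_model` so the preprocessing can be run without TensorFlow installed.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder, StandardScaler


def prepare_data(X, y, test_size=0.25, seed=0):
    """Split the data, scale the features, and one-hot encode the labels."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed, stratify=y
    )
    # Fit the scaler and encoder on training data only to avoid leakage.
    scaler = StandardScaler().fit(X_train)
    encoder = OneHotEncoder().fit(y_train.reshape(-1, 1))
    return (
        scaler.transform(X_train),
        scaler.transform(X_test),
        encoder.transform(y_train.reshape(-1, 1)).toarray(),
        encoder.transform(y_test.reshape(-1, 1)).toarray(),
    )


def build_model(n_features, n_classes):
    """Build a network with 2 dense layers (sizes are placeholders)."""
    from tensorflow import keras  # lazy import, only needed for training

    model = keras.Sequential([
        keras.Input(shape=(n_features,)),
        keras.layers.Dense(16, activation="relu"),
        keras.layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(
        optimizer="adam",
        loss="categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model


# Synthetic stand-in for the audio-feature table and genre labels.
rng = np.random.default_rng(0)
X = rng.random((120, 3))
y = np.array(["R&B", "Jazz", "Rock"] * 40)

X_train, X_test, y_train, y_test = prepare_data(X, y)
print(X_train.shape, y_train.shape)

# Training and evaluation (requires TensorFlow):
# model = build_model(X_train.shape[1], y_train.shape[1])
# model.fit(X_train, y_train, epochs=60, validation_split=0.1)
# test_loss, test_accuracy = model.evaluate(X_test, y_test)
```

Passing `validation_split=0.1` to `fit` holds out the last 10% of the training arrays as validation data, which is one way to obtain the validation metrics mentioned in the list.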
We further optimised the accuracy of the neural model by altering the number of neural layers and the number of epochs.
The model came to a steady state after 60 epochs with the following results:
- val_accuracy = 0.9136
- val_loss = 0.2365
- test_accuracy = 0.8376
- test_loss = 0.6086
The fact that validation accuracy is higher than test accuracy suggests that the model's hyperparameters have been tuned to this specific validation dataset. A bigger sample size might be needed to improve this.
- Raw data was collated with the spotipy package in "01 playlist_export_function.ipynb". Simply input your username and Spotify URI, and it will export that specific playlist together with its audio features.
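
For readers who want to see what such an export can look like, here is a minimal sketch using spotipy. It is not the notebook's code: it authenticates with client credentials read from the `SPOTIPY_CLIENT_ID` and `SPOTIPY_CLIENT_SECRET` environment variables, so it needs only the playlist URI rather than a username, and the helper names `chunked` and `export_playlist` are hypothetical.

```python
def chunked(items, size=100):
    """Yield successive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def export_playlist(playlist_uri):
    """Return one dict of audio features per track in the playlist."""
    import spotipy  # lazy import so `chunked` works without spotipy
    from spotipy.oauth2 import SpotifyClientCredentials

    sp = spotipy.Spotify(auth_manager=SpotifyClientCredentials())

    # Collect track IDs, following pagination until there are no more pages.
    track_ids = []
    page = sp.playlist_items(playlist_uri)
    while page:
        for item in page["items"]:
            track = item.get("track")
            if track and track.get("id"):
                track_ids.append(track["id"])
        page = sp.next(page) if page["next"] else None

    # audio_features accepts at most 100 IDs per call, so request in batches.
    features = []
    for batch in chunked(track_ids):
        features.extend(f for f in sp.audio_features(batch) if f)
    return features


# Example call (requires Spotify API credentials and network access):
# rows = export_playlist("spotify:playlist:<your playlist id>")
```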
- "03 final_neural" is where the neural network model is set up.