Spotify Sequential Skip Prediction
As a music listener, it’s intriguing to see how well Spotify understands the user’s music taste. Two key features of Spotify are:
- “Shuffle song”: although shuffling is a random operation, it is critical to a good user experience. It should be random enough, yet confined to the user's taste.
- “Skip button”: This article explains the relationship between the skip button and user experience. A skip gives us information about the user’s music taste, and the skip rate depends on many factors, such as the user's age, the current hour of day, and whether the day is a weekend or a weekday.
So the question is whether we can predict, from the user's historical play data, whether the user will skip a song shortly after it starts playing (an indication of non-interest) when the songs of a playlist are played sequentially. The answer gives us an intuition about the user's music preference and helps to design a “Shuffle song” feature that is more focused on user experience.
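To make the framing concrete, here is a minimal sketch of a very simple baseline. It is an assumption for illustration only, not the method used in this article: it predicts that the next track will be skipped if at least half of the tracks already played in the session were skipped. The function name `predict_next_skip` and its boolean-list input are hypothetical.

```python
from typing import Sequence


def predict_next_skip(history: Sequence[bool], default: bool = False) -> bool:
    """Majority-vote baseline: predict a skip if most past tracks were skipped.

    `history` holds one boolean per track already played in the session
    (True = skipped). With no history, fall back to `default`.
    """
    if not history:
        return default
    skipped = sum(history)
    # Ties are broken in favour of "skip".
    return skipped * 2 >= len(history)


# Hypothetical session: the user skipped three of the first four tracks.
print(predict_next_skip([True, True, False, True]))
```

A baseline like this ignores track content and context entirely, so it is mainly useful as a floor against which a real sequential model can be compared.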
The dataset is divided into sessions, and every session holds a sequence of tracks whose length lies in the range 0-20:
- Session ID
- Track ID
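The session structure described above can be sketched as a grouping step. This is a minimal illustration that assumes the rows arrive as `(session_id, track_id)` pairs already in playback order; the function name `group_tracks_by_session` and the IDs are made up.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_tracks_by_session(rows: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Collect track IDs per session, keeping the order in which rows arrive."""
    sessions: Dict[str, List[str]] = defaultdict(list)
    for session_id, track_id in rows:
        sessions[session_id].append(track_id)
    return dict(sessions)


# Made-up IDs, for illustration only.
rows = [("s1", "t_a"), ("s1", "t_b"), ("s2", "t_c"), ("s1", "t_d")]
sessions = group_tracks_by_session(rows)
print(sessions)
```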
Every track has metadata describing the music, such as acousticness and beat strength:
- Duration - duration of the song
- Release_year - release year
- Us_popularity_estimate - US popularity
- Acousticness
- Beat_strength
- Bounciness
- Danceability
- Dyn_range_mean
- Energy
- Flatness
- Instrumentalness
- Key
- Liveness
- Loudness
- Mechanism
- Mode
- Organism
- Speechiness
- Tempo
- Time_signature
- Valence
- Acoustic_vector_0
- Acoustic_vector_1
- Acoustic_vector_2
- Acoustic_vector_3
- Acoustic_vector_4
- Acoustic_vector_5
- Acoustic_vector_6
- Acoustic_vector_7
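Before a model can use this metadata, each track has to become a fixed-length numeric vector. The sketch below shows one way to do that under stated assumptions: it uses only a subset of the fields listed above, spells them in lowercase (the actual column spelling in the files may differ), and fills missing values with 0.0. `NUMERIC_FEATURES` and `track_to_vector` are hypothetical names.

```python
from typing import Dict, List

# Numeric fields taken from the list above; the lowercase spelling is an assumption.
NUMERIC_FEATURES: List[str] = [
    "duration", "release_year", "us_popularity_estimate",
    "acousticness", "beat_strength", "danceability", "energy", "valence",
] + [f"acoustic_vector_{i}" for i in range(8)]


def track_to_vector(track: Dict[str, float]) -> List[float]:
    """Return the track's features in a fixed order; missing fields become 0.0."""
    return [float(track.get(name, 0.0)) for name in NUMERIC_FEATURES]


# Made-up values, for illustration only.
example_track = {"duration": 210.5, "release_year": 2015, "energy": 0.7}
vector = track_to_vector(example_track)
print(len(vector), vector[:3])
```

Keeping the feature order in one list means every track maps to the same positions, which is what most learning libraries expect from tabular input.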
In addition, the dataset contains details about user interaction and the playlist:
- Session_position - Position in sequence in a single session
- Session_length - Total session length
- Skip_1 - Boolean indicating if the track was only played very briefly
- Skip_2 - Boolean indicating if the track was only played briefly
- Skip_3 - Boolean indicating if most of the track was played
- Not_skipped - Boolean indicating that the track was played in its entirety
- Context_switch - Boolean indicating if the user changed context between the previous row and the current row. This could for example occur if the user switched from one playlist to another.
- No_pause_before_play - Boolean indicating if there was no pause between playback of the previous track and this track
- Short_pause_before_play - Boolean indicating if there was a short pause between playback of the previous track and this track
- Long_pause_before_play - Boolean indicating if there was a long pause between playback of the previous track and this track
- Hist_user_behavior_n_seekfwd - Number of times the user did a seek forward within track
- Hist_user_behavior_n_seekback - Number of times the user did a seek back within track
- Hist_user_behavior_is_shuffle - Boolean indicating if the user encountered this track while shuffle mode was activated
- Hour_of_day - {0-23} - The hour of day.
- Date - The date.
- Premium - Boolean indicating if the user was on premium or not. This has potential implications for skipping behavior.
- Context_type - What type of context the playback occurred within
- Hist_user_behavior_reason_start - The user action which led to the current track being played.
- Hist_user_behavior_reason_end - The user action which led to the current track playback ending.
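With these fields, the idea that skip rate varies with factors such as hour of day can be checked directly. The following is a minimal sketch, assuming each row is a dictionary with lowercase field names and with boolean columns already parsed into Python booleans (not the strings a CSV reader would return). The helper `skip_rate_by` and the sample rows are hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, Mapping


def skip_rate_by(rows: Iterable[Mapping[str, Any]], key: str,
                 skip_field: str = "skip_2") -> Dict[Any, float]:
    """Fraction of rows whose `skip_field` is true, grouped by the value of `key`."""
    played: Dict[Any, int] = defaultdict(int)
    skipped: Dict[Any, int] = defaultdict(int)
    for row in rows:
        group = row[key]
        played[group] += 1
        skipped[group] += bool(row[skip_field])
    return {group: skipped[group] / played[group] for group in played}


# Made-up rows, for illustration only.
rows = [
    {"hour_of_day": 8, "premium": True, "skip_2": True},
    {"hour_of_day": 8, "premium": True, "skip_2": False},
    {"hour_of_day": 22, "premium": False, "skip_2": True},
    {"hour_of_day": 22, "premium": False, "skip_2": True},
]
rates_by_hour = skip_rate_by(rows, "hour_of_day")
rates_by_premium = skip_rate_by(rows, "premium")
print(rates_by_hour, rates_by_premium)
```

The same helper works for any categorical column, for example `context_type`, by changing the `key` argument.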