The purpose of this report is to discuss the findings from Spotify’s 2019 top 50 songs data set using a preliminary exploratory data analysis (EDA) approach. The data set contains 13 attributes, including song names, artist names and several song characteristics. The dimension of the data set is 50 x 13.
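As a first EDA step, the dimension stated above can be checked directly. The following is a minimal sketch using only the Python standard library; the column names and rows in `SAMPLE_CSV` are made-up placeholders (assumptions for illustration, not values from the actual data set), and `describe_shape` is a hypothetical helper name.

```python
import csv
import io

# Placeholder rows only: these are NOT values from the actual data set.
# The real file would have 13 columns and 50 rows.
SAMPLE_CSV = """track_name,artist_name,genre
Song A,Artist X,genre 1
Song B,Artist Y,genre 2
Song C,Artist X,genre 1
"""

def describe_shape(csv_text):
    """Return (n_rows, n_columns, column_names) for CSV text with a header row."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)                    # first line holds the attribute names
    rows = [row for row in reader if row]    # skip any blank lines
    return len(rows), len(header), header

n_rows, n_cols, columns = describe_shape(SAMPLE_CSV)
print(f"{n_rows} x {n_cols}", columns)
```

Run against the real file, the same check would be expected to report 50 rows and 13 columns if the data set matches the description above.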
The data set is a sample of how Spotify categorises its songs to provide song predictions and recommendations for its users. Spotify uses audio modelling, i.e. convolutional neural networks (CNNs), to assign artists’ songs several characteristics and metrics such as loudness, energy or genre type. Subsequently, Spotify is able to group and order songs with similar characteristics, and thereafter suggest or recommend songs to its users based on their listening preferences at any time.
In conclusion, Canadian pop was the most common genre in the data set. Most of the genres are subsets of pop, hip hop or rap. Songs in the data set share similar characteristics. However, a higher rank does correlate with a better characteristic value. Finally, artists in the data set can appear as either the main artist or a collaborating artist. Overall, Ed Sheeran was the most present artist in the data set.
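The counts behind these conclusions (most common genre, and artist presence across main and collaborating roles) could be computed along the following lines. This is a sketch under stated assumptions: the records in `SONGS` are placeholders rather than real rows, and the field names `main_artist` and `collaborators` are hypothetical, since the data set may store collaborations differently (for example inside the track name).

```python
from collections import Counter

# Placeholder records only: NOT values from the actual data set.
# Field names are assumptions made for this illustration.
SONGS = [
    {"genre": "genre 1", "main_artist": "Artist X", "collaborators": []},
    {"genre": "genre 1", "main_artist": "Artist Y", "collaborators": ["Artist X"]},
    {"genre": "genre 2", "main_artist": "Artist Z", "collaborators": []},
]

def most_common_genre(songs):
    """Return (genre, count) for the genre that appears most often."""
    return Counter(song["genre"] for song in songs).most_common(1)[0]

def artist_presence(songs):
    """Count each artist's appearances as main artist or collaborator."""
    counts = Counter()
    for song in songs:
        counts[song["main_artist"]] += 1      # credit the main artist
        counts.update(song["collaborators"])  # credit each collaborator once
    return counts

print(most_common_genre(SONGS))
print(artist_presence(SONGS).most_common(1))
```

Counting collaborations separately from main credits matters here, because an artist who appears mostly as a collaborator would otherwise be under-counted.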