Song Popularity Classification Using Spotify Dataset

This repository contains the code and documentation for the project "Song Popularity Classification Using Spotify Dataset", conducted from May 2023 to August 2023.

Project Overview

Objective: Use machine learning techniques to classify songs on the Spotify music streaming platform as hits or non-hits, as a building block for music recommendation systems.
Contributors:
- Tahsin SOYAK, Yozgat Bozok University, Department of Computer Engineering
- Muhammet Emin SAHIN, Yozgat Bozok University, Department of Computer Engineering

Key Highlights

- Developed machine learning models and carried out in-depth analysis and visualization of the Spotify dataset.
- Positioned the project as a foundation for future research in song feature analysis, with implications for personalized music recommendations and improved user experiences on music streaming platforms.
- Found the XGB Classifier to be the most accurate of the six models evaluated, with an accuracy of 83.55%.

Toolbox

- Python
- Pandas
- NumPy
- Scikit-learn
- Matplotlib
- Seaborn
- Random Forest
- Decision Tree
- KNeighbors
- Logistic Regression
- eXtreme Gradient Boosting (XGB)
- Linear SVC
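
As a rough illustration of how the classifiers listed above could be compared, the sketch below fits each one on the same train/test split and reports test accuracy. It is not the project's actual training code: the synthetic data from `make_classification`, the helper names `build_models` and `compare_models`, the 80/20 split, the default hyperparameters, and the feature scaling for the distance-based and linear models are all assumptions made for the example. XGB comes from the separate `xgboost` package, so it is skipped if that package is not installed.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier

try:  # xgboost is a separate package and may not be installed
    from xgboost import XGBClassifier
except ImportError:
    XGBClassifier = None


def build_models(seed=42):
    """Return the candidate classifiers with default settings (an assumption)."""
    models = {
        "Random Forest": RandomForestClassifier(random_state=seed),
        "Decision Tree": DecisionTreeClassifier(random_state=seed),
        # Scale features for the distance-based and linear models.
        "KNeighbors": make_pipeline(StandardScaler(), KNeighborsClassifier()),
        "Logistic Regression": make_pipeline(
            StandardScaler(), LogisticRegression(max_iter=1000)
        ),
        "Linear SVC": make_pipeline(StandardScaler(), LinearSVC(random_state=seed)),
    }
    if XGBClassifier is not None:
        models["XGB"] = XGBClassifier(random_state=seed)
    return models


def compare_models(X, y, seed=42):
    """Fit every model on one stratified split and return test accuracy per model."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed, stratify=y
    )
    scores = {}
    for name, model in build_models(seed).items():
        model.fit(X_train, y_train)
        scores[name] = accuracy_score(y_test, model.predict(X_test))
    return scores


if __name__ == "__main__":
    # Synthetic stand-in data; replace with the real feature matrix and labels.
    X, y = make_classification(n_samples=500, n_features=10, random_state=0)
    for name, acc in sorted(compare_models(X, y).items(), key=lambda kv: -kv[1]):
        print(f"{name}: {acc:.4f}")
```

Scores on synthetic data say nothing about the Spotify dataset; the point is only the shape of the comparison loop.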

Abstract

This research focuses on developing a recommendation system for classifying hit songs on the popular music streaming platform Spotify using machine learning techniques. A dataset of 3,782 songs from various playlists was collected through the Spotify Web API's developer platform. Pre-processing steps, including feature engineering and feature selection, were applied to improve data quality and relevance for predictive modeling. The songs were labeled as hits or non-hits based on a popularity threshold, and six classification models (Random Forest, Decision Tree, KNeighbors, Logistic Regression, eXtreme Gradient Boosting (XGB), and Linear SVC) were employed for hit prediction. The XGB Classifier was the most effective model, achieving an accuracy of 83.55%. Feature engineering and selection contributed significantly to classification accuracy, underscoring the importance of data preprocessing in machine learning applications. The study offers insight into the attributes that distinguish popular songs from unpopular ones, with potential implications for music recommendation systems, and demonstrates the potential of machine learning techniques for understanding the factors that influence song popularity on digital music platforms. By accurately classifying hit songs, this work contributes to better music recommendation systems, benefiting both listeners and industry professionals. The findings lay the groundwork for future research in song feature analysis and classification, with implications for personalized music recommendations and improved user experiences on music streaming platforms.
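
The labeling step described above, splitting songs into hits and non-hits by a popularity threshold, can be sketched with pandas as follows. The threshold value of 70, the choice of `>=` rather than `>`, the helper name `label_hits`, the `is_hit` column, and the sample rows are all assumptions for illustration; the abstract does not state the threshold actually used. The `popularity` field follows Spotify's 0 to 100 track popularity score.

```python
import pandas as pd

# Hypothetical cut-off; the project's actual threshold is not stated here.
POPULARITY_THRESHOLD = 70


def label_hits(df, threshold=POPULARITY_THRESHOLD):
    """Return a copy of df with a binary `is_hit` column (1 = hit, 0 = non-hit)."""
    labeled = df.copy()
    labeled["is_hit"] = (labeled["popularity"] >= threshold).astype(int)
    return labeled


# Made-up rows for illustration only; not taken from the project's dataset.
songs = pd.DataFrame(
    {
        "track_name": ["song_a", "song_b", "song_c", "song_d"],
        "popularity": [85, 42, 70, 12],
        "danceability": [0.81, 0.55, 0.67, 0.30],
        "energy": [0.72, 0.40, 0.88, 0.25],
    }
)

labeled = label_hits(songs)
print(labeled[["track_name", "popularity", "is_hit"]])
print(labeled["is_hit"].value_counts())  # class balance check
```

Checking the class balance after labeling is worthwhile because a threshold that leaves very few hits makes plain accuracy a misleading metric.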

Keywords

Artificial Intelligence, Music, Machine Learning, Data Analysis, Spotify

Summary

This research focuses on developing a recommendation system that uses machine learning techniques to classify hit songs on the popular music streaming platform Spotify. A dataset of 3,782 songs from various playlists was collected through the Spotify Web API developer platform. Preprocessing steps, feature engineering, and feature selection were applied to improve data quality and suitability for predictive modeling. The dataset was categorized into hit and non-hit songs based on a popularity threshold, and six classification models (Random Forest, Decision Tree, K-Nearest Neighbors, Logistic Regression, eXtreme Gradient Boosting, and Linear Support Vector Classification) were used for hit prediction. The results show that the XGB Classifier was the most effective model, reaching an accuracy of 83.55%. Feature engineering and selection contributed significantly to classification accuracy, highlighting the importance of data preprocessing in machine learning applications. The study provides important information about the attributes that distinguish popular songs from unpopular ones, and thereby offers potential implications for music recommendation systems. The research demonstrates the potential of machine learning techniques for understanding the factors that affect song popularity on digital music platforms. By classifying hit songs accurately, the work contributes to the improvement of music recommendation systems and benefits both music listeners and industry professionals. The findings form a basis for future research on song feature analysis and classification, with important consequences for personalized music recommendations and for improving the user experience on music streaming platforms.

Keywords

Artificial Intelligence, Music, Machine Learning, Data Analysis, Spotify

tahsinsoyak/song-popularity-prediction