/MusicRecommenderCF

Music recommender system based on collaborative filtering using the ListenBrainz listens dataset.

Primary LanguageJupyter Notebook

MusicRecommenderCF

Music recommender system based on collaborative filtering using the ListenBrainz listens dataset.

Summary

This project aims to develop a music recommendation system using collaborative filtering with the ListenBrainz database. The goal is to create a model that can recommend artists to users based on their listening history, by identifying patterns in the listening behavior of similar users.

The project consists of two main parts divided in two different Colab Python Notebooks: data processing (ListeningBrainzColFilt_DataTransform.ipynb) and model generation (ListeningBrainzColFilt_RecomModel.ipynb).

In the data processing step, the files containing the listening history of the users from the ListenBrainz database are parsed and transformed into a final csv file containing the id of each user, an id for each artist and the playcount of each artist by that user.

This file is then used in the model generation step, where a sparse matrix is created from the user and artist indexes, and the nearest neighbor's similarity is computed using alternating least squares. The model's accuracy is evaluated by testing it on a set of artists and comparing its recommendations to those of other recommender systems.

Technologies used in this project include Python, Pandas, NumPy, and the implicit library for collaborative filtering.

Data

The ListenBrainz data is distributed in 12 files, one for each month of 2022.

Every listen is identified by a recording_msid that is random (the same song could have multiple different recording_msid). To compare the data we need to convert this id into a recording_mbid that is present in the database.

To do this there are three additional files:

  • A mapping between each recording_msid and recording_mbid in the ListenBrainz database (listenbrainz_msid_mapping.csv)
  • The ListenBrainz Canonical Metadata database, which includes a list of all artist_mbid (canonical_musicbrainz_data.csv).
  • artist_mbid to artist name mapping, with the name of all artists in the MusicBrainz database. (musicbrainz_artist_mbid_name.csv)

Model

The model uses a collaborative filtering recommendation system to make personalized recommendations by analyzing user-artist interactions. Data is stored in a sparse matrix, weighted using the BM25 algorithm, and the Alternating Least Squares (ALS) algorithm is used to learn the latent features of the users and artists. The model finds the most similar items using cosine similarity and provides a list of recommended items sorted by their similarity score.

Results

To evaluate the model's accuracy, it's tested for some known artists and check the resulting recommendations.

Artist recommendations for Iron & Wine

n score artist_name
0 1.000000 Iron & Wine
1 0.422283 Gregory Alan Isakov
2 0.390881 Fleet Foxes
3 0.383695 The Shins
4 0.376312 Bon Iver
5 0.375864 José González
6 0.369226 Sufjan Stevens
7 0.365435 Lord Huron
8 0.360813 Death Cab for Cutie
9 0.341159 The Head and the Heart

Artist recommendations for Masayoshi Takanaka (高中正義)

n score artist_name
0 1.000000 高中正義
1 0.386140 Casiopea
2 0.328007 大貫妙子
3 0.326699 T‐SQUARE
4 0.301526 杏里
5 0.278965 Lamp
6 0.277665 Naniwa Express
7 0.277088 Piper
8 0.269268 Serge Ponsar
9 0.268987 TUCKER

Artist recommendations for The Band

n score artist_name
0 1.000000 The Band
1 0.518695 Van Morrison
2 0.494562 Crosby
3 0.464024 Bob Dylan
4 0.462077 Crosby
5 0.435122 Grateful Dead
6 0.425560 George Harrison
7 0.420686 Neil Young
8 0.418821 The Byrds
9 0.413197 Warren Zevon