
Data analysis of music genres, songs, artists and albums using PySpark

Primary language: Jupyter Notebook

Analysis of Song Genres

This project uses Spark SQL (through PySpark) to analyze music genres, artists, albums and songs. The final part of the project uses the matplotlib library for visualization.

Two datasets are used:

  1. listenings.csv https://drive.google.com/file/d/14dMLzOTIf1GK-P6bA9rVEI_1WSedOdZU/view?usp=sharing
    • 1 GB data file
    • Contains: user_id, date, track, artist and album
  2. genre.csv https://drive.google.com/file/d/1q8VWIZFjlOP_91z0GjbCe4RpmtGVDkvz/view?usp=sharing
    • Contains: artist, genre

Requirements

  1. A Google account to access the data and run the notebook on Google Colab
  2. PySpark, installed from within the notebook:

!pip install pyspark

Analysis Performed

  1. The date column is removed from the dataset, and rows with missing (NA) values are dropped.

  2. Find all records of users who have listened to Rihanna


  3. Find the top 10 users who are fans of Rihanna


  4. Find the top 10 most popular tracks


  5. Find Rihanna's top 10 most popular tracks


  6. Find the top 10 most popular albums


  7. Inner join the two dataframes


  8. Find the top 10 users who are fans of pop music


  9. Find the top 10 most popular genres


  10. Find each user's favourite genre


  11. Find out how many pop, rock, metal and hip hop artists there are, and visualize the counts with a bar chart
