# Spotify Data Analysis

## Introduction
This Python script analyzes Spotify data using various libraries such as Pandas, NumPy, Matplotlib, Seaborn, and scikit-learn. The analysis includes exploring song popularity, duration, energy, loudness, and genre.

## Data Source
The script uses a data file called "SpotifyFeatures.csv" to perform the analysis.

## Sections
1. Data Loading and Overview: The script starts by loading the Spotify data from the CSV file and displaying the first few rows and general information about the dataset.

2. Analyzing Popularity: It identifies the top 10 least popular and most popular songs on Spotify based on the "popularity" metric.

3. Data Transformation: The script transforms the time duration of the music from milliseconds to seconds and performs various data transformations related to keys, mode, time signatures, and popularity.

4. Exploratory Data Visualization: It utilizes visualizations to show the correlation between loudness and energy, duration of songs in various genres, and the top 5 genres by popularity.

5. Machine Learning: The script applies machine learning models such as Logistic Regression and Random Forest Classifier to predict song popularity based on various features.

## Libraries Used
- Pandas
- NumPy
- Matplotlib
- Seaborn
- scikit-learn
- XGBoost

## How to Use
To use this script, ensure that the "SpotifyFeatures.csv" file is present in the same directory as the script. Run the script in a Python environment with the required libraries installed.

## Note
The script has sections with unused code and repetition, which can be further optimized for efficiency.

## Author
This script was created by Roopak Mallik.