Spotify Dataset Analyzer

Overview

This Python project processes and analyzes a Spotify dataset using the K-Nearest Neighbors (KNN) algorithm. The program checks data integrity, preprocesses and encodes the dataset, and applies KNN to predict outcomes based on provided inputs.

Features

Dataset integrity check and automatic correction.
Data loading and preprocessing with encoding.
Splitting dataset into training and testing sets.
Combining and reshaping data for analysis.
Use of K-Nearest Neighbors (KNN) for predictions.
Detailed output of predictions and cosine similarity results.
Advanced data retrieval and plotting of top results.

Requirements

Python 3.x
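Libraries: numpy, matplotlib, seaborn, scikit-learn (imported as sklearn), scipy. The os module is also used; it is part of the Python standard library and needs no installation.
A Spotify dataset file named correct_dataset.csv located in the ../data/ directory relative to the script.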

Installation

Clone this repository and ensure that all required Python libraries are installed by running:

pip install numpy matplotlib seaborn scikit-learn scipy

Usage

To use this program, follow these steps:

Prepare the Dataset: Ensure the Spotify dataset file named correct_dataset.csv is located in the ../data/ directory relative to the script. If the file is missing or incorrect, the program attempts to fix it automatically using a file named spotify_dataset.csv.
Run the Script: Execute the script in your Python environment using the command:
```
python spotify_analyzer.py
```
Follow the on-screen prompts to interact with the program.
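
The dataset check described in the first step could look roughly like the sketch below. It is only an illustration under stated assumptions: the function name resolve_dataset is hypothetical, spotify_dataset.csv is assumed to sit in the same directory as correct_dataset.csv, and the "fix" is reduced to a plain file copy because the actual correction logic is not described in this README.

```python
from pathlib import Path
import shutil


def resolve_dataset(data_dir: Path) -> Path:
    """Return the path to correct_dataset.csv, recreating it if it is unusable.

    Hypothetical helper: the real script may validate and repair the
    dataset in a different way.
    """
    target = data_dir / "correct_dataset.csv"
    source = data_dir / "spotify_dataset.csv"  # assumed location of the raw file

    # Treat a missing or empty file as "not correct" (an assumption).
    if target.is_file() and target.stat().st_size > 0:
        return target

    if not source.is_file():
        raise FileNotFoundError(
            f"Neither {target.name} nor {source.name} was found in {data_dir}"
        )

    # Placeholder for the real correction step: copy the raw file as-is.
    shutil.copyfile(source, target)
    return target
```

A call such as resolve_dataset(Path("../data")) would then return the file to load, or raise an error if neither file is available.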

How It Works

Data Integrity Check: The program first checks that the required dataset exists and is correct. If not, it calls a function to correct the dataset.
Data Loading and Encoding: The dataset is loaded and encoded to transform raw data into a format suitable for machine learning.
Training and Testing: The data is split into training and test sets, with 75% of the data used for training and the remaining 25% for testing.
Prediction and Analysis: KNN predicts outcomes for the test set. The predictions and their accuracy are then printed.
Results Interpretation: The program allows users to input an ID to find related entries and prints a list of potential related IDs based on the predictions.
Visualization: A bar chart of the top ten most successful songs is displayed, showing their success rates.
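
The encoding, splitting, and prediction steps above can be sketched with scikit-learn as follows. This is a minimal illustration, not the script's actual code: the data is synthetic, the column names (genre, tempo, energy), the "successful" label, and n_neighbors=5 are all assumptions made for the example.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import OrdinalEncoder

rng = np.random.default_rng(0)

# Synthetic stand-in for the Spotify data (assumed columns, not the real schema).
genres = rng.choice(["pop", "rock", "jazz"], size=200)
tempo = rng.normal(120, 15, size=200)
energy = rng.uniform(0, 1, size=200)

# Hypothetical binary target: 1 means "successful".
labels = (energy + (genres == "pop") * 0.3 > 0.6).astype(int)

# Encode the categorical column as numbers so KNN can compute distances.
genre_codes = OrdinalEncoder().fit_transform(genres.reshape(-1, 1))
X = np.column_stack([genre_codes, tempo, energy])

# 75% of the rows go to training, the remaining 25% to testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, train_size=0.75, random_state=0
)

# Fit KNN on the training set and predict the held-out test set.
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
predictions = knn.predict(X_test)

accuracy = accuracy_score(y_test, predictions)
print(f"Predicted {len(predictions)} rows, accuracy: {accuracy:.2f}")
```

Because KNN is distance-based, features on very different scales (such as tempo versus energy here) can dominate the result, so scaling the features before fitting is often worth considering.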