
A music recommender based on your music preferences.


Spotify Dataset Analyzer


This Python project processes and analyzes a Spotify dataset and predicts patterns using the K-Nearest Neighbors (KNN) algorithm. The program checks data integrity, preprocesses the dataset, encodes features, and applies KNN to predict outcomes from the provided inputs.


Features

  • Dataset integrity check and automatic correction.
  • Data loading and preprocessing with encoding.
  • Splitting dataset into training and testing sets.
  • Combining and reshaping data for analysis.
  • Utilization of K-Nearest Neighbors (KNN) for predictions.
  • Detailed output of predictions and cosine similarity results.
  • Advanced data retrieval and plotting of top results.
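
The cosine similarity mentioned in the feature list can be illustrated with a small, dependency-free sketch. The track IDs, the three feature columns, and the helper names `cosine_similarity` and `most_similar` are hypothetical; the project itself may compute this differently, for example with scipy or scikit-learn.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen for this sketch when a vector is all zeros
    return dot / (norm_a * norm_b)


def most_similar(track_id, features, top_n=3):
    """Rank every other track by cosine similarity to track_id, highest first."""
    target = features[track_id]
    scores = [
        (other_id, cosine_similarity(target, vector))
        for other_id, vector in features.items()
        if other_id != track_id
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_n]


# Hypothetical encoded features per track: (danceability, energy, valence)
features = {
    "track_a": [0.80, 0.70, 0.60],
    "track_b": [0.79, 0.72, 0.58],
    "track_c": [0.10, 0.95, 0.05],
    "track_d": [0.40, 0.35, 0.30],
}

for other_id, score in most_similar("track_a", features, top_n=2):
    print(other_id, round(score, 4))
```

In this made-up data, `track_d` is a scaled-down copy of `track_a`, so it ranks first: cosine similarity compares the direction of the feature vectors, not their magnitude.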


Requirements

  • Python 3.x
  • Libraries: numpy, matplotlib, seaborn, scikit-learn, scipy (os is part of the Python standard library)
  • A Spotify dataset file named correct_dataset.csv located in the ../data/ directory relative to the script.


Installation

Clone this repository and install the required Python libraries by running:

pip install numpy matplotlib seaborn scikit-learn scipy


Usage

To use this program, follow these steps:

  1. Prepare the Dataset: Ensure the Spotify dataset file named correct_dataset.csv is located in the ../data/ directory relative to the script. If the dataset is missing or invalid, the program attempts to fix it automatically using a file named spotify_dataset.csv.

  2. Run the Script: Execute the script in your Python environment using the command:

    python spotify_analyzer.py

  3. Follow the on-screen prompts to interact with the program.
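
The dataset check described in step 1 could look roughly like the sketch below. It is not the project's actual code: the required column names, the assumption that spotify_dataset.csv sits in the same directory as correct_dataset.csv, and the helpers `dataset_is_valid` and `ensure_dataset` are all hypothetical, and the real correction step is replaced by a plain file copy.

```python
import csv
import os
import shutil

# Assumed column names; the real dataset schema is not documented here.
REQUIRED_COLUMNS = {"id", "name", "artists"}


def dataset_is_valid(path, required=REQUIRED_COLUMNS):
    """Return True if the CSV exists, has the required columns, and has at least one row."""
    if not os.path.isfile(path):
        return False
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        header = set(reader.fieldnames or [])
        if not required.issubset(header):
            return False
        return next(reader, None) is not None


def ensure_dataset(data_dir):
    """Return the path to correct_dataset.csv, rebuilding it from spotify_dataset.csv if needed."""
    target = os.path.join(data_dir, "correct_dataset.csv")
    source = os.path.join(data_dir, "spotify_dataset.csv")  # assumed location
    if dataset_is_valid(target):
        return target
    if not dataset_is_valid(source):
        raise FileNotFoundError(f"No usable dataset found in {data_dir}")
    # Placeholder for the real correction logic: this sketch only copies the source file.
    shutil.copyfile(source, target)
    return target
```

Calling `ensure_dataset("../data")` from the script's directory would then return a usable path or raise an error before any model training starts.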

How It Works

  1. Data Integrity Check: Initially, the program checks if the required dataset exists and is correct. If not, it calls a function to correct the dataset.
  2. Data Loading and Encoding: The dataset is loaded and encoded to transform raw data into a format suitable for machine learning.
  3. Training and Testing: The data is split into training and test sets, with 75% of the data used for training and the remaining 25% for testing.
  4. Prediction and Analysis: KNN predicts outcomes for the test set. The predictions and their accuracy are then printed.
  5. Results Interpretation: The user can enter an ID to find related entries, and the program prints a list of potentially related IDs based on the predictions.
  6. Visualization: A bar chart of the ten most successful songs is displayed, showing each song's success rate.
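
Steps 2 to 4 above can be sketched with scikit-learn as follows. This is a minimal illustration on synthetic data, not the project's implementation: the two numeric features, the genre label used as the prediction target, and the choice of `n_neighbors=5` are assumptions made only for the example.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import LabelEncoder

rng = np.random.default_rng(0)

# Synthetic stand-in for the Spotify data: 200 tracks, two numeric features each,
# clustered around a different center per (hypothetical) genre.
genres = np.array(["pop", "rock"] * 100)
centers = np.where(genres[:, None] == "pop", [0.8, 0.3], [0.3, 0.8])
features = centers + rng.normal(scale=0.05, size=(200, 2))

# Encoding: turn the string labels into integers.
labels = LabelEncoder().fit_transform(genres)

# 75% of the rows go to training, the remaining 25% to testing.
X_train, X_test, y_train, y_test = train_test_split(
    features, labels, train_size=0.75, random_state=0, stratify=labels
)

# Fit KNN on the training set and predict the held-out test set.
model = KNeighborsClassifier(n_neighbors=5)
model.fit(X_train, y_train)
predictions = model.predict(X_test)

accuracy = accuracy_score(y_test, predictions)
print(f"Test accuracy: {accuracy:.2f}")
```

The `stratify` argument keeps the class proportions equal in both splits, which matters more on real, imbalanced data than on this toy example.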