/dm-project

Final project for the Data Mining Course, [INF LM-18], University of Pisa


Data Mining Project

This project consists of a data analysis carried out with data mining tools and implemented in Python. The guidelines require addressing specific tasks and reporting the results in a single paper. Well-commented Python notebooks contain the code for each task.

Before describing the tasks in detail, here are some tips for correctly running and viewing the supplied notebooks.

Setup 💻

Create a virtual environment and install the dependencies:

python3 -m venv venv

source venv/bin/activate

pip install -r requirements.txt

 

Presentations

In the presentations subfolder you can find the slides we used to present our project and to discuss Evaluation of Explainable AI (link to the original paper).

 

Tasks ✔️

Task 1 - Data Understanding and Preparation

 

Task 1.1: Data Understanding

Explore the dataset with the analytical tools studied and write a concise “data understanding” report covering data semantics, data quality, the distribution of the variables, and the pairwise correlations.
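As a rough illustration of the pairwise-correlation part of this task, the sketch below computes Pearson coefficients for every pair of numeric columns using only the standard library. The column names and values are hypothetical toy data, not attributes of the project dataset, and the notebooks may well rely on a dataframe library instead.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def pairwise_correlations(columns):
    """Map each unordered pair of column names to its Pearson correlation."""
    return {
        (a, b): pearson(columns[a], columns[b])
        for a, b in combinations(columns, 2)
    }


# Hypothetical toy columns, used only to exercise the functions.
toy_columns = {
    "quantity": [1, 2, 3, 4, 5],
    "total_price": [2.0, 4.0, 6.0, 8.0, 10.0],
    "discount": [5, 4, 3, 2, 1],
}
correlations = pairwise_correlations(toy_columns)
```

Values close to 1 or -1 point to nearly redundant attributes, which is often worth noting in the report before feature selection.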

Task 1.2: Data Preparation

Improve the quality of the data and prepare it by extracting new features useful for describing the customer profile and purchasing behavior.
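A minimal sketch of what per-customer feature extraction might look like is given below. The row layout (customer, basket, product, quantity, unit price) and the feature names are assumptions made for illustration; the actual dataset schema and the features chosen in the notebooks may differ.

```python
from collections import defaultdict


def customer_features(purchases):
    """Aggregate purchase rows into one feature dictionary per customer.

    Each row is (customer_id, basket_id, product_id, quantity, unit_price).
    This schema is hypothetical.
    """
    baskets = defaultdict(set)
    products = defaultdict(set)
    items = defaultdict(int)
    spent = defaultdict(float)
    for customer, basket, product, quantity, unit_price in purchases:
        baskets[customer].add(basket)
        products[customer].add(product)
        items[customer] += quantity
        spent[customer] += quantity * unit_price
    return {
        customer: {
            "n_baskets": len(baskets[customer]),
            "n_distinct_products": len(products[customer]),
            "total_items": items[customer],
            "total_spent": spent[customer],
            "avg_basket_value": spent[customer] / len(baskets[customer]),
        }
        for customer in baskets
    }


# Hypothetical toy rows, used only to exercise the function.
toy_purchases = [
    ("c1", "b1", "apple", 2, 0.5),
    ("c1", "b1", "bread", 1, 2.0),
    ("c1", "b2", "apple", 4, 0.5),
    ("c2", "b3", "milk", 3, 1.0),
]
features = customer_features(toy_purchases)
```

Aggregates of this kind (counts, totals, averages per basket) give one row per customer, which is the shape the later clustering and classification tasks need.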

 

Task 2: Clustering Analysis

Based on the customer profile, explore the dataset using various clustering techniques. Different algorithms and approaches must be compared:

  • K-means
  • Density-based clustering (DBSCAN)
  • Hierarchical clustering
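To give an idea of the first technique in the list, here is a dependency-free sketch of K-means (Lloyd's algorithm). It is illustrative only: the notebooks may use a library implementation, and the toy points, the seed and the function names are assumptions.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def k_means(points, k, n_iterations=100, seed=0):
    """Lloyd's algorithm. Returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids: k distinct points
    labels = None
    for _ in range(n_iterations):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:  # assignments stopped changing: converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                centroids[j] = tuple(
                    sum(dim) / len(members) for dim in zip(*members)
                )
    return centroids, labels


# Hypothetical 2-D customer profiles forming two well-separated groups.
toy_points = [
    (1.0, 1.0), (1.5, 2.0), (1.0, 0.5),
    (8.0, 8.0), (9.0, 9.5), (8.5, 9.0),
]
centroids, labels = k_means(toy_points, k=2)
```

In practice the features are usually scaled first, since K-means relies on Euclidean distance and is sensitive to attributes with large ranges.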

 

Task 3: Predictive Analysis

Consider the problem of predicting, for each customer, a label indicating whether they are a high-spending, medium-spending, or low-spending customer. After defining indicators for assigning these labels, perform the predictive analysis and compare the performance of different models:

  • Decision Tree
  • Random Forest
  • SVM
  • KNN
  • Naive Bayes
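Before any of the models above can be trained, each customer needs a label. One possible indicator, shown here purely as an assumption and not necessarily the one used in the notebooks, is to split customers by tertiles of their total spending. The customer identifiers and amounts are hypothetical.

```python
from statistics import quantiles


def spending_labels(total_spent):
    """Map each customer to 'low', 'medium' or 'high' using spending tertiles."""
    # quantiles(..., n=3) returns the two cut points separating three groups.
    low_cut, high_cut = quantiles(total_spent.values(), n=3)
    labels = {}
    for customer, amount in total_spent.items():
        if amount <= low_cut:
            labels[customer] = "low"
        elif amount <= high_cut:
            labels[customer] = "medium"
        else:
            labels[customer] = "high"
    return labels


# Hypothetical total spending per customer.
toy_spending = {
    "c1": 10, "c2": 20, "c3": 30,
    "c4": 200, "c5": 250, "c6": 300,
    "c7": 900, "c8": 1000, "c9": 1200,
}
labels = spending_labels(toy_spending)
```

Quantile-based cut points keep the three classes roughly balanced, which makes accuracy easier to compare across classifiers. Any feature used to define the label should be excluded from the model inputs to avoid leakage.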

 

Task 4: Sequential Pattern Mining

Model each customer as a sequence of baskets and apply a sequential pattern mining algorithm.
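The sketch below illustrates the data model and the support-counting step that Apriori-style sequential miners such as GSP rely on; it is not a full mining algorithm and is not taken from the notebooks. Each customer is a list of baskets (sets of items), and a pattern is supported by a customer if its itemsets appear, in order, inside distinct baskets. The product names are hypothetical.

```python
def contains(sequence, pattern):
    """True if `pattern` occurs in `sequence` as an ordered subsequence.

    Each itemset of the pattern must be a subset of a distinct basket,
    and the matched baskets must appear in the same order as the pattern.
    """
    position = 0
    for basket in sequence:
        if position < len(pattern) and pattern[position] <= basket:
            position += 1
    return position == len(pattern)


def support(sequences, pattern):
    """Fraction of customer sequences that contain the pattern."""
    return sum(contains(s, pattern) for s in sequences) / len(sequences)


# Hypothetical customers, each modeled as a time-ordered list of baskets.
toy_sequences = [
    [{"bread", "milk"}, {"eggs"}, {"beer"}],
    [{"bread"}, {"beer", "chips"}],
    [{"milk"}, {"bread"}, {"eggs"}],
    [{"beer"}, {"bread"}],
]
bread_then_beer = support(toy_sequences, [{"bread"}, {"beer"}])
bread_and_milk = support(toy_sequences, [{"bread", "milk"}])
```

Note that the last customer buys beer before bread, so that sequence does not support the pattern "bread, then beer": order matters, which is what distinguishes sequential patterns from plain frequent itemsets.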

 

(Our) Additional Task: Association Rules Mining

An extra task on frequent pattern and association rule analysis, using the Apriori algorithm.
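For reference, a compact and unoptimized sketch of Apriori follows: frequent itemsets are grown level by level, and a candidate is kept only if all of its smaller subsets are already frequent. The baskets, the support threshold and the helper names are assumptions for illustration; the notebook for this task may use a library implementation instead.

```python
from itertools import combinations


def apriori(baskets, min_support):
    """Return {itemset: support} for every itemset with support >= min_support."""
    baskets = [frozenset(b) for b in baskets]
    n = len(baskets)

    def frequent_among(candidates):
        counts = {c: sum(c <= b for b in baskets) for c in candidates}
        return {c: v / n for c, v in counts.items() if v / n >= min_support}

    items = {item for b in baskets for item in b}
    frequent = frequent_among({frozenset([item]) for item in items})
    result = dict(frequent)
    k = 2
    while frequent:
        # Join step: merge frequent (k-1)-itemsets into k-itemset candidates.
        candidates = {a | b for a in frequent for b in frequent if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in frequent for s in combinations(c, k - 1))
        }
        frequent = frequent_among(candidates)
        result.update(frequent)
        k += 1
    return result


def rule_confidence(supports, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent."""
    antecedent, consequent = frozenset(antecedent), frozenset(consequent)
    return supports[antecedent | consequent] / supports[antecedent]


# Hypothetical baskets.
toy_baskets = [
    {"bread", "milk"},
    {"bread", "beer", "eggs"},
    {"milk", "beer", "cola"},
    {"bread", "milk", "beer"},
    {"bread", "milk", "cola"},
]
supports = apriori(toy_baskets, min_support=0.5)
bread_to_milk = rule_confidence(supports, {"bread"}, {"milk"})
```

With these toy baskets, the only frequent pair is bread and milk, and the rule bread -> milk holds in three of the four baskets that contain bread.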

 

Contributors ✨