This project demonstrates various clustering methods and visualizations for time series data. It includes utility functions for running clustering algorithms, plotting results, and performing silhouette analysis.
- Abdesselam BENAMEUR
- Hakim IGUENI
- Salma TALANTIKITE
utils.py
: Contains utility functions for clustering and visualization.ProjetUL.ipynb
: Jupyter notebook demonstrating the use of the utility functions on a dataset.
Make sure you have the following Python packages installed:
- numpy
- pandas
- matplotlib
- seaborn
- scipy
- scikit-learn
- tslearn
- kmedoids
You can install these packages using pip
:
pip install numpy pandas matplotlib seaborn scipy scikit-learn tslearn kmedoids
The data used in this project is loaded from the following URLs:
X
: Dataset URLAPPART
: Dataset URLJOUR
: Dataset URL
The utils.py
file contains the following utility functions:
run_clustering_method(clustering_method, X, metric='precomputed', verbose=True)
: Runs a clustering algorithm for a range of cluster numbers (2 to 10), computes silhouette scores and inertia, and prints the results.plot_silhouette_analysis(X, labels, title)
: Plots the silhouette analysis for the clustering results.plot_heatmap(Y, cmap=[green, red, blue, yellow])
: Plots a heatmap of the given matrixY
.visualize_clusters(X, labels, title)
: Visualizes clusters by plotting time series data that belong to the same cluster.
The ProjetUL.ipynb
notebook demonstrates the use of the utility functions on a dataset. It includes the following steps:
- Read the data: Load the dataset from the provided URLs.
- Calculate the distance matrix: Compute the distance matrix using Euclidean and DTW distances.
- Clustering: Apply KMedoids and KMeans clustering methods.
- Visualization: Use elbow method, silhouette scores, heatmaps, and cluster visualizations to analyze the results.
# Import utility functions
from utils import *
# Load the data
X = pd.read_csv("http://allousame.free.fr/mlds/donnees/X.txt", sep=" ", header=None)
# Calculate distance matrix using Euclidean distance
from scipy.spatial.distance import pdist, squareform
dist_matrix_euc = pdist(X, metric='euclidean')
dist_matrix_euc = squareform(dist_matrix_euc)
# Run KMeans clustering
from sklearn.cluster import KMeans
clust_method = lambda k: KMeans(n_clusters=k, random_state=0, n_init='auto')
inertias, silhouettes, labels = run_clustering_method(clust_method, X, metric="euclidean")
# Plot results
plot_inertia(inertias, title="Elbow method for KMeans with Euclidean distance")
plot_silhouette(silhouettes, title="Silhouette score for KMeans with Euclidean distance")
visualize_clusters(X, labels, title="Visualizing KMeans clusters with Euclidean distance")
This project provides a comprehensive example of clustering time series data and visualizing the results using various techniques. The provided utility functions and Jupyter notebook can be adapted to different datasets and clustering methods.