This project applies clustering analysis to a Kaggle dataset, comparing several algorithms and hyperparameter settings. The primary objective is to uncover patterns and groupings within the data.
The project uses a Data Version Control (DVC) repository to manage the dataset and track experiments. DVC provides versioning and reproducibility, which makes it practical to run and compare different clustering approaches.
Clustering Algorithms: The project implements multiple clustering algorithms, such as K-means and DBSCAN. Each algorithm is run with different hyperparameters to evaluate its performance and identify the best settings.
Data Preprocessing: Prior to clustering, the dataset undergoes preprocessing steps, including data cleaning, feature scaling, and feature selection as necessary.
Evaluation Metrics: The project assesses the quality of clustering results with evaluation metrics such as the silhouette score.
Hyperparameter Optimization: Different hyperparameter settings are compared through a search implemented as a DVC pipeline.
DVC Repository: The project uses a DVC repository to version the dataset and track experiments, so that results are easy to reproduce and share.
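
The components above can be sketched end to end as follows. This is a minimal illustration, not the project's actual code: it assumes scikit-learn and NumPy are installed, uses synthetic data in place of the Kaggle dataset, and the function names (make_toy_data, safe_silhouette, run_search) and parameter grids are hypothetical.

```python
import numpy as np
from sklearn.cluster import DBSCAN, KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler


def make_toy_data(seed=0):
    """Three well-separated 2-D blobs; a stand-in for the real dataset (assumption)."""
    rng = np.random.default_rng(seed)
    centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
    return np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in centers])


def safe_silhouette(X, labels):
    """Silhouette score over non-noise points, or None when it is undefined."""
    mask = labels != -1  # DBSCAN labels noise points as -1
    n_clusters = len(set(labels[mask]))
    # silhouette_score requires 2 <= n_clusters <= n_samples - 1
    if n_clusters < 2 or n_clusters >= mask.sum():
        return None
    return float(silhouette_score(X[mask], labels[mask]))


def run_search(X):
    """Scale the features, try each setting, and rank the settings by silhouette score."""
    X_scaled = StandardScaler().fit_transform(X)  # feature scaling step
    results = []
    # Illustrative grids only; the real search space is not given in this README.
    for k in (2, 3, 4, 5):
        labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X_scaled)
        results.append(("kmeans", {"n_clusters": k}, safe_silhouette(X_scaled, labels)))
    for eps in (0.1, 0.3, 0.5):
        labels = DBSCAN(eps=eps, min_samples=5).fit_predict(X_scaled)
        results.append(("dbscan", {"eps": eps}, safe_silhouette(X_scaled, labels)))
    # Drop settings where the score is undefined, then sort best first.
    scored = [r for r in results if r[2] is not None]
    return sorted(scored, key=lambda r: r[2], reverse=True)


if __name__ == "__main__":
    for algo, params, score in run_search(make_toy_data()):
        print(f"{algo:7s} {params} silhouette={score:.3f}")
```

Calling run_search returns a list of (algorithm, parameters, score) tuples with the highest silhouette score first. Note that DBSCAN scores here exclude noise points, so they are not strictly comparable with K-means scores computed over all points. In a DVC pipeline, a stage could run a script like this with the grids read from a parameters file; that wiring is omitted here.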