This project extends the final project of my Bachelor's degree in Mathematical Engineering at Politecnico di Milano. It is meant as a guide to a machine learning classification project from beginning to end.
In this repository, the work is divided into three stages:
- part 1 - Exploration of the dataset
- part 2 - Dimensionality reduction of the dataset
- part 3 - Classification
Each section is self-explanatory and contains all the code needed to replicate the results.
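
As a rough preview of how the three stages fit together, here is a minimal sketch. It is not the code from the notebooks: it assumes scikit-learn is installed, uses the copy of the breast cancer dataset bundled with scikit-learn, and picks PCA with 5 components and logistic regression purely for illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Part 1 (exploration): load the features and labels and check their shape.
X, y = load_breast_cancer(return_X_y=True)
print("samples, features:", X.shape)

# Hold out a stratified test set so both classes appear in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Part 2 (dimensionality reduction) and part 3 (classification) in one pipeline.
# Assumption: PCA with 5 components and logistic regression are illustrative
# choices, not necessarily the methods used in this repository.
model = make_pipeline(
    StandardScaler(),           # PCA is scale-sensitive, so standardize first
    PCA(n_components=5),        # project the features onto 5 components
    LogisticRegression(max_iter=1000),
)
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
print(f"test accuracy: {accuracy:.3f}")
```

Wrapping the scaler, the reduction step, and the classifier in a single pipeline means all three are fitted on the training split only, which avoids leaking information from the test set.
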
The breast cancer dataset is a classic, easy-to-use binary classification dataset. It records various physical measurements of breast cells, together with a label indicating whether the cells are benign or malignant.
The dataset is available from the UCI Machine Learning Repository and on Kaggle.
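
For a quick look without downloading anything, the sketch below loads the copy of the dataset that ships with scikit-learn. This assumes scikit-learn is installed and that its bundled copy matches the one used in this repository; the files from UCI or Kaggle may use different column names and label encodings.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer

# Assumption: scikit-learn's bundled copy stands in for the UCI/Kaggle files.
data = load_breast_cancer()

# Feature matrix: one row per sample, one column per cell measurement.
print("feature matrix shape:", data.data.shape)
print("first feature names:", list(data.feature_names[:5]))

# Labels are integers that index into target_names.
class_counts = np.bincount(data.target)
for name, count in zip(data.target_names, class_counts):
    print(f"{name}: {count} samples")
```

Checking the class counts early is useful because an imbalance between benign and malignant samples affects how the classification results should be read.
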