Supervised and Unsupervised Learning on the Breast Cancer Wisconsin Dataset

This report analyzes the Breast Cancer Wisconsin dataset in detail, starting with a thorough preliminary analysis based on principal component analysis (PCA), feature selection and data visualization.
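As a rough illustration of the preliminary analysis described above, the sketch below projects the data onto two principal components and ranks features with a univariate test. It assumes the copy of the dataset bundled with scikit-learn (`load_breast_cancer`), which may differ from the file used in the report; the choice of `SelectKBest` with `f_classif` and of `k=10` is also an assumption, not necessarily the feature selection method applied in the notebook.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.preprocessing import StandardScaler

# Assumption: scikit-learn's bundled copy of the dataset (569 samples, 30 features).
data = load_breast_cancer()
X, y = data.data, data.target

# PCA is sensitive to feature scale, so standardize before projecting.
X_scaled = StandardScaler().fit_transform(X)
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_scaled)
explained = pca.explained_variance_ratio_

# Assumption: univariate ANOVA F-test as the feature selection criterion.
selector = SelectKBest(score_func=f_classif, k=10).fit(X, y)
selected_features = list(data.feature_names[selector.get_support()])

print("Variance explained by the first two components:", explained)
print("Selected features:", selected_features)
```

The two columns of `X_2d` can be passed to any plotting library to produce a scatter plot colored by `y`, which is one common way to visualize how well the classes separate.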

Once the preliminary analysis is complete, the problem is addressed with several supervised learning methods: Random Forest, Support Vector Machine and K-Nearest Neighbors. The results are compared using the confusion matrix, accuracy, F1-score, recall and precision.
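The comparison of supervised methods could be sketched as follows. This is a minimal example under stated assumptions, not the notebook's actual code: it uses scikit-learn's bundled dataset, a single stratified 75/25 split, default hyperparameters and a fixed random seed, all of which are illustrative choices.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

# Assumption: one stratified hold-out split; the report may use cross-validation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# SVM and KNN depend on distances, so they are wrapped with a scaler.
models = {
    "Random Forest": RandomForestClassifier(random_state=0),
    "SVM": make_pipeline(StandardScaler(), SVC()),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier()),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    # In scikit-learn's copy, label 1 is "benign", so precision, recall and F1
    # below refer to the benign class unless pos_label is changed.
    results[name] = {
        "confusion_matrix": confusion_matrix(y_test, y_pred),
        "accuracy": accuracy_score(y_test, y_pred),
        "f1": f1_score(y_test, y_pred),
        "recall": recall_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred),
    }

for name, metrics in results.items():
    print(name, {k: v for k, v in metrics.items() if k != "confusion_matrix"})
```

For a medical screening task, recall on the malignant class is often the metric of most interest; passing `pos_label=0` to the scoring functions would report it for this copy of the data.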

The problem at hand is also addressed with unsupervised methods, namely agglomerative clustering and the K-means algorithm. Their results are compared using the silhouette score, Davies-Bouldin index, V-measure, purity and adjusted Rand index. The clusterings are then interpreted with PCA, a dendrogram, a silhouette plot and a Decision Tree. Finally, the results of the supervised and unsupervised techniques are compared, discussing their advantages both for this specific problem and in general.
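The clustering comparison described above might look like the sketch below. It assumes scikit-learn's bundled dataset, two clusters, Ward linkage and standardized features, none of which is confirmed by the report. Purity has no built-in scikit-learn function, so `purity_score` is a hypothetical helper written here for illustration.

```python
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import (adjusted_rand_score, davies_bouldin_score,
                             silhouette_score, v_measure_score)
from sklearn.metrics.cluster import contingency_matrix
from sklearn.preprocessing import StandardScaler


def purity_score(y_true, y_pred):
    """Hypothetical helper: fraction of samples in their cluster's majority class."""
    cm = contingency_matrix(y_true, y_pred)  # rows: true classes, columns: clusters
    return cm.max(axis=0).sum() / cm.sum()


X, y = load_breast_cancer(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)

# Assumption: two clusters, matching the two diagnosis classes.
clusterers = {
    "K-means": KMeans(n_clusters=2, n_init=10, random_state=0),
    "Agglomerative": AgglomerativeClustering(n_clusters=2, linkage="ward"),
}

scores = {}
for name, clusterer in clusterers.items():
    labels = clusterer.fit_predict(X_scaled)
    scores[name] = {
        # Internal metrics: use only the data and the cluster labels.
        "silhouette": silhouette_score(X_scaled, labels),
        "davies_bouldin": davies_bouldin_score(X_scaled, labels),
        # External metrics: compare cluster labels with the true diagnosis.
        "v_measure": v_measure_score(y, labels),
        "purity": purity_score(y, labels),
        "adjusted_rand": adjusted_rand_score(y, labels),
    }

for name, metrics in scores.items():
    print(name, metrics)
```

Higher is better for every metric here except the Davies-Bouldin index, where lower values indicate more compact, better separated clusters.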

Grade: 17/20.

Frameworks

Python 3, Jupyter Notebook, pandas and scikit-learn.

https://pandas.pydata.org/

https://scikit-learn.org/stable/

https://jupyter.org/