Objective: In this project, I'll use K-Nearest Neighbors (KNN) to classify breast cancer cases, optimize the model parameters, perform dimensionality reduction with PCA, and visualize the classes in a 2D space.
Data source: Breast cancer data from sklearn.datasets
Here's a breakdown of the tasks I'll be performing using Python:
- Data Load & Inspection: I'll load and inspect the breast cancer dataset, exploring feature names and creating a label table.
- K-Nearest Neighbors (KNN) Classification: I'll train a KNN classifier with different k values, identify the optimal k, and visualize accuracy vs. k.
- Principal Component Analysis (PCA): I'll reduce dimensionality using PCA, determining the optimal number of components through cumulative explained variance and cross-validation.
- KNN Clustering Visualization: I'll project the data to 2D with PCA and overlay a confidence map of the KNN predictions, highlighting the class label of each cluster.
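
The four tasks above could be sketched roughly as follows. This is a minimal sketch, not the final implementation: the 75/25 split, the feature scaling, the odd k values from 1 to 21, the 5-fold cross-validation, the 95% variance threshold, the 100x100 grid, and all variable names are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# 1. Data load & inspection
data = load_breast_cancer()
X, y = data.data, data.target
feature_names = list(data.feature_names)
label_table = {int(i): str(name) for i, name in enumerate(data.target_names)}

# Hold out a test set (assumed 75/25 split, stratified by class).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# 2. KNN classification: cross-validated accuracy for each candidate k
k_values = list(range(1, 22, 2))  # assumed search range
accuracy_by_k = {}
for k in k_values:
    model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=k))
    accuracy_by_k[k] = cross_val_score(model, X_train, y_train, cv=5).mean()
best_k = max(accuracy_by_k, key=accuracy_by_k.get)

# Refit with the chosen k and score once on the held-out test set.
best_model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=best_k))
test_accuracy = best_model.fit(X_train, y_train).score(X_test, y_test)

# 3. PCA: cumulative explained variance, then cross-validation
X_train_scaled = StandardScaler().fit_transform(X_train)
cumulative = np.cumsum(PCA().fit(X_train_scaled).explained_variance_ratio_)
# Smallest number of components reaching the assumed 95% threshold.
n_components_95 = int(np.searchsorted(cumulative, 0.95) + 1)

cv_by_n = {}
for n in (2, 5, n_components_95):  # assumed candidate counts
    pipe = make_pipeline(
        StandardScaler(), PCA(n_components=n), KNeighborsClassifier(n_neighbors=best_k)
    )
    cv_by_n[n] = cross_val_score(pipe, X_train, y_train, cv=5).mean()

# 4. 2D projection and confidence map
project_2d = make_pipeline(StandardScaler(), PCA(n_components=2))
Z_train = project_2d.fit_transform(X_train)
knn_2d = KNeighborsClassifier(n_neighbors=best_k).fit(Z_train, y_train)

# Evaluate the classifier on a regular grid covering the projected points.
xx, yy = np.meshgrid(
    np.linspace(Z_train[:, 0].min() - 1, Z_train[:, 0].max() + 1, 100),
    np.linspace(Z_train[:, 1].min() - 1, Z_train[:, 1].max() + 1, 100),
)
grid = np.c_[xx.ravel(), yy.ravel()]
# Confidence = highest predicted class probability at each grid point.
confidence = knn_2d.predict_proba(grid).max(axis=1).reshape(xx.shape)

print("labels:", label_table)
print("best k:", best_k, "test accuracy:", round(test_accuracy, 3))
print("components for 95% variance:", n_components_95)
```

The `accuracy_by_k` dictionary holds the values for the accuracy vs. k plot, and `xx`, `yy`, and `confidence` can be passed to a filled contour plot (for example matplotlib's `contourf`) with the points in `Z_train` scattered on top and colored by `y_train`.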