Breast-Cancer-Wisconsin

The purpose of this project is to analyze the breast-cancer-wisconsin dataset through summary statistics, data visualization, feature selection, and binary cancer classification.

For data visualization, both PCA and t-SNE were used to map the 10-dimensional data into 2-D space. According to the visualizations, the two classes are easy to separate.
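
As a rough illustration of the PCA step, the sketch below projects rows of numeric features onto their first two principal components. It is a dependency-free approximation (power iteration with deflation), not the code used in this project; in practice a library implementation such as scikit-learn's PCA would be the usual choice. The function names and the toy rows are hypothetical and do not come from the dataset.

```python
import math
import random


def center(rows):
    """Subtract the per-column mean from every row."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def covariance(centered):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(centered), len(centered[0])
    return [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def top_eigenvector(matrix, iterations=500, seed=0):
    """Approximate the dominant eigenpair of a symmetric matrix by power iteration."""
    rng = random.Random(seed)
    d = len(matrix)
    v = [rng.random() + 0.1 for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # matrix is (numerically) zero; keep the current unit vector
        v = [x / norm for x in w]
    # Rayleigh quotient v^T M v gives the matching eigenvalue estimate.
    eigenvalue = sum(
        v[i] * sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)
    )
    return eigenvalue, v


def pca_2d(rows):
    """Project rows onto their first two principal components."""
    centered = center(rows)
    cov = covariance(centered)
    d = len(cov)
    components = []
    for _ in range(2):
        eigenvalue, v = top_eigenvector(cov)
        components.append(v)
        # Deflation: remove the component just found so the next pass finds the second one.
        cov = [
            [cov[i][j] - eigenvalue * v[i] * v[j] for j in range(d)]
            for i in range(d)
        ]
    return [[sum(r[j] * c[j] for j in range(d)) for c in components] for r in centered]


# Made-up rows with three features, for illustration only.
toy_rows = [[1, 2, 3], [2, 4, 1], [3, 6, 5], [4, 8, 2], [5, 10, 6], [6, 12, 3]]
toy_points = pca_2d(toy_rows)
```

Each entry of `toy_points` is an (x, y) pair that could be passed to a scatter plot and colored by class to check visually how well the classes separate.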

An XGBoost model was used to analyze feature importance. Pearson correlations were also calculated for the features and the class index.
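
The correlation part of this step can be sketched in plain Python as follows. This is only an illustration of the Pearson coefficient between each feature column and a 0/1 class label, not the project's actual code; the helper names, the feature names, and the values are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0.0 or spread_y == 0.0:
        return float("nan")  # correlation is undefined for a constant column
    return cov / (spread_x * spread_y)


def correlations_with_class(columns, labels):
    """Map each feature name to its correlation with the class label."""
    return {name: pearson(values, labels) for name, values in columns.items()}


# Made-up feature columns and labels, for illustration only.
toy_columns = {
    "feature_a": [1, 2, 8, 9, 10, 3],
    "feature_b": [5, 4, 5, 6, 4, 5],
}
toy_labels = [0, 0, 1, 1, 1, 0]
toy_correlations = correlations_with_class(toy_columns, toy_labels)

# Rank features by absolute correlation, strongest first.
ranked = sorted(toy_correlations, key=lambda k: abs(toy_correlations[k]), reverse=True)
```

Ranking by absolute value treats strongly negative and strongly positive correlations as equally informative, which is one reasonable choice when the result is used for feature selection.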

Two models were used for the binary cancer classification task: XGBoost and a neural network. Both models achieve similar performance in terms of accuracy and AUC score.
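
For reference, the two metrics mentioned above can be computed from true labels and predicted scores as in the sketch below. It is a minimal, library-free illustration rather than the evaluation code used here: the function names, the 0.5 threshold, and the sample labels and scores are assumptions. AUC is computed as the fraction of (positive, negative) pairs in which the positive sample receives the higher score, with ties counted as half.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def roc_auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correctly ordered pair
    return wins / (len(positives) * len(negatives))


# Made-up labels and model scores, for illustration only.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]

# Assumed decision threshold of 0.5 to turn scores into class predictions.
toy_predictions = [1 if s >= 0.5 else 0 for s in toy_scores]

toy_accuracy = accuracy(toy_labels, toy_predictions)
toy_auc = roc_auc(toy_labels, toy_scores)
```

Accuracy depends on the chosen threshold while AUC does not, so reporting both gives a fuller picture when comparing two classifiers.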

The dataset was downloaded from the UCI Machine Learning Repository.