Breast tumours are classified as benign or malignant. A logistic regression model is used for the classification.
Samples arrive periodically as Dr. Wolberg reports his clinical cases. The database therefore reflects this chronological grouping of the data. This grouping information appears immediately below, having been removed from the data itself:
Group 1: 367 instances (January 1989)
Group 2: 70 instances (October 1989)
Group 3: 31 instances (February 1990)
Group 4: 17 instances (April 1990)
Group 5: 48 instances (August 1990)
Group 6: 49 instances (Updated January 1991)
Group 7: 31 instances (June 1991)
Group 8: 86 instances (November 1991)
-----------------------------------------
Attribute information (name: range of values):
- Sample code number: id number
- Clump Thickness: 1 - 10
- Uniformity of Cell Size: 1 - 10
- Uniformity of Cell Shape: 1 - 10
- Marginal Adhesion: 1 - 10
- Single Epithelial Cell Size: 1 - 10
- Bare Nuclei: 1 - 10
- Bland Chromatin: 1 - 10
- Normal Nucleoli: 1 - 10
- Mitoses: 1 - 10
- Class: (2 for benign, 4 for malignant)
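The attribute list above can be turned into a feature matrix and a label vector. The snippet below is a minimal sketch, not the code used for this project: it assumes the data file is headerless and comma-separated, with columns in the listed order, and that a missing value is written as `?`. The function name `load_samples`, the choice to skip incomplete rows, and the demo rows are all made up for illustration.

```python
import csv
import io


def load_samples(lines):
    """Parse comma-separated rows into (features, labels).

    Assumed layout: id, nine attributes scored 1-10, class (2 or 4).
    Rows containing '?' are skipped (an assumption, not the project's
    documented behaviour).
    """
    X, y = [], []
    for row in csv.reader(lines):
        if not row or "?" in row:
            continue
        values = [int(v) for v in row]
        X.append(values[1:-1])                  # drop the id and the class
        y.append(1 if values[-1] == 4 else 0)   # 1 = malignant, 0 = benign
    return X, y


# Made-up rows in the assumed layout, for illustration only.
demo = io.StringIO(
    "1,5,1,1,1,2,1,3,1,1,2\n"
    "2,8,10,10,8,7,10,9,7,1,4\n"
    "3,6,8,8,1,3,?,3,7,1,4\n"
)
X, y = load_samples(demo)
print(len(X), y)
```

To read a real file, an open file object can be passed in place of `demo`, for example `load_samples(open(path, newline=""))`, where `path` is wherever the downloaded data file is stored.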
The dataset is divided into two parts: a training set (80% of the dataset) and a testing set (20% of the dataset).
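An 80/20 split of this kind could be produced with scikit-learn's `train_test_split`. This is a sketch under assumptions: the random integers stand in for the real features and labels, and `random_state=0` is an arbitrary seed chosen here for reproducibility, not a setting taken from the project.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 100 samples, nine attributes scored 1-10.
rng = np.random.default_rng(0)
X = rng.integers(1, 11, size=(100, 9))
y = rng.integers(0, 2, size=100)

# test_size=0.2 keeps 20% of the rows for testing and 80% for training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
print(X_train.shape, X_test.shape)  # (80, 9) (20, 9)
```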
A logistic regression model was fitted on the training set. When evaluated on the testing set, it gave the following result:
Accuracy Score = 0.9562043795620438 (about 95.6%)
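The fit-and-evaluate step might look roughly like the following with scikit-learn. It is a sketch on synthetic data, so it will not reproduce the accuracy reported above. The labelling rule, `max_iter=1000`, and `random_state=0` are assumptions made for the example. The project's actual hyperparameters and any preprocessing are not stated here.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: nine attributes scored 1-10, with a made-up
# rule that labels a sample 1 when its attribute scores sum above 49.
rng = np.random.default_rng(0)
X = rng.integers(1, 11, size=(300, 9))
y = (X.sum(axis=1) > 49).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Fit on the training set only, then score on the held-out testing set.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"Accuracy Score = {accuracy}")
```

`accuracy_score` returns the fraction of test samples whose predicted class matches the true class.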
K-fold cross-validation gives the following accuracy for the model:
Accuracy = 96.32%, Standard Deviation = 3.69%
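A mean accuracy and standard deviation of this form can be obtained with scikit-learn's `cross_val_score`. The number of folds used for the figures above is not stated, so `cv=10` below is an assumption, as are the synthetic data and `max_iter=1000`.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data with a made-up labelling rule.
rng = np.random.default_rng(0)
X = rng.integers(1, 11, size=(300, 9))
y = (X.sum(axis=1) > 49).astype(int)

# Each fold trains a fresh model on the other folds and scores it on the
# held-out fold; cv=10 is an assumed value of k.
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=10)
print(f"Accuracy = {scores.mean() * 100:.2f}%")
print(f"Standard Deviation = {scores.std() * 100:.2f}%")
```

The standard deviation summarises how much the per-fold accuracy varies, which gives a rough idea of how sensitive the result is to the particular split.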
The dataset is taken from the following webpage: https://archive.ics.uci.edu/ml/datasets/breast+cancer+wisconsin+%28original%29