In this project I implemented machine learning algorithms to predict breast cancer in patients from their gene expression data.
I implemented three machine learning models:
- Support Vector Machine
- Random Forest
- Decision Tree
Model | Accuracy (%) |
---|---|
Random Forest | 99.07 |
Decision Tree | 97.53 |
Support Vector Machine | 97.22 |
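
A comparison like the one in the table can be sketched as follows. This is a minimal example, not the project's actual training code: it assumes scikit-learn, uses a synthetic 21-feature dataset from `make_classification` in place of the real gene expression data, and picks the split ratio and hyperparameters arbitrarily. The accuracies it prints will therefore differ from the table.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in (assumption): 21 numeric features and a binary label.
X, y = make_classification(
    n_samples=400, n_features=21, n_informative=8, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# The three model families named above; hyperparameters are illustrative.
models = {
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Support Vector Machine": SVC(kernel="rbf"),
}

# Fit each model on the training split and score it on the held-out split.
accuracy = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    accuracy[name] = accuracy_score(y_test, model.predict(X_test))

for name, score in accuracy.items():
    print(f"{name}: {100 * score:.2f}%")
```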
-
The gene expression dataset is included as upscale.csv.
- The dataset contains 21 genes in total.
- The first column is the patient ID.
- The next 21 columns hold the expression values, one column per gene.
- The last column is the result: 1 if the patient has breast cancer, 0 if not.
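
The column layout described above can be split into IDs, features, and labels roughly like this. The sketch uses only the standard library and a two-row toy CSV built in memory; the header names (`patient_id`, `gene_1` ... `gene_21`, `result`) and the toy values are made up for illustration, and it assumes the file has a header row. For the real file, the same function could be given a handle from `open("upscale.csv", newline="")`.

```python
import csv
import io

# Toy CSV with the described layout. Header names and values are hypothetical.
gene_names = [f"gene_{i}" for i in range(1, 22)]  # 21 gene columns
header = ["patient_id"] + gene_names + ["result"]
toy_rows = [
    ["P001"] + [0.1 * i for i in range(21)] + [1],
    ["P002"] + [0.2 * i for i in range(21)] + [0],
]
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(header)
writer.writerows(toy_rows)
buffer.seek(0)


def load_dataset(handle):
    """Split rows into gene names, patient IDs, feature vectors, and 0/1 labels."""
    reader = csv.reader(handle)
    columns = next(reader)  # header row
    ids, features, labels = [], [], []
    for row in reader:
        ids.append(row[0])                              # first column: patient ID
        features.append([float(v) for v in row[1:-1]])  # middle columns: genes
        labels.append(int(row[-1]))                     # last column: result
    return columns[1:-1], ids, features, labels


genes, ids, X, y = load_dataset(buffer)
print(len(genes), ids, y)
```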
-
I plotted the confusion matrix for each model
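
For a binary result column, the numbers behind such a plot can be computed by hand as in the sketch below. The labels and predictions are made-up toy values, not results from this project, and the helper name `confusion_matrix_binary` is hypothetical. Rows are actual classes and columns are predicted classes, which is the same layout scikit-learn's `confusion_matrix` uses.

```python
def confusion_matrix_binary(y_true, y_pred):
    """Return [[TN, FP], [FN, TP]] for 0/1 labels (rows: actual, columns: predicted)."""
    matrix = [[0, 0], [0, 0]]
    for actual, predicted in zip(y_true, y_pred):
        matrix[actual][predicted] += 1
    return matrix


# Toy labels (1 = breast cancer, 0 = no breast cancer); values are illustrative.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

cm = confusion_matrix_binary(y_true, y_pred)
(tn, fp), (fn, tp) = cm
accuracy = (tp + tn) / len(y_true)
print(cm)        # [[3, 1], [1, 3]]
print(accuracy)  # 0.75
```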
-
I also used ELI5 (Explain Like I'm 5) to obtain the feature importance, i.e. the contribution of each gene.
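
The exact ELI5 calls used in this project are not shown here, so the sketch below does not try to reproduce them. As a stand-in, it uses scikit-learn's `permutation_importance`, which measures how much the test score drops when one feature column is shuffled. The data is synthetic, and the gene names (`gene_1` ... `gene_21`) and model settings are assumptions for illustration only.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Hypothetical gene names and synthetic data standing in for upscale.csv.
gene_names = [f"gene_{i}" for i in range(1, 22)]
X, y = make_classification(
    n_samples=300, n_features=21, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Shuffle each feature column in turn and record the mean drop in test accuracy.
result = permutation_importance(
    model, X_test, y_test, n_repeats=5, random_state=0
)

# Rank genes from most to least important.
ranking = sorted(
    zip(gene_names, result.importances_mean),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, score in ranking[:5]:
    print(f"{name}: {score:.4f}")
```

A score near zero suggests the model barely relies on that gene, while a larger score means shuffling that gene hurts the predictions more.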