This project analyses breast cancer patient data to help doctors predict whether a person has the disease.
- Breast cancer is a disease in which abnormal breast cells grow out of control and form tumours. If left unchecked, the tumours can spread throughout the body and become fatal.
- Breast cancer cells begin inside the milk ducts and/or the milk-producing lobules of the breast. The earliest form (in situ) is not life-threatening. Cancer cells can spread into nearby breast tissue (invasion). This creates tumours that cause lumps or thickening.
- Invasive cancers can spread to nearby lymph nodes or other organs (metastasize). Metastasis can be fatal.
- Treatment is based on the person, the type of cancer and its spread. Treatment combines surgery, radiation therapy and medications.
The attribute information is as follows. The data set includes 201 instances of one class and 85 instances of another class. The instances are described by 9 attributes, some of which are linear and some nominal.
- Age
- Menopause
- inv-nodes
- node-caps
- deg-malig
- breast
- breast-quad
- irradiat
- Outcome (no-recurrence-events, recurrence-events)
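The class counts quoted above imply an imbalanced data set, which is worth quantifying before reading any accuracy figure. The following is a minimal sketch using only the two counts from the text. The names `majority` and `minority` are placeholders, since the text does not say which outcome label belongs to which count.

```python
# Class counts as stated in the text. Which outcome label is the larger
# class is not stated, so neutral placeholder names are used here.
class_counts = {"majority": 201, "minority": 85}

total = sum(class_counts.values())
shares = {name: count / total for name, count in class_counts.items()}

# Accuracy of a trivial model that always predicts the larger class.
baseline_accuracy = max(shares.values())
imbalance_ratio = class_counts["majority"] / class_counts["minority"]

print(f"total instances: {total}")
print(f"always-majority baseline accuracy: {baseline_accuracy:.2%}")
print(f"imbalance ratio: {imbalance_ratio:.2f} : 1")
```

A model's accuracy is easier to interpret when compared against this always-majority baseline.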
The objective is to predict whether or not a patient has breast cancer, based on the diagnostic measurements included in the dataset.
- Importing Necessary Libraries
- Performing Exploratory Data Analysis
- Data Preprocessing
- Converting categorical data to numerical (Label Encoder)
- Creating X and Y
- Splitting the data into train and test sets
- Training various models such as Logistic Regression, Decision Tree, Random Forest, Extra_Tree_Classifier, SVC, KNeighbors
- Tuning the above models
- Applying SMOTE and re-running all of the above models for better accuracy and fewer false negatives (higher recall)
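The steps above can be sketched with scikit-learn roughly as follows. This is an illustration under stated assumptions, not the project's actual notebook: the data is synthetic and made up, only three of the attributes are mimicked, and the 80/20 stratified split, the random seeds, the grid over `C` and the choice of recall as the tuning metric are all assumptions. Only a few of the listed model families are shown.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import LabelEncoder
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (made up): 286 rows, three categorical attributes.
rng = np.random.default_rng(0)
n_rows = 286
age = rng.choice(["30-39", "40-49", "50-59", "60-69"], size=n_rows)
node_caps = rng.choice(["no", "yes"], size=n_rows, p=[0.8, 0.2])
irradiat = rng.choice(["no", "yes"], size=n_rows, p=[0.75, 0.25])
# Made-up rule so that the models have some signal to learn.
p_recurrence = np.where(node_caps == "yes", 0.6, 0.2)
outcome = np.where(rng.random(n_rows) < p_recurrence,
                   "recurrence-events", "no-recurrence-events")

# Convert categorical data to numerical with LabelEncoder, then create X and y.
X = np.column_stack([LabelEncoder().fit_transform(col)
                     for col in (age, node_caps, irradiat)])
y = LabelEncoder().fit_transform(outcome)  # labels sorted alphabetically: 0, 1

# Split into train and test sets (assumed 80/20, stratified on the outcome).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# Fit a few of the listed model families and record test accuracy.
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(random_state=42),
    "KNeighbors": KNeighborsClassifier(),
}
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)
    print(f"{name}: test accuracy = {scores[name]:.3f}")

# Tune logistic regression over an assumed grid, scoring by recall.
search = GridSearchCV(LogisticRegression(max_iter=1000),
                      param_grid={"C": [0.01, 0.1, 1, 10]},
                      scoring="recall", cv=5)
search.fit(X_train, y_train)
print("Best C:", search.best_params_["C"])
```

The SMOTE step is not shown here. With the imbalanced-learn package it would typically be `SMOTE().fit_resample(X_train, y_train)`, applied to the training split only so that the test set keeps its original class balance.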
Several models were evaluated: Logistic Regression, Decision Tree, Random Forest, Extra_Tree_Classifier, SVC and KNeighbors. Among them, the tuned logistic regression model stood out, with an accuracy of 75.86%. It also produced only 8 Type II errors (false negatives), which makes it a strong choice for applications where identifying positive cases is crucial.
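To make the relationship between accuracy, Type II errors and recall concrete, here is a small sketch that derives them from confusion-matrix counts. Only the 8 false negatives and the 75.86% accuracy come from the text. The other counts (`tn`, `fp`, `tp`) and the implied test-set size of 58 are invented for illustration and chosen only so that the accuracy matches. `summarize_confusion` is a hypothetical helper name.

```python
def summarize_confusion(tn, fp, fn, tp):
    """Derive summary metrics from binary confusion-matrix counts."""
    total = tn + fp + fn + tp
    return {
        "accuracy": (tn + tp) / total,
        "recall": tp / (tp + fn),               # share of positives found
        "type_ii_errors": fn,                   # count of false negatives
        "false_negative_rate": fn / (tp + fn),  # Type II error as a rate
    }


# Hypothetical counts: only fn=8 is taken from the text, the rest are made up.
metrics = summarize_confusion(tn=35, fp=6, fn=8, tp=9)
print(f"accuracy: {metrics['accuracy']:.2%}")
print(f"Type II errors: {metrics['type_ii_errors']}")
print(f"recall: {metrics['recall']:.2%}")
```

A count of false negatives becomes comparable across models only once it is divided by the number of actual positive cases, which is why the sketch reports a rate alongside the count.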