cervical-cancer

  1. Scikit-learn is used to implement the machine learning models and the stacking models. https://scikit-learn.org/stable/
  2. SMOTETomek is used to combine oversampling and undersampling to handle the imbalanced data. https://imbalanced-learn.org/stable/references/generated/imblearn.combine.SMOTETomek.html
  3. SHAP explainers are used to interpret the model, providing both local and global explanations. https://github.com/shap/shap/blob/master/docs/index.rst
  4. Matplotlib.pyplot is used to plot the ROC curve. https://matplotlib.org/3.5.3/api/_as_gen/matplotlib.pyplot.html
  5. A stratified sampling method was used to divide the dataset into two parts: 70% training and 30% testing.
  6. Evaluation metrics are used to evaluate the ML models. The stacking models are compared with individual ML models (RF, LR, DT, SVM, and NB) and with ensemble models (bagging, boosting, and voting). Results are presented for each feature selection method used to select the 15 optimal features, and for the models applied to the features selected by RFE, Chi2, and tree-based feature selection.
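
The workflow listed above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the synthetic dataset, the choice of base learners (RF, DT, NB) with LR as the meta-learner, the min-max scaling step, and all hyperparameters are assumptions made for the example. SMOTETomek resampling and SHAP explanations are left out so that the sketch depends only on scikit-learn.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.tree import DecisionTreeClassifier

# Assumption: a synthetic, imbalanced dataset stands in for the real data.
X, y = make_classification(
    n_samples=600,
    n_features=30,
    n_informative=10,
    weights=[0.9, 0.1],
    random_state=0,
)

# Stratified 70% / 30% split, so both parts keep the same class ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=0
)

# Assumption: RF, DT and NB as base learners, LR as the meta-learner.
stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("dt", DecisionTreeClassifier(max_depth=5, random_state=0)),
        ("nb", GaussianNB()),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)

# chi2 needs non-negative inputs, hence the min-max scaling before
# selecting the 15 highest-scoring features.
model = make_pipeline(MinMaxScaler(), SelectKBest(chi2, k=15), stack)
model.fit(X_train, y_train)

# Area under the ROC curve on the held-out 30%.
proba = model.predict_proba(X_test)[:, 1]
auc = roc_auc_score(y_test, proba)
print(f"Test ROC AUC: {auc:.3f}")
```

Placing the scaler and the feature selector inside the pipeline means they are fitted on the training part only, which keeps information from the test part out of the feature selection step.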