IML-for-IPF-classification

This master's thesis develops a workflow for an Interpretable Machine Learning (IML) classification task intended to support doctors in diagnosing IPF (idiopathic pulmonary fibrosis). The workflow starts with a data preparation phase, which is used to understand the domain before the classification tasks are performed. Machine learning models are then selected with two constraints in mind: the small size of the dataset and the need for interpretability.
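The model selection step can be sketched as follows. This is a minimal illustration and not the thesis code: the data are synthetic (generated with `make_classification` as a stand-in for the clinical dataset), and the two candidate models, the 5-fold split, and the balanced-accuracy metric are assumptions chosen because they suit small datasets and remain easy to interpret.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the real dataset (assumption): 60 samples, 8 features.
X, y = make_classification(
    n_samples=60, n_features=8, n_informative=3, n_redundant=0, random_state=0
)

# Two simple, interpretable candidates (assumed, not the thesis's final choice).
models = {
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "shallow_tree": DecisionTreeClassifier(max_depth=3, random_state=0),
}

# Stratified folds keep the class ratio stable in every split,
# which matters when only a few dozen samples are available.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

scores = {}
for name, model in models.items():
    scores[name] = cross_val_score(model, X, y, cv=cv, scoring="balanced_accuracy")
    print(f"{name}: {scores[name].mean():.2f} +/- {scores[name].std():.2f}")
```

On a dataset this small, the spread across folds is usually as informative as the mean, so both are printed.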

Python packages used

  • Scikit-learn
  • SHAP (SHapley Additive exPlanations)
  • Pandas
  • DiCE (Diverse Counterfactual Explanations)
  • Matplotlib (pyplot)
  • Numpy

Expected Results

Doctors want to know whether genetic features carry significant importance in predicting an IPF diagnosis. Because the dataset is small, we expect accuracy figures to be difficult to interpret. We will rely on Shapley values to assess the importance of features within the models. We expect the classification to confirm the current state of the art on the disease. Such a result would be important evidence that artificial intelligence can be effective even when applied to a very small dataset.
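To make the role of Shapley values concrete, the sketch below computes them by hand for a logistic regression. It relies on a known closed form: for a linear model, under the assumption that features are independent, the Shapley value of feature i for a sample x is w_i * (x_i - mean(x_i)), measured on the log-odds scale. The data are synthetic and the model choice is an assumption; in the actual workflow the SHAP package would compute these values, including for non-linear models.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the real dataset (assumption): 60 samples, 5 features.
X, y = make_classification(
    n_samples=60, n_features=5, n_informative=2, n_redundant=0, random_state=0
)

model = LogisticRegression(max_iter=1000).fit(X, y)
w = model.coef_[0]

# Closed-form Shapley values for a linear model with independent features,
# expressed in log-odds: phi[n, i] = w[i] * (X[n, i] - mean(X[:, i])).
background_mean = X.mean(axis=0)
phi = (X - background_mean) * w

# Base value: the model output at the average sample.
base_value = model.decision_function(background_mean.reshape(1, -1))[0]

# Additivity check: base value plus contributions recovers each prediction.
assert np.allclose(phi.sum(axis=1) + base_value, model.decision_function(X))

# Global importance: mean absolute contribution per feature.
global_importance = np.abs(phi).mean(axis=0)
ranking = np.argsort(global_importance)[::-1]
for i in ranking:
    print(f"feature_{i}: {global_importance[i]:.3f}")
```

The additivity check is the property that makes Shapley values useful here: each prediction decomposes exactly into per-feature contributions, so the weight carried by a genetic feature can be read off directly.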