My internship was with the Water Group at the NSW Department of Planning, Industry and Environment (DPIE), where I worked in Python and R.
The main project of my internship was a report titled "The Importance of Interpretability in Modelling Urban Water Systems".
Water managers face the challenging task of planning future infrastructure and policy changes under uncertainties such as climate, population and behavioral changes. WATHNET has been used to estimate yield for the Sydney water supply system since the mid-1990s.
It is a complex model that simulates the Sydney water supply system. One of the drawbacks of WATHNET is that it can be considered a “black box” model. The main goal of the project was to interpret this black box model with the help of data science techniques.
Interpretability can be achieved either with interpretable models or with model-agnostic interpretation tools.
Interpretable models explain themselves. By approximating the current model with them, we can obtain two kinds of explanation.
Decision trees yield interpretable IF-THEN-ELSE rules that enable non-technical users to read the results easily and make decisions. The estimated weights from multiple linear regression are easy to interpret on a modular level, and a confidence interval can be built around each weight to show the range of plausible values for the “true” population weight.
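The surrogate idea above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the analysis carried out on WATHNET: the `black_box` function, the feature names and the synthetic data are all hypothetical stand-ins, and the confidence intervals use a normal approximation (1.96 standard errors) rather than an exact t-distribution.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text

rng = np.random.default_rng(0)

# Hypothetical input features (assumption: these are NOT actual WATHNET variables).
feature_names = ["inflow", "demand", "storage"]
X = rng.uniform(0.0, 1.0, size=(500, 3))


def black_box(X):
    """Stand-in for an opaque simulator (assumption for illustration only)."""
    return 2.0 * X[:, 0] - 1.5 * X[:, 1] + 0.5 * X[:, 0] * X[:, 2]


y = black_box(X)

# Surrogate 1: a shallow decision tree fitted to the black box outputs,
# printed as nested IF-THEN-ELSE style rules.
tree = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X, y)
rules = export_text(tree, feature_names=feature_names)
print(rules)

# Surrogate 2: multiple linear regression fitted by ordinary least squares,
# with approximate 95% confidence intervals for each weight.
A = np.column_stack([np.ones(len(X)), X])      # add an intercept column
beta, *_ = np.linalg.lstsq(A, y, rcond=None)   # estimated weights
resid = y - A @ beta
dof = A.shape[0] - A.shape[1]                  # residual degrees of freedom
sigma2 = resid @ resid / dof                   # residual variance estimate
se = np.sqrt(np.diag(sigma2 * np.linalg.inv(A.T @ A)))
ci_low, ci_high = beta - 1.96 * se, beta + 1.96 * se

for name, b, lo, hi in zip(["intercept"] + feature_names, beta, ci_low, ci_high):
    print(f"{name}: weight={b:.3f}, approx. 95% CI=({lo:.3f}, {hi:.3f})")
```

Keeping the tree shallow (here `max_depth=2`) trades fidelity to the black box for rules short enough to read. How deep a surrogate needs to be depends on the model being approximated.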
Model-agnostic interpretation is an alternative approach to interpreting black box models, and its greatest benefit is flexibility. Because the interpretation method is separate from the model, data scientists can use any machine learning algorithm they like.
The model-agnostic methods used were feature interaction, permutation feature importance and partial dependence plots.