Isolation_Forest

The Isolation Forest is an unsupervised, non-parametric anomaly detection technique introduced by Fei Tony Liu, Kai Ming Ting, and Zhi-Hua Zhou (first presented in 2008 and extended in 2012). It builds an ensemble of special binary trees, called isolation trees, whose goal is to isolate anomalous instances from the normal ones. Because it is based on decision trees, it can handle large volumes of data, has linear time complexity with a small constant, and requires little memory. It assumes that anomalies are a minority consisting of few observations, and that their feature values differ from those of the normal instances.
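As a minimal sketch of how this works in practice (not the notebook's exact code), scikit-learn's IsolationForest can be fit on synthetic two-dimensional data; the dataset, contamination rate, and other parameters below are illustrative assumptions:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Hypothetical data: 300 normal points around the origin plus 10 scattered outliers.
rng = np.random.RandomState(42)
X_normal = rng.normal(loc=0.0, scale=1.0, size=(300, 2))
X_outliers = rng.uniform(low=-6.0, high=6.0, size=(10, 2))
X = np.vstack([X_normal, X_outliers])

# Fit the forest; `contamination` is the assumed fraction of anomalies in the data.
model = IsolationForest(n_estimators=100, contamination=0.03, random_state=42)
model.fit(X)

labels = model.predict(X)             # +1 = normal, -1 = anomaly
scores = model.decision_function(X)   # lower score = more anomalous
print(f"Flagged {np.sum(labels == -1)} points as anomalies")
```

Anomalous points need fewer random splits to be isolated, so they end up closer to the root of each tree; the score reflects their average path length across the ensemble.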

Despite its simplicity, speed, and intuitiveness, the algorithm has a drawback: the lack of explainability. Why is a particular observation considered anomalous by the algorithm? How can the output be interpreted?

To explain the Isolation Forest, I will use SHAP (SHapley Additive exPlanations), a framework presented in 2017 by Lundberg and Lee in the paper “A Unified Approach to Interpreting Model Predictions”. It is based on Shapley values, a concept from cooperative game theory. The idea is to explain how much each predictor contributes to the output of the model.
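As a hedged sketch of the explanation step (assuming the `shap` package is installed and reusing the `model`, `X`, and `scores` objects from the previous snippet), SHAP's TreeExplainer can be applied to a fitted IsolationForest:

```python
import numpy as np
import shap

# TreeExplainer computes Shapley values efficiently for tree ensembles.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)  # one contribution per feature per sample

# Global view: which features push observations toward the anomalous side.
shap.summary_plot(shap_values, X, feature_names=["x0", "x1"])

# Local view: feature contributions for the single most anomalous observation.
idx = int(np.argmin(scores))
print("SHAP values for the most anomalous sample:", shap_values[idx])
```

Each SHAP value is the contribution of one feature to one observation's anomaly score, so large-magnitude values point to the features that made the model consider that observation anomalous.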

Articles related to the code

  1. Anomaly Detection With Isolation Forest
  2. Interpretation of Isolation Forest with SHAP