/S-OCT

Primary LanguageCommon Lisp

S-OCT

S-OCT is a mixed-integer programming (MIP) formulation for training an optimal multivariate decision tree. For ease of use, the model is implemented as a scikit-learn classifier, thus it can be used with model selection and evaluation tools such as pipeline.Pipeline, model_selection.GridSearchCV, and model_selection.cross_validate, as demonstrated in the script soct_comprehensive.py.

Our code for S-OCT, as well as our reimplementations of OCT/OCT-H and FlowOCT, can be found in the src folder.

The datasets folder contains datasets from the UCI Machine Learning Repository. The file datasets.py contains code for loading these datasets, as well as scikit-learn compatible transformers for performing bucketization, a preprocessing step needed by algorithms that assume binary features.

The CPAIOR2022 folder is here for archival purposes. It contains codes needed to run the experiments from our CPAIOR 2022 paper.

The remaining code files in this repo are scripts to perform experiments in our Constraints paper. parameter_tuning.py runs the tuning experiments described in Section 6.1.1. The *_mip.py scripts run the direct MIP comparison described in Section 6.1.2; these scripts write results to mip_comparison.csv. The *_comprehensive.py scripts run the comprehensive comparison desribed in Section 6.1.3; these scripts write results to comprehensive.csv. In addition to the models in src, our experiments also test models from Interpretable AI and PyDL8.5.

References

  1. Boutilier, J., Michini, C., Zhou, Z. (2022). Shattering Inequalities for Learning Optimal Decision Trees. Proceedings of CPAIOR 2022. DOI:10.1007/978-3-031-08011-1_7 (Best paper award)
  2. Boutilier, J., Michini, C., Zhou, Z. (2023). Optimal multivariate decision trees. Constraints. DOI:10.1007/s10601-023-09367-y