Code for the paper "Strong Optimal Classification Trees".
The code runs in Python 3 (version 3.6 or higher) and requires the Gurobi 9.x solver. The required packages are listed in `requirements.txt`.
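A quick way to confirm that your environment meets these requirements is sketched below. It is not part of the repository. It only assumes that Gurobi's Python interface is importable as `gurobipy`, whose `gurobi.version()` returns a `(major, minor, technical)` tuple.

```python
import importlib.util
import sys


def python_version_ok(minimum=(3, 6)):
    """Return True if the running interpreter is at least `minimum`."""
    return sys.version_info[:2] >= minimum


def gurobi_major_version():
    """Return Gurobi's major version, or None if gurobipy is not installed."""
    if importlib.util.find_spec("gurobipy") is None:
        return None
    import gurobipy
    # gurobipy.gurobi.version() returns a (major, minor, technical) tuple.
    return gurobipy.gurobi.version()[0]


if __name__ == "__main__":
    print("Python >= 3.6:", python_version_ok())
    major = gurobi_major_version()
    if major is None:
        print("gurobipy is not installed")
    else:
        print("Gurobi major version:", major, "(expected 9)")
```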
The content of this repository is as follows:
- `Code` contains the implementation of the FlowOCT and BendersOCT approaches:
  - `FlowOCT.py` and `BendersOCT.py` include the formulation of each approach in gurobipy.
  - `FlowOCTReplication.py` and `BendersOCTReplication.py` contain the replication code for each approach, which is responsible for creating and solving the problem and producing the results.
  - We have some utility files as follows:
    - `Tree.py` creates the binary tree.
    - `logger.py` logs the console output into a text file.
- `Results` contains the raw results of the experiments.
- `plots` contains the R script for generating figures and tables from the tabular results.
- `DataSets` contains all datasets used for the experiments in the paper.
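The layout of the binary tree built by `Tree.py` is not documented here. A common convention for complete binary trees, assumed in the hypothetical `BinaryTree` class below, numbers nodes breadth-first from the root `1`, so node `n` has children `2n` and `2n + 1`. The actual `Tree.py` may use different names or a different indexing.

```python
class BinaryTree:
    """Hypothetical sketch: a complete binary tree of a given depth whose
    nodes are numbered breadth-first starting from the root, node 1."""

    def __init__(self, depth):
        self.depth = depth
        # Branching (internal) nodes: 1 .. 2^depth - 1
        self.nodes = list(range(1, 2 ** depth))
        # Leaf nodes: 2^depth .. 2^(depth + 1) - 1
        self.leaves = list(range(2 ** depth, 2 ** (depth + 1)))

    def left_child(self, n):
        return 2 * n

    def right_child(self, n):
        return 2 * n + 1

    def parent(self, n):
        if n == 1:
            raise ValueError("the root has no parent")
        return n // 2

    def ancestors(self, n):
        """Return the ancestors of n, from its parent up to the root."""
        result = []
        while n > 1:
            n //= 2
            result.append(n)
        return result
```

For example, `BinaryTree(2)` has branching nodes `[1, 2, 3]` and leaves `[4, 5, 6, 7]`, and the ancestors of leaf `6` are `[3, 1]`.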
- First, install the required packages listed in `requirements.txt` (for example, with `pip install -r requirements.txt`).
- To run an instance of FlowOCT (BendersOCT), run the main function in `FlowOCTReplication.py` (`BendersOCTReplication.py`) with the following arguments:
  - `-f`: name of the dataset
  - `-d`: maximum depth of the tree
  - `-t`: time limit
  - `-l`: value of `_lambda`
  - `-i`: sample (for shuffling and splitting the data)
  - `-c`: 1 if we do calibration, in which case we train our model on 50% of the data; otherwise we train on 75% of the data
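The argument list above can be illustrated with a small parser and a data split. Both functions below (`parse_args` and `split_indices`) are hypothetical and are not the replication files' actual code. In particular, using the `-i` sample value as a random seed is an assumption, and the sketch only returns the training rows and the remaining rows without deciding how the latter are used.

```python
import getopt
import random

# Hypothetical mapping: flag -> (key in the result dict, type converter)
FLAGS = {
    "-f": ("dataset", str),
    "-d": ("depth", int),
    "-t": ("time_limit", float),
    "-l": ("lambda", float),
    "-i": ("sample", int),
    "-c": ("calibration", int),
}


def parse_args(argv):
    """Parse the -f -d -t -l -i -c flags into a dict (sketch only)."""
    # Convert every item to str so lists mixing ints and strings also work.
    opts, _ = getopt.getopt([str(a) for a in argv], "f:d:t:l:i:c:")
    args = {}
    for flag, value in opts:
        key, convert = FLAGS[flag]
        args[key] = convert(value)
    return args


def split_indices(n_rows, sample, calibration):
    """Shuffle row indices using `sample` as the seed (an assumption) and
    return (train, rest): train is 50% of rows with calibration, else 75%."""
    rng = random.Random(sample)
    indices = list(range(n_rows))
    rng.shuffle(indices)
    fraction = 0.5 if calibration == 1 else 0.75
    cut = int(round(fraction * n_rows))
    return indices[:cut], indices[cut:]
```

With the terminal example below, `parse_args("-f monk1_enc.csv -d 2 -t 3600 -l 0 -i 1 -c 1".split())` yields a depth of `2`, a time limit of `3600.0`, and calibration `1`.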
You can call the main function within a Python file as follows (this is what we do in `run_exp.py`):
```python
import FlowOCTReplication

FlowOCTReplication.main(["-f", "monk1_enc.csv", "-d", 2, "-t", 3600, "-l", 0, "-i", 1, "-c", 1])
```
Or you can run it from the terminal:
```shell
python3 FlowOCTReplication.py -f monk1_enc.csv -d 2 -t 3600 -l 0 -i 1 -c 1
```
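A `run_exp.py`-style sweep over several instances could look like the sketch below. The helper names `build_runs` and `run_all` are hypothetical, and any grid values you pass in are your own choice, not the paper's experimental settings.

```python
import itertools


def build_runs(datasets, depths, lambdas, samples, time_limit=3600, calibration=1):
    """Build one argument list per (dataset, depth, lambda, sample) combination."""
    runs = []
    for f, d, l, i in itertools.product(datasets, depths, lambdas, samples):
        runs.append(["-f", f, "-d", d, "-t", time_limit,
                     "-l", l, "-i", i, "-c", calibration])
    return runs


def run_all(main, runs):
    """Call `main` (e.g. FlowOCTReplication.main) once per argument list."""
    for args in runs:
        main(args)
```

For example, `run_all(FlowOCTReplication.main, build_runs(["monk1_enc.csv"], [2], [0], [1]))` issues the same single call as the Python example above.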
The result of each instance is stored in a CSV file in `./Results/`.
Remember that the replication files assume that the class label column is named `target`. If you want to use a dataset of your own choice, change the hard-coded label name to the name of your label column.
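Instead of editing the hard-coded label name, you could also rename your label column to `target` before running the code. The sketch below does this with Python's standard `csv` module; `rename_label_in_rows` and `rename_label_column` are hypothetical helper names, and the output should be written to a new file rather than over the input.

```python
import csv


def rename_label_in_rows(rows, label_name, target_name="target"):
    """Yield CSV rows with the header column `label_name` renamed."""
    rows = iter(rows)
    header = next(rows, None)
    if header is None:  # empty input: nothing to rename
        return
    header = list(header)
    if label_name not in header:
        raise ValueError(f"column {label_name!r} not found in header {header}")
    header[header.index(label_name)] = target_name
    yield header
    yield from rows


def rename_label_column(src_path, dst_path, label_name, target_name="target"):
    """Copy the CSV at src_path to dst_path with the label column renamed."""
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        csv.writer(dst).writerows(
            rename_label_in_rows(csv.reader(src), label_name, target_name))
```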