FanG-HPO_jml: An R repository from acandelieri

This file is part of the R project named "FanG-HPO_jml".

FanG-HPO_jml is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

FanG-HPO_jml is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

For information on GNU General Public License, see http://www.gnu.org/licenses/.

Authors: Antonio Candelieri, Andrea Ponti, and Francesco Archetti, University of Milano-Bicocca, Italy, 2023

The R project FanG-HPO_jml aims to replicate the experiments reported in the paper "Fair and Green Hyperparameter Optimization via Multi-objective and Multiple Information Source Bayesian Optimization" (by Antonio Candelieri, Andrea Ponti, Francesco Archetti), Machine Learning, Springer.

The project is organized as follows:

- the 'data' folder contains all the datasets, with the suffixes 'full' and 'reduced' denoting the ground-truth and the cheap sources, respectively;
- the 'AGP.R' file contains the functions for fitting the Augmented Gaussian Process (AGP) model and for using it to make predictions;
- the 'core.R' file contains the functions implementing (a) the proposed approach and (b) the computation of the Expected Hyper Volume Improvement (EHVI) acquisition function;
- the 'Pareto.R' file contains all the functions for computing the approximated Pareto front, the associated Hyper Volume, etc.;
- the 'run_fairML.R' file is used to run the experiments with the 'fair-by-design' Machine Learning algorithms (from the R package 'fairML');
- the following Python files are used to compute 'fairness' and 'accuracy' (on k-fold cross validation). They are written in Python because they are used by the Python-based 'competitors' Autogluon and BoTorch-MOMF. Instead of re-implementing the computation in R, the R code calls the Python files, to guarantee homogeneity in the computation:
  - fairness.py
  - kfold_stratified_MLP.py
  - kfold_stratified_RF.py
  - kfold_stratified_XGB.py
  - kfold_stratified_SVM.py

Finally, a separate .R file is provided for running FanG-HPO on each 'dataset - ML algo' pair. Each file is named as follows:

FanG_from_AutogluonFairBO__.R

where the two name components are the ML algorithm, one of {'MLP', 'RF', 'SVM', 'XGB'}, and the dataset, one of {'ADULT', 'COMPAS', 'GERMANCREDIT', 'LAWSCHOOLADMISSIONS'}.
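As a rough illustration of the quantities that 'Pareto.R' is described as computing, the following Python sketch extracts an approximated Pareto front and its hypervolume for two objectives. It is not the repository's implementation: the function names `pareto_front` and `hypervolume_2d`, the assumption that both objectives are minimized, and the example values are all hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def pareto_front(points: List[Point]) -> List[Point]:
    """Return the non-dominated points, assuming both objectives are minimized.

    The result is sorted by the first objective (ascending), which implies the
    second objective is strictly descending along the front.
    """
    front: List[Point] = []
    best_second = float("inf")
    # Sweep in order of the first objective; a point is kept only if it
    # strictly improves the best second objective seen so far.
    for p in sorted(set(points)):
        if p[1] < best_second:
            front.append(p)
            best_second = p[1]
    return front


def hypervolume_2d(points: List[Point], reference: Point) -> float:
    """Area dominated by the front and bounded by the reference point."""
    # Points that do not strictly improve on the reference contribute nothing.
    front = [p for p in pareto_front(points)
             if p[0] < reference[0] and p[1] < reference[1]]
    volume = 0.0
    prev_second = reference[1]
    # Add one horizontal slab per front point, from top to bottom.
    for first, second in front:
        volume += (reference[0] - first) * (prev_second - second)
        prev_second = second
    return volume


# Hypothetical observations: (misclassification error, unfairness).
observed = [(0.2, 0.6), (0.4, 0.3), (0.5, 0.5), (0.7, 0.1)]
print(pareto_front(observed))                 # (0.5, 0.5) is dominated by (0.4, 0.3)
print(hypervolume_2d(observed, (1.0, 1.0)))   # approximately 0.56
```

An improvement-based acquisition function such as EHVI builds on this quantity: it scores a candidate by the expected increase in hypervolume once the candidate's predicted objectives are added to the observed points. That expectation is not computed in this sketch.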

IMPORTANT NOTES:

The experiments start from the initial designs of AutogluonFairBO, in order to guarantee a fair comparison between the different methods. For more detailed information, please refer to the paper.
AutogluonFairBO results must be downloaded from:

https://drive.google.com/drive/folders/1qxSU2iuyvf1BZFfkDyrYPLSueyFPc3J3

and the local paths to the downloaded folders must be updated in all the files

FanG_from_AutogluonFairBO__.R

before running them.
This R project contains only the code needed to run FanG-HPO. The code for running the experiments with AutogluonFairBO and BoTorchMOMF is also freely available, but in two separate repositories:

Autogluon-FairBO: https://drive.google.com/drive/folders/1-2PYP6uS-r8Oe70ZwSxplJr6kaWPdDCM?usp=sharing (for installation and configuration, please refer to the official documentation of Autogluon).

BoTorch-MOMF: https://github.com/andreaponti5/FanG-HPO-MOMF.git (for installation and configuration, please refer to the official documentation of BoTorch).
