This repository contains the code for SFA, the method presented in the paper "Shapley-based Feature Augmentation".
To download the datasets, use the following link: openml datasets
Once the download has finished, unzip the folder.
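If you prefer to unzip from Python instead of a file manager, the standard-library zipfile module is enough. This is a minimal sketch: the archive name and the target directory are assumptions, because this README does not say what the downloaded file is called or where main.py expects the data. unzip_datasets is a hypothetical helper, not part of this repository.

```python
import zipfile
from pathlib import Path


def unzip_datasets(archive_path, target_dir="."):
    """Extract the downloaded dataset archive and return the names of its members.

    Assumption: the download is a single .zip file. Its name and the target
    directory are placeholders; adjust them to match your download and main.py.
    """
    archive_path = Path(archive_path)
    target_dir = Path(target_dir)
    # Create the destination folder if it does not exist yet.
    target_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as zf:
        zf.extractall(target_dir)
        return zf.namelist()
```

For example, unzip_datasets("datasets.zip") extracts a (hypothetically named) datasets.zip into the current directory.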
To create an environment identical to the one we used to run the code:
conda create --name myenv --file spec-file.txt
All three models are run through main.py; only --model_name and the recommended hardware differ:
XGBoost (run on a GPU machine): --model_name=xgb
LightGBM (run on a CPU machine): --model_name=lgbm
Random forest (run on a GPU machine): --model_name=random_forest

python main.py --dataset_id=<dataset id> --task=<binary or multi> --model_name=<xgb, lgbm or random_forest> --seed=<seed> --compare=<True or False>

--dataset_id: a number from 0-14 for binary classification, or from 0-4 for multi-class classification
--task: binary or multi
--model_name: xgb, lgbm or random_forest
--seed: a number of your choice (we used 1-5)
--compare: True to compare the results to Featuretools and PCA Augment, otherwise False

For instance:
python main.py --dataset_id=3 --task=multi --model_name=xgb --seed=1 --compare=False
python main.py --dataset_id=3 --task=multi --model_name=lgbm --seed=1 --compare=False
python main.py --dataset_id=3 --task=multi --model_name=random_forest --seed=1 --compare=False
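To repeat a configuration over the seeds we used (1-5), the main.py commands can be scripted. This is a minimal sketch, assuming it is run from the repository root next to main.py; build_command and run_seeds are hypothetical helpers, not part of this repository, and the checks only mirror the dataset ranges and model names stated in this README. How main.py itself parses --compare is not documented here, so the flag is passed as the literal text True or False, exactly as in the examples.

```python
import subprocess
import sys

# Highest valid --dataset_id per task, as documented in this README.
MAX_DATASET_ID = {"binary": 14, "multi": 4}
MODELS = ("xgb", "lgbm", "random_forest")


def build_command(dataset_id, task, model_name, seed, compare=False):
    """Return the argument list for one main.py run after checking documented ranges."""
    if task not in MAX_DATASET_ID:
        raise ValueError(f"task must be 'binary' or 'multi', got {task!r}")
    if not 0 <= dataset_id <= MAX_DATASET_ID[task]:
        raise ValueError(f"dataset_id for {task} must be in 0-{MAX_DATASET_ID[task]}")
    if model_name not in MODELS:
        raise ValueError(f"model_name must be one of {MODELS}")
    return [
        sys.executable, "main.py",  # same interpreter as the one running this script
        f"--dataset_id={dataset_id}",
        f"--task={task}",
        f"--model_name={model_name}",
        f"--seed={seed}",
        f"--compare={compare}",  # a bool formats as "True" or "False"
    ]


def run_seeds(dataset_id, task, model_name, seeds=range(1, 6), compare=False, dry_run=True):
    """Build one command per seed; execute them in order unless dry_run is True."""
    commands = [build_command(dataset_id, task, model_name, s, compare) for s in seeds]
    if not dry_run:
        for cmd in commands:
            # check=True stops the sweep at the first run that exits with an error.
            subprocess.run(cmd, check=True)
    return commands
```

For example, run_seeds(3, "multi", "xgb", dry_run=False) runs the XGBoost example configuration once for each seed from 1 to 5; with the default dry_run=True it only returns the commands so you can inspect them first.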