OpenFE_reproduce: A Jupyter Notebook repository from ZhangTP1996

Environment Setup

Install Anaconda, then run the following commands:

export PROJECT_DIR=<ABSOLUTE path to the repository root>
conda create -n OpenFE python=3.8.12
conda activate OpenFE
conda env config vars set PYTHONPATH=${PYTHONPATH}:${PROJECT_DIR}
conda env config vars set PROJECT_DIR=${PROJECT_DIR}
conda env config vars set LD_LIBRARY_PATH=${CONDA_PREFIX}/lib:${LD_LIBRARY_PATH}
conda deactivate
conda activate OpenFE
python -m pip install -r requirements.txt --no-deps
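
As a quick sanity check after re-activating the environment, the sketch below verifies that the three variables stored by the `conda env config vars set` commands are visible to Python. The function name `check_environment` is hypothetical and not part of the repository; the check that `PROJECT_DIR` appears in `PYTHONPATH` simply mirrors the command above.

```python
import os

# Variables that the setup commands above store in the conda environment.
REQUIRED_VARS = ("PROJECT_DIR", "PYTHONPATH", "LD_LIBRARY_PATH")


def check_environment(env=None):
    """Return a list of human-readable problems; an empty list means OK.

    Hypothetical helper, not part of the repository.
    """
    env = os.environ if env is None else env
    problems = [f"{name} is not set" for name in REQUIRED_VARS if not env.get(name)]

    project_dir = env.get("PROJECT_DIR", "")
    if project_dir:
        # The setup instructions ask for an ABSOLUTE path to the repository root.
        if not os.path.isabs(project_dir):
            problems.append("PROJECT_DIR is not an absolute path")
        # PYTHONPATH should contain the repository root so that imports resolve.
        entries = env.get("PYTHONPATH", "").split(os.pathsep)
        if project_dir not in entries:
            problems.append("PYTHONPATH does not contain PROJECT_DIR")
    return problems
```

Calling `check_environment()` with no arguments inspects the current process environment; any returned message suggests the environment was not re-activated after the variables were set.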

Data Download

Part 1: Kaggle data
- Prepare data of IEEE
  - Download link: IEEE-CIS Fraud Detection | Kaggle (There is a Download All button)
  - Unzip the archive and make sure the following files exist:
    - ./data/IEEE/train_identity.csv
    - ./data/IEEE/train_transaction.csv
    - ./data/IEEE/test_identity.csv
    - ./data/IEEE/test_transaction.csv
    - ./data/IEEE/sample_submission.csv
- Prepare data of BNP
  - Download link: BNP Paribas Cardif Claims Management | Kaggle (There is a Download All button)
  - Unzip the archive and make sure the following files exist:
    - ./data/BNP/train.csv.zip
    - ./data/BNP/test.csv.zip
    - ./data/BNP/sample_submission.csv.zip
Part 2: other data
- Download link: https://www.dropbox.com/s/8tj5ln7wz1r9arc/data.zip?dl=1
- Unzip the archive and move the files so that the following files exist:
  - ./data/{dataset}/*.npy
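
The expected layout can be checked before launching any experiment. The following is a minimal sketch, assuming it is pointed at the repository root; the file paths are copied from the lists above, while the function names `missing_kaggle_files` and `datasets_with_npy` are hypothetical and not part of the repository.

```python
from pathlib import Path

# Paths copied from the lists above, relative to the repository root.
KAGGLE_FILES = [
    "data/IEEE/train_identity.csv",
    "data/IEEE/train_transaction.csv",
    "data/IEEE/test_identity.csv",
    "data/IEEE/test_transaction.csv",
    "data/IEEE/sample_submission.csv",
    "data/BNP/train.csv.zip",
    "data/BNP/test.csv.zip",
    "data/BNP/sample_submission.csv.zip",
]


def missing_kaggle_files(root="."):
    """Return the Kaggle files from the list above that are not present."""
    root = Path(root)
    return [rel for rel in KAGGLE_FILES if not (root / rel).is_file()]


def datasets_with_npy(root="."):
    """Return names of data/{dataset} folders holding at least one .npy file."""
    data_dir = Path(root) / "data"
    if not data_dir.is_dir():
        return []
    return sorted(
        d.name for d in data_dir.iterdir() if d.is_dir() and any(d.glob("*.npy"))
    )
```

An empty list from `missing_kaggle_files()` means Part 1 is in place; `datasets_with_npy()` lists the Part 2 datasets that were unpacked.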

Experiment

Part 1: Kaggle experiment (Table 5 in our paper)
- IEEE Experiment
  - Make sure you are in the folder run_IEEE.
  - bash IEEE.sh
  - Output is the file run_IEEE/results/sub_xgb_OpenFE_*_order.csv.
  - Submit link: IEEE-CIS Fraud Detection | Kaggle
- BNP Experiment
  - Make sure you are in the folder run_BNP.
  - bash BNP.sh
  - Outputs are in the folder run_BNP/result/. To evaluate them, submit them to the link below.
  - Submit link: BNP Paribas Cardif Claims Management | Kaggle
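
Once both scripts have finished, the files to submit can be collected with a short script. This is only a sketch: it relies on the output locations named above (`run_IEEE/results/sub_xgb_OpenFE_*_order.csv` and the folder `run_BNP/result/`), and the function name `find_submissions` is hypothetical.

```python
from pathlib import Path


def find_submissions(root="."):
    """Collect candidate submission files for both Kaggle experiments.

    Hypothetical helper; it only looks at the output locations named above.
    """
    root = Path(root)
    ieee_dir = root / "run_IEEE" / "results"
    bnp_dir = root / "run_BNP" / "result"

    ieee = []
    if ieee_dir.is_dir():
        # Pattern taken from the IEEE output description above.
        ieee = sorted(ieee_dir.glob("sub_xgb_OpenFE_*_order.csv"))

    bnp = []
    if bnp_dir.is_dir():
        # The BNP file names are not specified, so list every regular file.
        bnp = sorted(p for p in bnp_dir.iterdir() if p.is_file())

    return {"IEEE": ieee, "BNP": bnp}
```

Run from the repository root, `find_submissions()` returns a dictionary of paths; an empty list for either key suggests the corresponding script has not produced output yet.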
Part 2: other experiments (Table 3 in our paper)
- Reproduce results of OpenFE
  - Run a single dataset (e.g. california_housing)
    - bash shell_inst/california_housing.sh
  - You can find results in OpenFE-california_housing.log
  - You can also find results in the folder runs/output/{dataset}/lightgbm/tuned
    - There are two files in the folder.
      - result shows the test score under the corresponding metric.
      - stats.json contains more details.
- Reproduce results of baseline methods
  - We run SAFE on the Diabetes dataset as an example. Running other methods on other datasets only requires changing the arguments.
  - python baseline/run_methods.py --method safe --data diabetes --task classification --n_new_features 10 --n_jobs 8
  - python eval.py --data diabetes --model lightgbm --model_type tuned --task_type classification --algorithm safe --n_saved_features 10
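
To adapt the SAFE example to another method or dataset, the two commands can be generated from a handful of parameters. The sketch below rests on an assumption drawn from the example rather than confirmed by the scripts: that `--method` and `--algorithm`, `--task` and `--task_type`, and `--n_new_features` and `--n_saved_features` take matching values. The function name `baseline_commands` is hypothetical; the flags themselves are exactly those shown above.

```python
import shlex


def baseline_commands(method, data, task, n_features=10, n_jobs=8):
    """Build the feature-generation and evaluation commands shown above.

    Hypothetical helper. It assumes both scripts should receive the same
    method, dataset, task, and feature count, as in the SAFE example.
    """
    generate = [
        "python", "baseline/run_methods.py",
        "--method", method,
        "--data", data,
        "--task", task,
        "--n_new_features", str(n_features),
        "--n_jobs", str(n_jobs),
    ]
    evaluate = [
        "python", "eval.py",
        "--data", data,
        "--model", "lightgbm",
        "--model_type", "tuned",
        "--task_type", task,
        "--algorithm", method,
        "--n_saved_features", str(n_features),
    ]
    return generate, evaluate


# Reproduce the two commands of the SAFE example as shell-ready strings.
for cmd in baseline_commands("safe", "diabetes", "classification"):
    print(shlex.join(cmd))
```

Each returned list can also be passed directly to `subprocess.run(cmd, check=True)` to execute the steps in order.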

Acknowledgement

rtdl: We use their code for model training.

Structure

root:[demo]
+--data                           The folder of data.
|      +--BNP
|      ...
+--FeatureGenerator.py            Imported by OpenFE for calculating features.
+--OpenFE.py                      This is a bit different from the open-sourced package.
+--readme.md                      Guide.
+--requirements.txt
+--runs                           This folder is for the other experiments.
|      +--bin
|      +--clear.sh                Remove all output files. (Including results.)
|      +--eval.py                 Train models to evaluate new features.
|      +--FE_first_order.py       Generate first order features.
|      +--FE_high_order.py        Generate second order features.
|      +--lib
|      +--nn_utils.py
|      +--run_all.py              Automatically run all experiments.
|      +--shell_inst              Experiment for a specific dataset.
|      |      +--nomao.sh
|      |      ...
|      +--tuned_parameters        This folder contains the tuned parameters.
|      +--tune_parameter.py
|      +--baseline                This folder contains all the baseline methods we reproduce.
+--run_BNP
|      +--BNP.sh                  Automatically run BNP experiments.
|      +--eval_first_order.py     
|      +--eval_high_order.py
|      +--FE_first_order.py
|      +--FE_high_order.py
|      +--result
+--run_IEEE
|      +--IEEE.sh                 Automatically run IEEE experiments.
|      +--IEEE_utils.py
|      +--main.py
|      +--results
+--utils.py                       Utils imported by OpenFE.
