This project performs model selection and evaluation of a research work, as part of the Model Selection for Large Scale Learning course at Grenoble INP - Ensimag, year 2021/2022.
This directory simulates fair and robust sample selection on a synthetic dataset. The program requires PyTorch, Jupyter Notebook, and CUDA.
The directory contains a total of 6 files and 2 subdirectories:
- this README
- 4 Python files:
  - FairRobustSampler.py: defines the FairRobust sampler and a PyTorch dataset for sensitive data
  - models.py: contains the logistic regression and SVM architectures, a test function, and a plotting function
  - utils.py: contains utility functions for data generation
  - main.py
- a report as a Jupyter notebook
- synthetic_data: contains 11 NumPy files of synthetic data, split into a training set, a validation set, and a test set
- datasets: contains an xls file related to the real credit card clients dataset
To simulate the algorithm, please use the Jupyter notebook, which contains detailed instructions, or main.py.
The Jupyter notebook loads the data and trains the models with two different fairness metrics: equalized odds and demographic parity.
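As a rough illustration of the two fairness metrics, the sketch below computes a disparity score for each using NumPy. The function names and the "largest gap between groups" definition are assumptions made for this example; the test function in models.py may measure disparity differently.

```python
import numpy as np

def demographic_parity_gap(y_pred, z):
    """Largest difference in positive-prediction rate between sensitive groups."""
    y_pred, z = np.asarray(y_pred), np.asarray(z)
    rates = [y_pred[z == group].mean() for group in np.unique(z)]
    return float(max(rates) - min(rates))

def equalized_odds_gap(y_true, y_pred, z):
    """Largest gap in positive-prediction rate between groups,
    computed separately for each true label (hypothetical helper)."""
    y_true, y_pred, z = (np.asarray(a) for a in (y_true, y_pred, z))
    gaps = []
    for label in np.unique(y_true):
        mask = y_true == label
        gaps.append(demographic_parity_gap(y_pred[mask], z[mask]))
    return float(max(gaps))

# Toy data: binary labels, binary predictions, binary sensitive attribute z.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0]
z      = [0, 0, 0, 0, 1, 1, 1, 1]

dp = demographic_parity_gap(y_pred, z)      # 0.5 vs 0.25 positive rate
eo = equalized_odds_gap(y_true, y_pred, z)  # true-positive rate 1.0 vs 0.5
print(dp, eo)
```

A value of 0 means the groups are treated identically under that metric; larger values indicate more disparity.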
Each training run uses the FairRobust sampler: the PyTorch dataloader serves batches to the model via the FairRobust sampler described in the paper. After training, the test accuracy and fairness are displayed.
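The general mechanism of letting a sampler decide which examples a PyTorch dataloader serves can be sketched as follows. This is a toy stand-in and not the FairRobust sampler: `SubsetBatchSampler` and the fixed `selected` index list are hypothetical, and passing the sampler through `batch_sampler=` is an assumption about the wiring, which may differ in FairRobustSampler.py.

```python
import torch
from torch.utils.data import DataLoader, Sampler, TensorDataset

class SubsetBatchSampler(Sampler):
    """Toy stand-in: yields shuffled batches of indices taken only from `selected`."""

    def __init__(self, selected, batch_size, seed=0):
        self.selected = list(selected)
        self.batch_size = batch_size
        self.generator = torch.Generator().manual_seed(seed)

    def __iter__(self):
        # Shuffle the selected indices, then cut them into batches.
        perm = torch.randperm(len(self.selected), generator=self.generator).tolist()
        for start in range(0, len(perm), self.batch_size):
            yield [self.selected[i] for i in perm[start:start + self.batch_size]]

    def __len__(self):
        return (len(self.selected) + self.batch_size - 1) // self.batch_size

# Ten toy examples; the target of example i is simply i, so served batches are easy to read.
features = torch.arange(20, dtype=torch.float32).reshape(10, 2)
targets = torch.arange(10)
dataset = TensorDataset(features, targets)

# Pretend a selection step kept only these examples (a real sampler would compute this).
selected = [0, 2, 4, 6, 8]
loader = DataLoader(dataset, batch_sampler=SubsetBatchSampler(selected, batch_size=2))

served = [y.tolist() for _, y in loader]
print(served)  # three batches containing only the selected indices
```

Because the dataloader only ever sees the indices the sampler yields, the selection logic stays separate from the model and the training loop.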