To reproduce figures and experiments:
- run `./run_initial.sh` (or, for a single dataset/algorithm, `python run_initial.py` with the requisite CLI args). This calculates the barycenter adjustments and saves the resulting `.csv`s.
- run `./run_process.sh` (or, for a single dataset/algorithm, `python run_process.py` with the requisite CLI args). This calculates the $\lambda$s, evaluates distributional parity for each, and saves the resulting `.csv`s.
- run `plot.py` to generate the plots illustrating distributional parity (Figs. 2 and 3).
To reproduce baselines:
- in the `fair_baselines` folder, run (e.g.) `feldman.py` to generate predicted probabilities with the corresponding fair baseline (plus any hyperparameter tuning) applied.
- generate plots using `baselines_overthresholds.ipynb`.
`datasets.py`: gets predicted probabilities (for binary sensitive-attribute datasets) from a baseline algorithm.
- To add a dataset, follow the template in `get_new_adult()` and implement something similar (e.g. `get_<DATASET_NAME>()`).
- The method should take an `interv` flag, which indicates whether we are running a baseline fairness algorithm or just ours.
- If `interv` is not `None`, construct a `BinaryLabelDataset` (following AIF360 conventions) and return it (see current L259).
- The rest should be similar to what's already in `get_new_adult()`; just make sure the splits and sensitive features are set properly.
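A minimal sketch of what such a loader might look like. Everything here is hypothetical (the function name, the toy data, and the split size); the `BinaryLabelDataset` construction is left as a stub since it depends on your dataframe layout — see `get_new_adult()` for the real pattern:

```python
import numpy as np

def get_toy_dataset(interv=None, seed=0):
    """Hypothetical loader following the get_new_adult() template.

    Returns (train, test) tuples of (features, labels, sensitive attr),
    or an AIF360 BinaryLabelDataset when `interv` is not None.
    """
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(100, 3))         # stand-in features
    y = (X[:, 0] > 0).astype(int)         # stand-in binary labels
    s = rng.integers(0, 2, size=100)      # binary sensitive attribute
    n_train = 70                          # keep splits aligned with s
    train = (X[:n_train], y[:n_train], s[:n_train])
    test = (X[n_train:], y[n_train:], s[n_train:])
    if interv is not None:
        # Baseline-intervention path: build and return a
        # BinaryLabelDataset (AIF360 convention) from a dataframe
        # holding X, y, and s. Omitted here; see get_new_adult().
        raise NotImplementedError
    return train, test
```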
`eval_helpers.py`: mainly `get_eval_single`, which gives results for all metrics over all thresholds at a single lambda. Also has `get_prob_lambda`, a probabilistic estimator of the (binary) lambda.
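To make the shape of those results concrete, here is a hedged sketch of evaluating one lambda-adjusted score vector over a threshold grid. This is not the repo's `get_eval_single`; the metric set is reduced to accuracy and a demographic-parity gap for illustration:

```python
import numpy as np

def eval_over_thresholds(scores, y, s, thresholds=None):
    """Sweep thresholds for one score vector and record, per threshold,
    the accuracy and the gap in positive rates between the two groups."""
    if thresholds is None:
        thresholds = np.linspace(0.0, 1.0, 11)
    rows = []
    for t in thresholds:
        yhat = (scores >= t).astype(int)
        rate0 = yhat[s == 0].mean()       # positive rate, group 0
        rate1 = yhat[s == 1].mean()       # positive rate, group 1
        rows.append({
            "threshold": float(t),
            "acc": float((yhat == y).mean()),
            "dp_gap": float(abs(rate1 - rate0)),
        })
    return rows
```

The real helper reports all metrics, but the per-threshold row structure is the useful mental model.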
`bcmap.py`: calculates the adjustment to the barycenter (multi-group).
`bin_postprocess.py`: calculates the adjustment to the barycenter (binary-group).
`exact_solver.py`: exact calculation of lambda (binary-group); only used in the lambda plotting notebook.
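For intuition, a one-dimensional binary-group barycenter adjustment can be sketched as averaging the two groups' quantile functions and moving each group partway toward that average. This is a simplified illustration, not the code in `bin_postprocess.py`; `lam` plays the role of the interpolation parameter $\lambda$:

```python
import numpy as np

def barycenter_adjust_binary(scores, s, lam=1.0):
    """Move each group's scores toward the 1-D Wasserstein barycenter
    of the two group score distributions, interpolated by lam in [0, 1].

    lam=0 leaves scores unchanged; lam=1 maps each group onto the
    barycenter, equalizing the two group distributions.
    """
    scores = np.asarray(scores, dtype=float)
    adjusted = scores.copy()
    for g in (0, 1):
        idx = np.where(s == g)[0]
        # empirical quantile level of each score within its own group
        ranks = scores[idx].argsort().argsort()
        q = (ranks + 0.5) / len(idx)
        # barycenter quantile function: average of the group quantiles
        bary = 0.5 * (np.quantile(scores[s == 0], q)
                      + np.quantile(scores[s == 1], q))
        adjusted[idx] = (1.0 - lam) * scores[idx] + lam * bary
    return adjusted
```

At `lam=1` the two groups end up with identical empirical distributions, which is the distributional-parity endpoint the plots sweep toward.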