Primary language: Python. License: GNU General Public License v3.0 (GPL-3.0).

DAGnosis: Localized Identification of Data Inconsistencies using Structures

This repository accompanies the AISTATS'24 paper: "DAGnosis: Localized Identification of Data Inconsistencies using Structures".


We suggest creating a new environment before using the code, e.g. with:

conda create --name dagnosis python=3.10
conda activate dagnosis

Then install the package from source, from the root of the repository:

pip install .


We illustrate how to use DAGnosis in a synthetic setup via the files in the folder experiments/synthetic. The bash scripts run_linear.sh and run_mlp.sh run the full pipeline (generating the data, training the conformal estimators, and testing them) for linear and MLP structural equation models (SEMs), respectively. Both scripts must be run from inside the experiments/synthetic directory.
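The idea behind this pipeline can be illustrated with a minimal, self-contained sketch. It is not the repository's implementation: it assumes a hypothetical three-node linear SEM (X0 -> X1 -> X2), fits an ordinary least-squares regression of each non-root feature on its DAG parents, and uses split conformal prediction on a held-out calibration set to flag feature values that fall outside the resulting interval. The names PARENTS, WEIGHTS, sample_linear_sem, fit_conformal and flag are illustrative only, and the actual estimators in DAGnosis may condition on a different set of features and use different regressors.

```python
import numpy as np

# Hypothetical toy DAG X0 -> X1 -> X2, given as the parent list of each node.
PARENTS = {0: [], 1: [0], 2: [1]}
# Hypothetical edge weights of the linear SEM.
WEIGHTS = {1: np.array([2.0]), 2: np.array([-1.5])}


def sample_linear_sem(n, rng, noise=0.1):
    """Draw n samples from the linear SEM, filling nodes in topological order."""
    X = np.zeros((n, len(PARENTS)))
    X[:, 0] = rng.normal(size=n)
    for j in (1, 2):
        X[:, j] = X[:, PARENTS[j]] @ WEIGHTS[j] + noise * rng.normal(size=n)
    return X


def with_bias(A):
    """Append a column of ones so least squares also fits an intercept."""
    return np.c_[A, np.ones(len(A))]


def fit_conformal(X_train, X_cal, alpha=0.1):
    """For each non-root feature: regress it on its parents, then calibrate an interval width."""
    models = {}
    for j, pa in PARENTS.items():
        if not pa:
            continue  # root nodes are skipped in this simplified sketch
        coef, *_ = np.linalg.lstsq(with_bias(X_train[:, pa]), X_train[:, j], rcond=None)
        scores = np.abs(X_cal[:, j] - with_bias(X_cal[:, pa]) @ coef)
        # Split-conformal quantile with the usual (n + 1) finite-sample correction.
        n = len(scores)
        level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
        models[j] = (coef, np.quantile(scores, level, method="higher"))
    return models


def flag(models, X):
    """Boolean matrix: True where feature j lies outside its conformal interval."""
    flags = np.zeros(X.shape, dtype=bool)
    for j, (coef, width) in models.items():
        pred = with_bias(X[:, PARENTS[j]]) @ coef
        flags[:, j] = np.abs(X[:, j] - pred) > width
    return flags


rng = np.random.default_rng(0)
X_train, X_cal, X_test = (sample_linear_sem(1000, rng) for _ in range(3))
models = fit_conformal(X_train, X_cal)
X_test[:50, 2] += 3.0  # corrupt feature X2 in the first 50 test rows
flags = flag(models, X_test)
print("flag rate on corrupted rows:", flags[:50, 2].mean())
print("flag rate on clean rows:", flags[50:, 2].mean())
```

Because the corruption only touches X2, it is localized to that feature: the corrupted rows should be flagged almost always, while on clean rows the flag rate should stay close to alpha (about 10% here), which is what the marginal coverage guarantee of split conformal prediction leads one to expect.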

To compute the inconsistency detection metrics (F1, precision, recall), run the following from the folder experiments/synthetic:

python compute_metrics.py PATH_SAVE_METRIC=path_metrics

where path_metrics denotes the folder where the metrics are saved.
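The exact definitions used by compute_metrics.py are not spelled out here. As a rough sketch, assuming detection is scored at the sample level with corrupted samples as the positive class, precision, recall and F1 can be computed as follows (detection_metrics is a hypothetical helper, not part of the repository):

```python
import numpy as np


def detection_metrics(is_corrupted, is_flagged):
    """Precision, recall and F1 of the flags, treating corrupted samples as positives."""
    is_corrupted = np.asarray(is_corrupted, dtype=bool)
    is_flagged = np.asarray(is_flagged, dtype=bool)
    tp = np.sum(is_flagged & is_corrupted)   # corrupted and flagged
    fp = np.sum(is_flagged & ~is_corrupted)  # clean but flagged
    fn = np.sum(~is_flagged & is_corrupted)  # corrupted but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return float(precision), float(recall), float(f1)


# Toy example: samples 0-3 are corrupted; the detector flags samples 0, 1, 2 and 5.
truth = [1, 1, 1, 1, 0, 0, 0, 0]
flagged = [1, 1, 1, 0, 0, 1, 0, 0]
print(detection_metrics(truth, flagged))  # → (0.75, 0.75, 0.75)
```

With three true positives, one false positive and one false negative, precision and recall are both 3/4, so F1 is 3/4 as well.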

Similarly, to reproduce the sensitivity experiment, go to the folder experiments/synthetic/sensitivity, run the script run.sh, and then run:

python compute_metrics.py PATH_SAVE_METRIC=path_metrics

UCI Adult Income

To run the experiments on the UCI Adult Income dataset, go to the folder experiments/adult. To train and test the conformal estimators, run:

python train_test_adult.py

The artifacts are saved in the folder artifacts_adult. The results can then be obtained by running:

python proportion_flagging.py

This prints the downstream accuracies and the proportions of flagged samples (Figure 3(a) and 3(b) of the paper).
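To make the two reported quantities concrete, here is a small sketch of how they could be computed from a classifier's test predictions and a per-sample flag vector. It assumes that "downstream accuracy" means the accuracy of the downstream classifier on the samples that are not flagged; proportion_flagging.py may define it differently, and flagging_summary is a hypothetical helper.

```python
import numpy as np


def flagging_summary(y_true, y_pred, is_flagged):
    """Proportion of flagged samples, accuracy on unflagged samples, and overall accuracy."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    is_flagged = np.asarray(is_flagged, dtype=bool)
    proportion_flagged = is_flagged.mean()
    kept = ~is_flagged
    # Accuracy restricted to the samples the detector did not flag.
    acc_kept = (y_true[kept] == y_pred[kept]).mean() if kept.any() else float("nan")
    acc_all = (y_true == y_pred).mean()
    return float(proportion_flagged), float(acc_kept), float(acc_all)


# Toy example: the two misclassified samples (indices 3 and 6) are exactly the flagged ones.
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]
flagged = [0, 0, 0, 1, 0, 0, 1, 0, 0, 0]
print(flagging_summary(y_true, y_pred, flagged))  # → (0.2, 1.0, 0.8)
```

Comparing the accuracy on unflagged samples with the overall accuracy shows whether the flagged samples are the ones the downstream model tends to get wrong.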


If you use this software, please cite the original paper: