/DAVINZ-DataValuation

Training-free data valuation on deep neural network applications. (ICML-2022)

Primary LanguagePython

DAVINZ: Data Valuation using Deep Neural Networks at Initialization [ICML-2022]

This repository is the official implementation of the following paper accepted by the Thirty-ninth International Conference on Machine Learning (ICML) 2022:

Zhaoxuan Wu, Yao Shu, Bryan Kian Hsiang Low

DAVINZ: Data Valuation using Deep Neural Networks at Initialization

Requirements

To install requirements:

conda env create -f environment.yml

Preparing datasets

MNIST and CIFAR-10: The code automatically downloads the required datasets.

MNISTM: It can be downloaded here. Then, place the extracted keras_mnistm.pkl file under the data/ folder.

Ising Phyicial Model Dataset: It can be downloaded at here. Then, place the ising_data.h5 file under the data/ directory.

Run DAVINZ baseline experiments

At the beginning of the main.py and main_reg.py files, you can find example usages of DAVINZ for classficiation and regression tasks, respectively.

We give one example here:

mkdir data results checkpoints 
python main.py --dataset=MNIST_baseline --model=ResNet18 --num_parties=10 --split_method=by_class --seed=0 --gpu=0

Other methods

We implemented validation performance (VP), influence function (IF) and robust volume (RV) for comparisons. The code, including the example usages, can be found under the baselines/ directory.