This repository provides an implementation of the paper "Beta Shapley: a Unified and Noise-reduced Data Valuation Framework for Machine Learning", accepted as an oral presentation at AISTATS 2022. We propose Beta Shapley, a noise-reduced data valuation method that is effective at capturing the importance of individual data points.
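Beta Shapley belongs to the family of semivalues. A point's value is a weighted sum, over cardinalities j = 1, ..., n, of its average marginal contribution to subsets of size j - 1. The weights come from a Beta(alpha, beta) distribution. The minimal sketch below computes these per-cardinality weights as C(n-1, j-1) * B(j + beta - 1, n - j + alpha) / B(alpha, beta). It assumes the parameterization in which alpha > beta shifts weight toward small subsets. It is an illustration only, not code from betashap/ShapEngine.py, and the helper name `beta_shapley_weights` is hypothetical.

```python
import math


def log_beta(a: float, b: float) -> float:
    """Natural log of the Beta function B(a, b), computed via lgamma for stability."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def beta_shapley_weights(n: int, alpha: float, beta: float) -> list[float]:
    """Hypothetical helper: weight on the average marginal contribution at each
    cardinality j = 1..n (subsets of size j - 1), assuming
    weight_j = C(n-1, j-1) * B(j + beta - 1, n - j + alpha) / B(alpha, beta).
    The weights are positive and sum to 1."""
    weights = []
    for j in range(1, n + 1):
        # log C(n-1, j-1) = log((n-1)!) - log((j-1)!) - log((n-j)!)
        log_comb = math.lgamma(n) - math.lgamma(j) - math.lgamma(n - j + 1)
        log_w = log_comb + log_beta(j + beta - 1, n - j + alpha) - log_beta(alpha, beta)
        weights.append(math.exp(log_w))
    return weights


if __name__ == "__main__":
    # Beta(1, 1): uniform weights 1/n, which recovers Data Shapley.
    print([round(w, 3) for w in beta_shapley_weights(10, 1, 1)])
    # Beta(16, 1): most of the weight sits on the smallest subsets.
    print([round(w, 3) for w in beta_shapley_weights(10, 16, 1)])
```

With n = 10, Beta(1, 1) gives every cardinality weight 0.1. Beta(16, 1) puts 0.64 of the total weight on j = 1 (the empty subset), and the weights decrease with j. This is the "focus on small cardinalities" that makes the values less sensitive to noise in large-subset marginal contributions.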
We provide a notebook using the Covertype dataset. It shows how to compute Beta Shapley values and how to apply them to several downstream ML tasks.
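For intuition on a problem small enough to enumerate, the following sketch computes exact Beta Shapley values by brute force. The data, the 1-nearest-neighbour utility, and the choice U(empty set) = 0.5 are all hypothetical and chosen for illustration. The same per-cardinality weighting is assumed as in the formula C(n-1, j-1) * B(j + beta - 1, n - j + alpha) / B(alpha, beta). Point 3 is deliberately mislabeled: it carries label 1 but sits inside the class-0 region. The function names are illustrative and not part of the betashap package.

```python
import itertools
import math

# Hypothetical toy data for illustration only: (x, label) pairs in 1-D.
TRAIN = [(0.0, 0), (1.0, 0), (10.0, 1), (-0.5, 1)]  # index 3 is deliberately mislabeled
VALID = [(-0.4, 0), (0.6, 0), (9.0, 1), (11.0, 1)]


def utility(subset: tuple[int, ...]) -> float:
    """Validation accuracy of a 1-nearest-neighbour classifier fit on `subset`.
    The empty set is assumed to score 0.5 (chance level for two balanced classes)."""
    if not subset:
        return 0.5
    correct = 0
    for x, y in VALID:
        nearest = min(subset, key=lambda i: abs(TRAIN[i][0] - x))
        correct += int(TRAIN[nearest][1] == y)
    return correct / len(VALID)


def cardinality_weight(n: int, j: int, alpha: float, beta: float) -> float:
    """Assumed weight C(n-1, j-1) * B(j + beta - 1, n - j + alpha) / B(alpha, beta)."""
    def log_beta(a: float, b: float) -> float:
        return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    log_comb = math.lgamma(n) - math.lgamma(j) - math.lgamma(n - j + 1)
    return math.exp(log_comb + log_beta(j + beta - 1, n - j + alpha) - log_beta(alpha, beta))


def beta_shapley_exact(n: int, alpha: float, beta: float) -> list[float]:
    """Brute-force semivalue: weighted sum over cardinalities j of the average
    marginal contribution of point i to subsets of size j - 1 (exponential in n)."""
    values = []
    for i in range(n):
        others = [k for k in range(n) if k != i]
        value = 0.0
        for j in range(1, n + 1):
            subsets = list(itertools.combinations(others, j - 1))
            avg_gain = sum(utility(s + (i,)) - utility(s) for s in subsets) / len(subsets)
            value += cardinality_weight(n, j, alpha, beta) * avg_gain
        values.append(value)
    return values


if __name__ == "__main__":
    for idx, v in enumerate(beta_shapley_exact(len(TRAIN), alpha=16, beta=1)):
        print(idx, round(v, 4))
```

In this toy setup, adding the mislabeled point 3 to any subset never improves validation accuracy. It strictly lowers accuracy whenever the subset already contains a correctly labeled class-0 point. Because every cardinality weight is positive, its value comes out negative for any positive alpha and beta. The correctly labeled class-1 point 2 comes out positive. Exact enumeration needs on the order of 2^n utility evaluations, so it is only a sanity check; realistic dataset sizes call for a sampling-based approximation.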
Figure: Beta Shapley can identify noisy samples by focusing on marginal contributions at small cardinalities.
Figure: Beta Shapley values on the CIFAR100 test dataset. Mislabeled data points have negative Beta Shapley values, meaning they actually harm model performance, so Beta Shapley can be used to detect mislabeled points.
betashap/ShapEngine.py: main class for computing Beta Shapley.
betashap/data.py: handles loading and preprocessing datasets.