
Potion: Towards Poison Unlearning

Paper: Potion: Towards Poison Unlearning (DMLR; arXiv:2406.09173)


Our work builds on the benchmarking environment of Corrective Machine Unlearning.

Abstract

Adversarial attacks by malicious actors on machine learning systems, such as introducing poison triggers into training datasets, pose significant risks. The challenge in resolving such an attack arises in practice when only a subset of the poisoned data can be identified. This necessitates the development of methods to remove, i.e. unlearn, poison triggers from already trained models with only a subset of the poison data available. The requirements for this task significantly deviate from privacy-focused unlearning, where all of the data to be forgotten by the model is known. Previous work has shown that the undiscovered poisoned samples lead to a failure of established unlearning methods, with only one method, Selective Synaptic Dampening (SSD), showing limited success. Even full retraining, after the removal of the identified poison, cannot address this challenge, as the undiscovered poison samples lead to a reintroduction of the poison trigger in the model. Our work addresses two key challenges to advance the state of the art in poison unlearning. First, we introduce a novel outlier-resistant SSD-based method to improve model protection and unlearning performance simultaneously. Second, we introduce Poison Trigger Neutralisation (PTN) search, a fast, parallelisable hyperparameter search that utilises the characteristic "unlearning versus model protection" trade-off to find suitable hyperparameters in settings where the forget set size is unknown and the retain set is contaminated. We benchmark our contributions using ResNet-9 on CIFAR10 and WideResNet-28x10 on CIFAR100 with 0.2%, 1%, and 2% of the data poisoned and discovery shares ranging from a single sample to 100%. Experimental results show that our method heals 93.72% of poison compared to SSD with 83.41% and full retraining with 40.68%. We achieve this while also lowering the average model accuracy drop caused by unlearning from 5.68% (SSD) to 1.41% (ours).
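The method builds on Selective Synaptic Dampening (SSD): parameter importances are estimated on the full training data and on the identified poison samples, and parameters that are disproportionately important to the poison are dampened. Below is a minimal, hedged sketch of that dampening step, assuming a standard PyTorch model and data loaders; the function names (`fisher_diagonal`, `dampen`) and the exact outlier-resistant importance estimate used by Potion are not taken from this repository, so treat it as an illustration rather than the implementation.

```python
# Sketch of SSD-style selective dampening (illustrative, not the repo's code).
import torch


def fisher_diagonal(model, loader, loss_fn, device="cpu"):
    """Diagonal Fisher-style importance: accumulated squared gradients per parameter."""
    importances = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
    model.eval()
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        model.zero_grad()
        loss_fn(model(x), y).backward()
        for n, p in model.named_parameters():
            if p.grad is not None:
                importances[n] += p.grad.detach() ** 2
    return {n: imp / max(len(loader), 1) for n, imp in importances.items()}


@torch.no_grad()
def dampen(model, imp_full, imp_forget, alpha, lam, eps=1e-12):
    """Shrink weights that are far more important to the discovered poison than to the full data."""
    for n, p in model.named_parameters():
        mask = imp_forget[n] > alpha * imp_full[n]          # select poison-specific weights
        beta = torch.clamp(lam * imp_full[n] / (imp_forget[n] + eps), max=1.0)
        p[mask] *= beta[mask]                               # dampen, never amplify
```

The trade-off named in the abstract follows directly from the selection threshold: a lower threshold dampens more parameters, which removes the trigger more reliably but risks hurting accuracy on clean data.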


Using this repo

The .sh files are used to run the experiments. Most parameters (e.g. $\rho$) can be changed directly in the .sh files, while others, such as $s_{step}$ and $s_{start}$, are set in main.py and methods.py. Logging is set up with Weights & Biases; feel free to use an alternative logger.
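As a rough mental model of how $s_{start}$ and $s_{step}$ could drive the PTN search (the exact roles of these parameters and of $\rho$ are defined in the paper and in main.py/methods.py), the sketch below tightens the selection threshold step by step until the model stops predicting the poisoned labels on the discovered samples. All names here are illustrative assumptions, not the repository's API.

```python
# Hypothetical PTN-style search loop (illustrative, not the repo's code).
import copy
import torch


@torch.no_grad()
def poison_accuracy(model, poison_loader, device="cpu"):
    """Share of discovered poison samples still classified with their poisoned label."""
    hits, total = 0, 0
    model.eval()
    for x, y in poison_loader:
        preds = model(x.to(device)).argmax(dim=1).cpu()
        hits += (preds == y).sum().item()
        total += y.numel()
    return hits / max(total, 1)


def ptn_search(model, unlearn_fn, poison_loader, s_start, s_step, max_steps=50, target=0.0):
    """Apply increasingly aggressive unlearning until the trigger is neutralised."""
    alpha, candidate = s_start, model
    for _ in range(max_steps):
        candidate = unlearn_fn(copy.deepcopy(model), alpha)  # e.g. SSD-style dampening
        if poison_accuracy(candidate, poison_loader) <= target:
            break
        alpha -= s_step  # lower threshold -> more parameters dampened
    return candidate, alpha
```

Because each candidate threshold can be dampened and evaluated independently of the others, such a search parallelises naturally, which matches the paper's description of PTN as fast and parallelisable.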

Citing this work

@misc{schoepf2024potion,
      title={Potion: Towards Poison Unlearning}, 
      author={Stefan Schoepf and Jack Foster and Alexandra Brintrup},
      year={2024},
      eprint={2406.09173},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}

Our related research

| Paper | Code | Venue/Status |
| --- | --- | --- |
| Fast Machine Unlearning Without Retraining Through Selective Synaptic Dampening | GitHub | AAAI 2024 |
| Loss-Free Machine Unlearning (i.e. Label-Free) | GitHub | ICLR 2024 Tiny Paper |
| Parameter-Tuning-Free Data Entry Error Unlearning with Adaptive Selective Synaptic Dampening | GitHub | Preprint |
| Zero-Shot Machine Unlearning at Scale via Lipschitz Regularization | GitHub | Preprint |