Awesome Machine Unlearning

A collection of academic articles, published methodologies, and datasets on the subject of machine unlearning.

A sortable version is available here: https://awesome-machine-unlearning.github.io/

Please read and cite our paper:

Nguyen, T.T., Huynh, T.T., Nguyen, P.L., Liew, A.W.C., Yin, H. and Nguyen, Q.V.H., 2022. A Survey of Machine Unlearning. arXiv preprint arXiv:2209.02299.

Citation

@article{nguyen2022survey,
  title={A Survey of Machine Unlearning},
  author={Nguyen, Thanh Tam and Huynh, Thanh Trung and Nguyen, Phi Le and Liew, Alan Wee-Chung and Yin, Hongzhi and Nguyen, Quoc Viet Hung},
  journal={arXiv preprint arXiv:2209.02299},
  year={2022}
}

A Framework of Machine Unlearning

[Figure: timeline of machine unlearning research]


Existing Surveys

Paper Title Venue Year
Machine Unlearning: Solutions and Challenges arXiv 2023
Exploring the Landscape of Machine Unlearning: A Comprehensive Survey and Taxonomy arXiv 2023
Machine Unlearning: A Survey CSUR 2023
An Introduction to Machine Unlearning arXiv 2022
Machine Unlearning: Its Need and Implementation Strategies IC3 2021
Making Machine Learning Forget Annual Privacy Forum 2019
“Amnesia” - A Selection of Machine Learning Models That Can Forget User Data Very Fast CIDR 2019
Humans forget, machines remember: Artificial intelligence and the Right to Be Forgotten Computer Law & Security Review 2018
Algorithms that remember: model inversion attacks and data protection law Philosophical Transactions of the Royal Society A 2018

Model-Agnostic Approaches

Model-agnostic machine unlearning methodologies include unlearning processes or frameworks that are applicable to different models. In some cases, they provide theoretical guarantees for only a class of models (e.g., linear models), but we still consider them model-agnostic because their core ideas are applicable to complex models (e.g., deep neural networks) with practical results.
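As a concrete illustration of the category, the sketch below implements one common model-agnostic baseline: approximate unlearning by gradient ascent on the forget set (sometimes called "NegGrad"). It is a minimal sketch assuming a PyTorch classifier and a DataLoader over the forget set, not the implementation of any specific paper in the table below.

import torch
import torch.nn.functional as F

def unlearn_by_gradient_ascent(model, forget_loader, lr=1e-4, epochs=1):
    # Push the model away from the forget set by *maximizing* its loss there.
    # Works for any differentiable classifier, which is what makes the idea
    # model-agnostic; in practice it is usually combined with a few
    # fine-tuning steps on retained data to preserve accuracy.
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for inputs, targets in forget_loader:
            optimizer.zero_grad()
            loss = F.cross_entropy(model(inputs), targets)
            (-loss).backward()  # ascend on the forget-set loss
            optimizer.step()
    return model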

Paper Title Year Author Venue Model Code Type
Tight Bounds for Machine Unlearning via Differential Privacy 2023 Huang et al. arXiv - -
Machine Unlearning Methodology based on Stochastic Teacher Network 2023 Zhang et al. arXiv - -
Fast Machine Unlearning Without Retraining Through Selective Synaptic Dampening 2023 Foster et al. arXiv SSD [Code]
From Adaptive Query Release to Machine Unlearning 2023 Ullah et al. arXiv - - Exact Unlearning
Towards Adversarial Evaluations for Inexact Machine Unlearning 2023 Goel et al. arXiv EU-k, CF-k [Code]
KGA: A General Machine Unlearning Framework Based on Knowledge Gap Alignment 2023 Wang et al. arXiv KGA [Code]
On the Trade-Off between Actionable Explanations and the Right to be Forgotten 2023 Pawelczyk et al. arXiv - -
Towards Unbounded Machine Unlearning 2023 Kurmanji et al. arXiv SCRUB [Code] approximate unlearning
Netflix and Forget: Efficient and Exact Machine Unlearning from Bi-linear Recommendations 2023 Xu et al. arXiv Unlearn-ALS - Exact Unlearning
To Be Forgotten or To Be Fair: Unveiling Fairness Implications of Machine Unlearning Methods 2023 Zhang et al. arXiv - [Code]
Sequential Informed Federated Unlearning: Efficient and Provable Client Unlearning in Federated Optimization 2022 Fraboni et al. arXiv SIFU -
Certified Data Removal in Sum-Product Networks 2022 Becker and Liebig ICKG UNLEARNSPN [Code] Certified Removal Mechanisms
Learning with Recoverable Forgetting 2022 Ye et al. ECCV LIRF -
Continual Learning and Private Unlearning 2022 Liu et al. CoLLAs CLPU [Code]
Verifiable and Provably Secure Machine Unlearning 2022 Eisenhofer et al. arXiv - [Code] Certified Removal Mechanisms
VeriFi: Towards Verifiable Federated Unlearning 2022 Gao et al. arXiv VERIFI - Certified Removal Mechanisms
FedRecover: Recovering from Poisoning Attacks in Federated Learning using Historical Information 2022 Cao et al. S&P FedRecover - recovery method
Fast Yet Effective Machine Unlearning 2022 Tarun et al. arXiv UNSIR -
Membership Inference via Backdooring 2022 Hu et al. IJCAI MIB [Code] Membership Inferencing
Forget Unlearning: Towards True Data-Deletion in Machine Learning 2022 Chourasia et al. ICLR - - noisy gradient descent
Zero-Shot Machine Unlearning 2022 Chundawat et al. arXiv - -
Efficient Attribute Unlearning: Towards Selective Removal of Input Attributes from Feature Representations 2022 Guo et al. arXiv attribute unlearning -
Few-Shot Unlearning 2022 Yoon et al. ICLR - -
Federated Unlearning: How to Efficiently Erase a Client in FL? 2022 Halimi et al. UpML Workshop - - federated learning
Machine Unlearning Method Based On Projection Residual 2022 Cao et al. DSAA - - Projection Residual Method
Hard to Forget: Poisoning Attacks on Certified Machine Unlearning 2022 Marchant et al. AAAI - [Code] Certified Removal Mechanisms
Athena: Probabilistic Verification of Machine Unlearning 2022 Sommer et al. PoPETs ATHENA -
FP2-MIA: A Membership Inference Attack Free of Posterior Probability in Machine Unlearning 2022 Lu et al. ProvSec FP2-MIA - inference attack
Deletion Inference, Reconstruction, and Compliance in Machine (Un)Learning 2022 Gao et al. PETS - -
Prompt Certified Machine Unlearning with Randomized Gradient Smoothing and Quantization 2022 Zhang et al. NeurIPS PCMU - Certified Removal Mechanisms
The Right to be Forgotten in Federated Learning: An Efficient Realization with Rapid Retraining 2022 Liu et al. INFOCOM - [Code]
Backdoor Defense with Machine Unlearning 2022 Liu et al. INFOCOM BAERASER - Backdoor defense
Markov Chain Monte Carlo-Based Machine Unlearning: Unlearning What Needs to be Forgotten 2022 Nguyen et al. ASIA CCS MCU - MCMC Unlearning
Federated Unlearning for On-Device Recommendation 2022 Yuan et al. arXiv - -
Can Bad Teaching Induce Forgetting? Unlearning in Deep Networks using an Incompetent Teacher 2022 Chundawat et al. arXiv - - Knowledge Adaptation
Efficient Two-Stage Model Retraining for Machine Unlearning 2022 Kim and Woo CVPR Workshop - -
Learn to Forget: Machine Unlearning Via Neuron Masking 2021 Ma et al. IEEE Forsaken - Mask Gradients
Adaptive Machine Unlearning 2021 Gupta et al. NeurIPS - [Code] Differential Privacy
Descent-to-Delete: Gradient-Based Methods for Machine Unlearning 2021 Neel et al. ALT - - Certified Removal Mechanisms
Remember What You Want to Forget: Algorithms for Machine Unlearning 2021 Sekhari et al. NeurIPS - -
FedEraser: Enabling Efficient Client-Level Data Removal from Federated Learning Models 2021 Liu et al. IWQoS FedEraser [Code]
Machine Unlearning via Algorithmic Stability 2021 Ullah et al. COLT TV - Certified Removal Mechanisms
EMA: Auditing Data Removal from Trained Models 2021 Huang et al. MICCAI EMA [Code] Certified Removal Mechanisms
Knowledge-Adaptation Priors 2021 Khan and Swaroop NeurIPS K-prior [Code] Knowledge Adaptation
PrIU: A Provenance-Based Approach for Incrementally Updating Regression Models 2020 Wu et al. SIGMOD PrIU - Knowledge Adaptation
Eternal Sunshine of the Spotless Net: Selective Forgetting in Deep Networks 2020 Golatkar et al. CVPR - - Certified Removal Mechanisms
Learn to Forget: User-Level Memorization Elimination in Federated Learning 2020 Liu et al. arXiv Forsaken -
Certified Data Removal from Machine Learning Models 2020 Guo et al. ICML - - Certified Removal Mechanisms
Class Clown: Data Redaction in Machine Unlearning at Enterprise Scale 2020 Felps et al. arXiv - - Decremental Learning
A Novel Online Incremental and Decremental Learning Algorithm Based on Variable Support Vector Machine 2019 Chen et al. Cluster Computing - - Decremental Learning
Making AI Forget You: Data Deletion in Machine Learning 2019 Ginart et al. NeurIPS - - Decremental Learning
Lifelong Anomaly Detection Through Unlearning 2019 Du et al. CCS - -
Learning Not to Learn: Training Deep Neural Networks With Biased Data 2019 Kim et al. CVPR - -
Efficient Repair of Polluted Machine Learning Systems via Causal Unlearning 2018 Cao et al. ASIACCS KARMA [Code]
Understanding Black-box Predictions via Influence Functions 2017 Koh et al. ICML - [Code] Certified Removal Mechanisms
Towards Making Systems Forget with Machine Unlearning 2015 Cao and Yang S&P - - Statistical Query Learning
Incremental and decremental training for linear classification 2014 Tsai et al. KDD - [Code] Decremental Learning
Multiple Incremental Decremental Learning of Support Vector Machines 2009 Karasuyama et al. NIPS - - Decremental Learning
Incremental and Decremental Learning for Linear Support Vector Machines 2007 Romero et al. ICANN - - Decremental Learning
Decremental Learning Algorithms for Nonlinear Lagrangian and Least Squares Support Vector Machines 2007 Duan et al. OSB - - Decremental Learning
Multicategory Incremental Proximal Support Vector Classifiers 2003 Tveit et al. KES - - Decremental Learning
Incremental and Decremental Proximal Support Vector Classification using Decay Coefficients 2003 Tveit et al. DaWak - - Decremental Learning
Incremental and Decremental Support Vector Machine Learning 2000 Cauwenberghs and Poggio NeurIPS - - Decremental Learning

Model-Intrinsic Approaches

The model-intrinsic approaches include unlearning methods designed for a specific type of model. Although they are model-intrinsic, their applications are not necessarily narrow, as many ML models can share the same type.
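For linear models, such intrinsic structure often allows exact unlearning. The sketch below, a minimal illustration rather than any specific paper's method, removes one training point from a ridge-regression model by downdating the sufficient statistics A = X^T X + lambda*I and b = X^T y, which yields exactly the model that retraining without that point would produce.

import numpy as np

class DecrementalRidge:
    # Exact decremental unlearning for ridge regression: the closed-form
    # solution w = A^{-1} b depends on the data only through A and b,
    # so a point can be removed by subtracting its contribution.
    def __init__(self, X, y, lam=1.0):
        d = X.shape[1]
        self.A = X.T @ X + lam * np.eye(d)  # regularized Gram matrix
        self.b = X.T @ y
        self.w = np.linalg.solve(self.A, self.b)

    def forget(self, x, y):
        # Remove one training example (x, y); equivalent to full retraining.
        self.A -= np.outer(x, x)
        self.b -= y * x
        self.w = np.linalg.solve(self.A, self.b)
        return self.w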

Paper Title Year Author Venue Model Code Type
Heterogeneous Federated Knowledge Graph Embedding Learning and Unlearning 2023 Zhu et al. WWW FedLU [Code] GNN-based Models
One-Shot Machine Unlearning with Mnemonic Code 2023 Yamashita arXiv One-Shot MU -
Inductive Graph Unlearning 2023 Wang et al. USENIX GUIDE [Code] GNN-based Models
ERM-KTP: Knowledge-level Machine Unlearning via Knowledge Transfer 2023 Lin et al. CVPR ERM-KTP [Code]
GNNDelete: A General Strategy for Unlearning in Graph Neural Networks 2023 Cheng et al. ICLR GNNDELETE [Code]
Unfolded Self-Reconstruction LSH: Towards Machine Unlearning in Approximate Nearest Neighbour Search 2023 Tan et al. arXiv USR-LSH [Code]
Efficiently Forgetting What You Have Learned in Graph Representation Learning via Projection 2023 Cong and Mahdavi AISTATS PROJECTOR [Code] GNN-based Models
Unrolling SGD: Understanding Factors Influencing Machine Unlearning 2022 Thudi et al. EuroS&P - [Code] SGD
Graph Unlearning 2022 Chen et al. CCS GraphEraser [Code] Graph Neural Networks
Certified Graph Unlearning 2022 Chien et al. GLFrontiers Workshop - [Code] Graph Neural Networks
Skin Deep Unlearning: Artefact and Instrument Debiasing in the Context of Melanoma Classification 2022 Bevan and Atapour-Abarghouei ICML - [Code] CNN Models
Near-Optimal Task Selection for Meta-Learning with Mutual Information and Online Variational Bayesian Unlearning 2022 Chen et al. AISTATS - - Bayesian Models
Unlearning Protected User Attributes in Recommendations with Adversarial Training 2022 Ganhor et al. SIGIR ADV-MULTVAE [Code] Autoencoder-based Model
Recommendation Unlearning 2022 Chen et al. TheWebConf RecEraser [Code] Attention-based Model
Knowledge Neurons in Pretrained Transformers 2022 Dai et al. ACL - [Code] Transformers
Memory-Based Model Editing at Scale 2022 Mitchell et al. ICML SERAC [Code] DNN-based Models
Forgetting Fast in Recommender Systems 2022 Liu et al. arXiv AltEraser - recommendation system
Unlearning Nonlinear Graph Classifiers in the Limited Training Data Regime 2022 Pan et al. arXiv - - GNN-based Models
Deep Regression Unlearning 2022 Tarun et al. arXiv Blindspot - Regression Model
Quark: Controllable Text Generation with Reinforced Unlearning 2022 Lu et al. arXiv Quark [Code] language models
Forget-SVGD: Particle-Based Bayesian Federated Unlearning 2022 Gong et al. DSL Workshop Forget-SVGD - Bayesian Models
Machine Unlearning of Federated Clusters 2022 Pan et al. arXiv SCMA - Federated clustering
Machine Unlearning for Image Retrieval: A Generative Scrubbing Approach 2022 Zhang et al. MM - - DNN-based Models
Machine Unlearning: Linear Filtration for Logit-based Classifiers 2022 Baumhauer et al. Machine Learning normalizing filtration - Softmax classifiers
Deep Unlearning via Randomized Conditionally Independent Hessians 2022 Mehta et al. CVPR L-CODEC [Code] DNN-based Models
Challenges and Pitfalls of Bayesian Unlearning 2022 Rawat et al. UPML Workshop - - Bayesian Models
Federated Unlearning via Class-Discriminative Pruning 2022 Wang et al. WWW - - CNN-Based
Active forgetting via influence estimation for neural networks 2022 Meng et al. Int. J. Intel. Systems SCRUBBER - Neural Network
Variational Bayesian unlearning 2022 Nguyen et al. NeurIPS VI - Bayesian Models
Revisiting Machine Learning Training Process for Enhanced Data Privacy 2021 Goyal et al. IC3 - - DNN-based Models
Knowledge Removal in Sampling-based Bayesian Inference 2021 Fu et al. ICLR - [Code] Bayesian Models
Mixed-Privacy Forgetting in Deep Networks 2021 Golatkar et al. CVPR - - DNN-based Models
HedgeCut: Maintaining Randomised Trees for Low-Latency Machine Unlearning 2021 Schelter et al. SIGMOD HedgeCut [Code] Tree-based Models
A Unified PAC-Bayesian Framework for Machine Unlearning via Information Risk Minimization 2021 Jose et al. MLSP PAC-Bayesian - Bayesian Models
DeepObliviate: A Powerful Charm for Erasing Data Residual Memory in Deep Neural Networks 2021 He et al. arXiv DEEPOBLIVIATE - DNN-based Models
Bayesian Inference Forgetting 2021 Fu et al. arXiv BIF [Code] Bayesian Models
Approximate Data Deletion from Machine Learning Models 2021 Izzo et al. AISTATS PRU [Code] Linear Models
Online Forgetting Process for Linear Regression Models 2021 Li et al. AISTATS FIFD-OLS - Linear Models
RevFRF: Enabling Cross-domain Random Forest Training with Revocable Federated Learning 2021 Liu et al. IEEE RevFRF - Random Forests
Coded Machine Unlearning 2021 Aldaghri et al. IEEE Access - - Deep Learning Models
Machine Unlearning for Random Forests 2021 Brophy and Lowd ICML DaRE RF - Random Forests
Bayesian Variational Federated Learning and Unlearning in Decentralized Networks 2021 Gong et al. SPAWC - - Bayesian Models
Forgetting Outside the Box: Scrubbing Deep Networks of Information Accessible from Input-Output Observations 2020 Golatkar et al. ECCV - - DNN-based Models
Influence Functions in Deep Learning Are Fragile 2020 Basu et al. arXiv - - DNN-based Models
Deep Autoencoding Topic Model With Scalable Hybrid Bayesian Inference 2020 Zhang et al. IEEE DATM - Bayesian Models
Eternal Sunshine of the Spotless Net: Selective Forgetting in Deep Networks 2020 Golatkar et al. CVPR - - DNN-based Models
Uncertainty in Neural Networks: Approximately Bayesian Ensembling 2020 Pearce et al. AISTATS - [Code] Bayesian Models
Certified Data Removal from Machine Learning Models 2020 Guo et al. ICML - - DNN-based Models
DeltaGrad: Rapid retraining of machine learning models 2020 Wu et al. ICML DeltaGrad [Code] DNN-based Models
Making AI Forget You: Data Deletion in Machine Learning 2019 Ginart et al. NeurIPS - - Linear Models
“Amnesia” – Towards Machine Learning Models That Can Forget User Data Very Fast 2019 Schelter AIDB Workshop - [Code] Collaborative Filtering
A Novel Online Incremental and Decremental Learning Algorithm Based on Variable Support Vector Machine 2019 Chen et al. Cluster Computing - - SVM
Neural Text Generation With Unlikelihood Training 2019 Welleck et al. arXiv unlikelihood training [Code] DNN-based
Bayesian Neural Networks with Weight Sharing Using Dirichlet Processes 2018 Roth et al. IEEE DP [Code] Bayesian Models

Data-Driven Approaches

The approaches falling into this category use data partitioning, data augmentation, and data influence to speed up the retraining process. Attack methods based on data manipulation (e.g., data poisoning) are also included for reference.
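The best-known data-partitioning scheme is SISA (Bourtoule et al., listed below): train one constituent model per disjoint shard and aggregate their predictions, so deleting a point only requires retraining the shard that contained it. The sketch below is a deliberately simplified illustration of that idea, assuming scikit-learn-style estimators; SISA itself additionally slices each shard and checkpoints intermediate states to reduce retraining cost further.

import numpy as np

class ShardedEnsemble:
    # Simplified SISA-style ensemble: one model per disjoint data shard.
    def __init__(self, make_model, X, y, n_shards=5):
        self.make_model = make_model  # factory returning a fresh estimator
        parts = np.array_split(np.random.permutation(len(X)), n_shards)
        self.shards = [(X[i], y[i], i) for i in parts]  # keep original indices
        self.models = [make_model().fit(Xs, ys) for Xs, ys, _ in self.shards]

    def forget(self, sample_index):
        # Retrain only the shard whose data contained `sample_index`.
        for s, (Xs, ys, idx) in enumerate(self.shards):
            keep = idx != sample_index
            if not keep.all():
                self.shards[s] = (Xs[keep], ys[keep], idx[keep])
                self.models[s] = self.make_model().fit(Xs[keep], ys[keep])
                break

    def predict(self, X):
        # Majority vote over the shard models' class predictions.
        votes = np.stack([m.predict(X) for m in self.models])
        return np.apply_along_axis(
            lambda v: np.bincount(v.astype(int)).argmax(), 0, votes)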

Paper Title Year Author Venue Model Code Type
Hidden Poison: Machine Unlearning Enables Camouflaged Poisoning Attacks 2022 Di et al. NeurIPS-TSRML - [Code] Data Poisoning
Forget Unlearning: Towards True Data Deletion in Machine Learning 2022 Chourasia et al. ICLR - - Data Influence
ARCANE: An Efficient Architecture for Exact Machine Unlearning 2022 Yan et al. IJCAI ARCANE - Data Partition
PUMA: Performance Unchanged Model Augmentation for Training Data Removal 2022 Wu et al. AAAI PUMA - Data Influence
Certifiable Unlearning Pipelines for Logistic Regression: An Experimental Study 2022 Mahadevan and Mathioudakis MAKE - [Code] Data Influence
Zero-Shot Machine Unlearning 2022 Chundawat et al. arXiv - - Data Influence
GRAPHEDITOR: An Efficient Graph Representation Learning and Unlearning Approach 2022 Cong and Mahdavi - GRAPHEDITOR [Code] Data Influence
Fast Model Update for IoT Traffic Anomaly Detection with Machine Unlearning 2022 Fan et al. IEEE IoT-J ViFLa - Data Partition
Learning to Refit for Convex Learning Problems 2021 Zeng et al. arXiv OPTLEARN - Data Influence
Fast Yet Effective Machine Unlearning 2021 Tarun et al. arXiv - - Data Augmentation
Learning with Selective Forgetting 2021 Shibata et al. IJCAI - - Data Augmentation
SSSE: Efficiently Erasing Samples from Trained Machine Learning Models 2021 Peste et al. NeurIPS-PRIML SSSE - Data Influence
How Does Data Augmentation Affect Privacy in Machine Learning? 2021 Yu et al. AAAI - [Code] Data Augmentation
Coded Machine Unlearning 2021 Aldaghri et al. IEEE Access - - Data Partitioning
Machine Unlearning 2021 Bourtoule et al. S&P SISA [Code] Data Partitioning
Amnesiac Machine Learning 2021 Graves et al. AAAI AmnesiacML [Code] Data Influence
Unlearnable Examples: Making Personal Data Unexploitable 2021 Huang et al. ICLR - [Code] Data Augmentation
Descent-to-Delete: Gradient-Based Methods for Machine Unlearning 2021 Neel et al. ALT - - Data Influence
Fawkes: Protecting Privacy against Unauthorized Deep Learning Models 2020 Shan et al. USENIX Sec. Sym. Fawkes [Code] Data Augmentation
PrIU: A Provenance-Based Approach for Incrementally Updating Regression Models 2020 Wu et al. SIGMOD PrIU/PrIU-opt - Data Influence
DeltaGrad: Rapid retraining of machine learning models 2020 Wu et al. ICML DeltaGrad [Code] Data Influence

Datasets

Type: Image

Dataset #Items Disk Size Downstream Application #Papers Used
MNIST 70K 11MB Classification 29+ papers
CIFAR 60K 163MB Classification 16+ papers
SVHN 600K 400MB+ Classification 8+ papers
LSUN 69M+ 1TB+ Classification 1 paper
ImageNet 14M+ 166GB Classification 6 papers

Type: Tabular

Dataset #Items Disk Size Downstream Application #Papers Used
Adult 48K+ 10MB Classification 8+ papers
Breast Cancer 569 <1MB Classification 2 papers
Diabetes 442 <1MB Regression 3 papers

Type: Text

Dataset #Items Disk Size Downstream Application #Papers Used
IMDB Review 50k 66MB Sentiment Analysis 1 paper
Reuters 11K+ 73MB Categorization 1 paper
Newsgroup 20K 1GB+ Categorization 1 paper

Type: Sequence

Dataset #Items Disk Size Downstream Application #Papers Used
Epileptic Seizure 11K+ 7MB Timeseries Classification 1 paper
Activity Recognition 10K+ 26MB Timeseries Classification 1 paper
Botnet 72M 3GB+ Clustering 1 paper

Type: Graph

Dataset #Items Disk Size Downstream Application #Papers Used
OGB 100M+ 59MB Classification 2 papers
Cora 2K+ 4.5MB Classification 3 papers
MovieLens 1B+ 3GB+ Recommender Systems 1 paper

Evaluation Metrics

Metrics Formula/Description Usage
Accuracy Accuracy of the unlearned model on the forget set and the retain set Evaluating the predictive performance of the unlearned model
Completeness The overlap (e.g., Jaccard distance) of the output space between the retrained and the unlearned model Evaluating the indistinguishability between model outputs
Unlearn Time The time taken to process an unlearning request Evaluating the unlearning efficiency
Relearn Time The number of epochs required for the unlearned model to reach the accuracy of the source model Evaluating the unlearning efficiency (relearning with some data samples)
Layer-wise Distance The weight difference between the original model and the retrained model Evaluating the indistinguishability between model parameters
Activation Distance An average of the L2-distance between the unlearned model and retrained model’s predicted probabilities on the forget set Evaluating the indistinguishability between model outputs
JS-Divergence Jensen-Shannon divergence between the predictions of the unlearned and retrained model Evaluating the indistinguishability between model outputs
Membership Inference Attack Recall (#detected items / #forget items) Verifying the influence of forget data on the unlearned model
ZRF score $\mathcal{ZRF} = 1 - \frac{1}{n_f}\sum\limits_{i=0}^{n_f} \mathcal{JS}(M(x_i), T_d(x_i))$ The unlearned model should not intentionally give wrong output $(\mathcal{ZRF} = 0)$ or random output $(\mathcal{ZRF} = 1)$ on the forget items
Anamnesis Index (AIN) $AIN = \frac{r_t (M_u, M_{orig}, \alpha)}{r_t (M_s, M_{orig}, \alpha)}$ Zero-shot machine unlearning
Epistemic Uncertainty $\mathrm{efficacy}(w;\mathcal{D}) = \frac{1}{i(w;\mathcal{D})}$ if $i(w;\mathcal{D}) > 0$, otherwise $\mathrm{efficacy}(w;\mathcal{D}) = \infty$ How much information the model exposes
Model Inversion Attack Visualization Qualitative verifications and evaluations
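As a hedged illustration of how some of these metrics are computed in practice, the sketch below evaluates Activation Distance and JS-Divergence from the predicted class probabilities of an unlearned model and a retrained-from-scratch reference on the forget set. The array names are illustrative assumptions, not from the survey.

import numpy as np
from scipy.spatial.distance import jensenshannon

def activation_distance(p_unlearned, p_retrained):
    # Mean L2 distance between the two models' predicted probability
    # vectors on the forget set (rows are samples, columns are classes).
    return np.mean(np.linalg.norm(p_unlearned - p_retrained, axis=1))

def js_divergence(p_unlearned, p_retrained):
    # SciPy's jensenshannon returns the JS *distance* (the square root
    # of the divergence), so square it to recover the divergence itself.
    return np.mean(jensenshannon(p_unlearned, p_retrained, axis=1) ** 2)

Lower values of both metrics indicate that the unlearned model's outputs are harder to distinguish from those of a model retrained without the forget data.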

Disclaimer

Feel free to contact us if you have any queries or exciting news on machine unlearning. We also welcome all researchers to contribute to this repository and to the broader knowledge of the machine unlearning field.

If you have other related references, please feel free to create a GitHub issue with the paper information. We will gladly update the repo according to your suggestions. (You can also create pull requests, but it might take some time for us to merge them.)
