Awesome-GenAI-Unlearning


This repository contains a list of papers on machine unlearning in generative AI, based on our survey paper: Machine Unlearning in Generative AI: A Survey (Zheyuan (Frank) Liu, Guangyao Dou, Zhaoxuan Tan, Yijun Tian and Meng Jiang). We categorize existing works by modality and application, and we also include datasets and benchmarks for various unlearning scenarios.

Table of Contents

  • Datasets, Benchmarks
  • Generative Image Models
  • Large Language Models (LLMs)
  • Large Multimodal Models (LMMs)
  • Applications
  • Other Surveys
  • Contributing

Datasets, Benchmarks:

Datasets:

Safety Alignment

  • LAION LAION-400M: An Open Dataset of 400 Million Image-Text Pairs (code)
  • Civil Comments [CoRR 2019] Nuanced metrics for measuring unintended bias with real data for text classification (code)
  • PKU-SafeRLHF [arxiv 2310.12773] Safe RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback (code)
  • Anthropic red team [arxiv 2204.05862] Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback (code)

Copyright Protection

  • Harry Potter: the copyrighted Harry Potter book series used in unlearning studies; the raw text cannot be publicly released due to copyright.
  • BookCorpus [arxiv 1506.06724] Aligning Books and Movies: Towards Story-like Visual Explanations by Watching Movies and Reading Books (code)
  • TOFU [arxiv 2401.06121] TOFU: A Task of Fictitious Unlearning for LLMs (code)

Hallucination Reduction

  • HaluEval [EMNLP 2023] HaluEval: A Large-Scale Hallucination Evaluation Benchmark for Large Language Models (code)
  • TruthfulQA [ACL 2023] TruthfulQA: Measuring How Models Mimic Human Falsehoods (code)
  • CounterFact [NeurIPS 2022] Locating and Editing Factual Associations in GPT (code)
  • ZsRE [CoNLL 2017] Zero-Shot Relation Extraction via Reading Comprehension (code)
  • MSCOCO [arxiv 1405.0312] Microsoft COCO: Common Objects in Context (code)

Privacy Compliance

  • Pile [arxiv 2101.00027] The Pile: An 800GB Dataset of Diverse Text for Language Modeling (code)
  • Yelp/Amazon Reviews (code)
  • SST-2 [EMNLP 2013] Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank (code)
  • PersonaChat [arxiv 1801.07243] Personalizing Dialogue Agents: I have a dog, do you have pets too? (code)
  • LEDGAR [ACL 2020] LEDGAR: A Large-Scale Multilabel Corpus for Text Classification of Legal Provisions in Contracts (code)
  • SAMSum [ACL 2019] SAMSum Corpus: A Human-annotated Dialogue Dataset for Abstractive Summarization (code)
  • IMDB (code)
  • CelebA-HQ [NeurIPS 2018] IntroVAE: Introspective Variational Autoencoders for Photographic Image Synthesis (code)
  • I2P Inappropriate Image Prompts (I2P) Benchmark (code)

Bias/Unfairness Alleviation

  • StereoSet [ACL 2021] StereoSet: Measuring stereotypical bias in pretrained language models (code)
  • HateXplain [AAAI 2021] HateXplain: A Benchmark Dataset for Explainable Hate Speech Detection (code)
  • CrowS-Pairs [EMNLP 2021] CrowS-Pairs: A Challenge Dataset for Measuring Social Biases in Masked Language Models (code)

Benchmarks:

Generative Image Models

  • UnlearnCanvas [arxiv 2402.11846] UnlearnCanvas: A Stylized Image Dataset to Benchmark Machine Unlearning for Diffusion Models (code)

LLMs

  • TOFU [arxiv 2401.06121] TOFU: A Task of Fictitious Unlearning for LLMs (code)
  • WMDP [arxiv 2403.03218] The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning (code)
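
Both benchmarks ship their evaluation data in machine-readable form. As a quick-start illustration (not part of the survey), the sketch below loads TOFU's forget/retain splits with the Hugging Face datasets library; the hub ID locuslab/TOFU and the config names forget10/retain90 are assumptions based on the TOFU release, so verify them against the code repository linked above.

```python
# Minimal sketch (illustrative): load TOFU's forget/retain splits with the
# Hugging Face `datasets` library. The hub ID "locuslab/TOFU" and the config
# names "forget10" / "retain90" are assumptions -- check the official TOFU
# repository for the exact identifiers.
from datasets import load_dataset

forget_set = load_dataset("locuslab/TOFU", "forget10", split="train")
retain_set = load_dataset("locuslab/TOFU", "retain90", split="train")

# Each example is a question/answer pair about a fictitious author.
print(len(forget_set), len(retain_set))
print(forget_set[0])
```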

LMMs

  • Object HalBench: [EMNLP 2018] Object Hallucination in Image Captioning (code)
  • MHumanEval: [CVPR 2024] RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback (code)
  • LLaVA Bench: [NeurIPS 2023 (oral)] Visual Instruction Tuning (code)
  • MMHal-Bench: Aligning Large Multimodal Models with Factually Augmented RLHF (code)
  • POPE: [EMNLP 2023] POPE: Polling-based Object Probing Evaluation for Object Hallucination (code)

Generative Image Models:

  • [202401] EraseDiff: Erasing Data Influence in Diffusion Models (PDF)
  • [ICLR 2024] Machine Unlearning for Image-to-Image Generative Models (PDF, code)
  • [ICLR 2024] SalUn: Empowering Machine Unlearning via Gradient-based Weight Saliency in Both Image Classification and Generation (PDF, code)
  • [ICCV 2023] Ablating Concepts in Text-to-Image Diffusion Models (PDF, code)
  • [202312] FAST: Feature Aware Similarity Thresholding for Weak Unlearning in Black-Box Generative Models (PDF, code)
  • [202311] Receler: Reliable Concept Erasing of Text-to-Image Diffusion Models via Lightweight Erasers (PDF)
  • [202310] Feature Unlearning for Pre-trained GANs and VAEs (PDF)
  • [202310] To Generate or Not? Safety-Driven Unlearned Diffusion Models Are Still Easy To Generate Unsafe Images ... For Now (PDF, code)
  • [202309] Adapt then Unlearn: Exploiting Parameter Space Semantics for Unlearning in Generative Adversarial Networks (PDF)
  • [202308] Generative Adversarial Networks Unlearning (PDF)
  • [202306] Training data attribution for diffusion models (PDF, code)
  • [202305] Selective Amnesia: A Continual Learning Approach to Forgetting in Deep Generative Models (PDF, code)
  • [202303] Erasing Concepts from Diffusion Models (PDF, code)
  • [202303] Forget-Me-Not: Learning to Forget in Text-to-Image Diffusion Models (PDF, code)

Large Language Models (LLMs):

  • [202406] Large Language Model Unlearning via Embedding-Corrupted Prompts (PDF, code)
  • [202406] REVS: Unlearning Sensitive Information in Language Models via Rank Editing in the Vocabulary Space (PDF, code)
  • [202406] Soft Prompting for Unlearning in Large Language Models (PDF, code)
  • [202406] Avoiding Copyright Infringement via Machine Unlearning (PDF, code)
  • [202405] Cross-Modal Safety Alignment: Is textual unlearning all you need? (PDF)
  • [202405] Large Scale Knowledge Washing (PDF, code)
  • [202405] Machine Unlearning in Large Language Models (PDF)
  • [202404] Offset Unlearning for Large Language Models (PDF)
  • [202404] Exact and Efficient Unlearning for Large Language Model-based Recommendation (PDF)
  • [202404] Negative Preference Optimization: From Catastrophic Collapse to Effective Unlearning (PDF)
  • [202404] Eraser: Jailbreaking Defense in Large Language Models via Unlearning Harmful Knowledge (PDF)
  • [202404] Digital Forgetting in Large Language Models: A Survey of Unlearning Methods (PDF)
  • [202403] The Frontier of Data Erasure: Machine Unlearning for Large Language Models (PDF)
  • [ICML 2024] Larimar: Large Language Models with Episodic Memory Control (PDF, code)
  • [202403] Second-Order Information Matters: Revisiting Machine Unlearning for Large Language Models (PDF)
  • [202403] Dissecting Language Models: Machine Unlearning via Selective Pruning (PDF, code)
  • [202403] Guardrail Baselines for Unlearning in LLMs (PDF)
  • [202403] Towards Efficient and Effective Unlearning of Large Language Models for Recommendation (PDF)
  • [202403] The WMDP Benchmark: Measuring and Reducing Malicious Use with Unlearning (PDF)
  • [202402] Eight Methods to Evaluate Robust Unlearning in LLMs (PDF)
  • [ACL 2024] Machine Unlearning of Pre-trained Large Language Models (PDF)
  • [202402] EFUF: Efficient Fine-grained Unlearning Framework for Mitigating Hallucinations in Multimodal Large Language Models (PDF)
  • [202402] Unmemorization in Large Language Models via Self-Distillation and Deliberate Imagination (PDF)
  • [ACL 2024] Towards Safer Large Language Models through Machine Unlearning (PDF, code)
  • [202402] Rethinking Machine Unlearning for Large Language Models (PDF)
  • [202402] Selective Forgetting: Advancing Machine Unlearning Techniques and Evaluation in Language Models (PDF)
  • [202401] Unlearning Reveals the Influential Training Data of Language Models (PDF)
  • [202401] TOFU: A Task of Fictitious Unlearning for LLMs (PDF)
  • [202312] Learning and Forgetting Unsafe Examples in Large Language Models (PDF)
  • [NeurIPS 2023 Workshop] FairSISA: Ensemble Post-Processing to Improve Fairness of Unlearning in LLMs (PDF)
  • [202311] Knowledge Unlearning for LLMs: Tasks, Methods, and Challenges (PDF)
  • [202311] Forgetting before Learning: Utilizing Parametric Arithmetic for Knowledge Updating in Large Language Models (PDF)
  • [202311] Making Harmful Behaviors Unlearnable for Large Language Models (PDF)
  • [EMNLP 2023] Preserving Privacy Through Dememorization: An Unlearning Technique for Mitigating Memorization Risks in Language Models (PDF)
  • [EMNLP 2023] Unlearn What You Want to Forget: Efficient Unlearning for LLMs (PDF)
  • [202310] DEPN: Detecting and Editing Privacy Neurons in Pretrained Language Models (PDF)
  • [202310] Large Language Model Unlearning (PDF, code)
  • [202310] In-Context Unlearning: Language Models as Few Shot Unlearners (PDF, code)
  • [202310] Who’s Harry Potter? Approximate Unlearning in LLMs (PDF)
  • [202309] Forgetting Private Textual Sequences in Language Models via Leave-One-Out Ensemble (PDF)
  • [202309] Neural Code Completion Tools Can Memorize Hard-coded Credentials (PDF, code)
  • [202308] Separate the Wheat from the Chaff: Model Deficiency Unlearning via Parameter-Efficient Module Operation (PDF, code)
  • [202307] Make Text Unlearnable: Exploiting Effective Patterns to Protect Personal Data (PDF)
  • [202307] What can we learn from Data Leakage and Unlearning for Law? (PDF)
  • [202306] Composing Parameter-Efficient Modules with Arithmetic Operations (PDF)
  • [202305] KGA: A General Machine Unlearning Framework Based on Knowledge Gap Alignment (PDF)
  • [202305] Right to be Forgotten in the Era of Large Language Models: Implications, Challenges, and Solutions (PDF)
  • [202302] Knowledge Unlearning for Mitigating Privacy Risks in Language Models (PDF, code)
  • [ACL 2023] Unlearning Bias in Language Models by Partitioning Gradients (PDF, code)
  • [202212] Privacy Adhering Machine Un-learning in NLP (PDF)
  • [NeurIPS 2022] Quark: Controllable Text Generation with Reinforced Unlearning (PDF)
  • [ACL 2022] Knowledge Neurons in Pretrained Transformers (PDF, code)
  • [NeurIPS 2022] Editing Models with Task Arithmetic (PDF, code)
  • [CCS 2020] Analyzing Information Leakage of Updates to Natural Language Models (PDF)

Large Multimodal Models (LMMs):

  • [202406] MU-Bench: A Multitask Multimodal Benchmark for Machine Unlearning (PDF, code)
  • [202405] Single Image Unlearning: Efficient Machine Unlearning in Multimodal Large Language Models (PDF)
  • [202403] Unlearning Backdoor Threats: Enhancing Backdoor Defense in Multimodal Contrastive Learning via Local Token Unlearning (PDF)
  • [202402] EFUF: Efficient Fine-grained Unlearning Framework for Mitigating Hallucinations in Multimodal Large Language Models (PDF)
  • [202311] MultiDelete for Multimodal Machine Unlearning (PDF)

Applications:

Safety Alignment:

  • [202310] Large Language Model Unlearning (PDF, code)
  • [202404] Eraser: Jailbreaking Defense in Large Language Models via Unlearning Harmful Knowledge (PDF)
  • [202401] Unlearning Reveals the Influential Training Data of Language Models (PDF)
  • [ICLR 2024] SalUn: Empowering Machine Unlearning via Gradient-based Weight Saliency in Both Image Classification and Generation (PDF, code)
  • [202305] Selective Amnesia: A Continual Learning Approach to Forgetting in Deep Generative Models (PDF, code)
  • [202401] EraseDiff: Erasing Data Influence in Diffusion Models (PDF)
  • [202303] Erasing Concepts from Diffusion Models (PDF, code)
  • [202312] Learning and Forgetting Unsafe Examples in Large Language Models (PDF)
  • [202311] Receler: Reliable Concept Erasing of Text-to-Image Diffusion Models via Lightweight Erasers (PDF)
  • [EMNLP 2023] Unlearn What You Want to Forget: Efficient Unlearning for LLMs (PDF)
  • [ACL 2024] Towards Safer Large Language Models through Machine Unlearning (PDF, code)
  • [NeurIPS 2022] Editing Models with Task Arithmetic (PDF, code)
  • [202308] Separate the Wheat from the Chaff: Model Deficiency Unlearning via Parameter-Efficient Module Operation (PDF, code)
  • [202306] Composing Parameter-Efficient Modules with Arithmetic Operations (PDF)
  • [202309] Adapt then Unlearn: Exploiting Parameter Space Semantics for Unlearning in Generative Adversarial Networks (PDF)

Copyright Protection:

  • [202406] Avoiding Copyright Infringement via Machine Unlearning (PDF, code)
  • [202302] Knowledge Unlearning for Mitigating Privacy Risks in Language Models (PDF, code)
  • [202310] Large Language Model Unlearning (PDF, code)
  • [202310] Who’s Harry Potter? Approximate Unlearning in LLMs (PDF)
  • [202303] Forget-Me-Not: Learning to Forget in Text-to-Image Diffusion Models (PDF, code)

Hallucination Reduction:

  • [202310] Large Language Model Unlearning (PDF, code)
  • [202311] MultiDelete for Multimodal Machine Unlearning (PDF)
  • [202401] Unlearning Reveals the Influential Training Data of Language Models (PDF)
  • [202402] EFUF: Efficient Fine-grained Unlearning Framework for Mitigating Hallucinations in Multimodal Large Language Models (PDF)
  • [202405] Large Scale Knowledge Washing (PDF, code)
  • [202308] Separate the Wheat from the Chaff: Model Deficiency Unlearning via Parameter-Efficient Module Operation (PDF, code)
  • [ICML 2024] Larimar: Large Language Models with Episodic Memory Control (PDF, code)

Privacy Compliance:

  • [202311] MultiDelete for Multimodal Machine Unlearning (PDF)
  • [202302] Knowledge Unlearning for Mitigating Privacy Risks in Language Models (PDF, code)
  • [202310] Large Language Model Unlearning (PDF, code)
  • [202404] Negative Preference Optimization: From Catastrophic Collapse to Effective Unlearning (PDF)
  • [202403] Second-Order Information Matters: Revisiting Machine Unlearning for Large Language Models (PDF)
  • [202307] Make Text Unlearnable: Exploiting Effective Patterns to Protect Personal Data (PDF)
  • [202305] Selective Amnesia: A Continual Learning Approach to Forgetting in Deep Generative Models (PDF, code)
  • [ICLR 2024] Machine Unlearning for Image-to-Image Generative Models (PDF, code)
  • [202308] Generative Adversarial Networks Unlearning (PDF)
  • [202309] Adapt then Unlearn: Exploiting Parameter Space Semantics for Unlearning in Generative Adversarial Networks (PDF)
  • [202310] Feature Unlearning for Pre-trained GANs and VAEs (PDF)
  • [EMNLP 2023] Preserving Privacy Through Dememorization: An Unlearning Technique for Mitigating Memorization Risks in Language Models (PDF)
  • [202303] Forget-Me-Not: Learning to Forget in Text-to-Image Diffusion Models (PDF, code)
  • [202402] Selective Forgetting: Advancing Machine Unlearning Techniques and Evaluation in Language Models (PDF)
  • [NeurIPS 2022] Quark: Controllable Text Generation with Reinforced Unlearning (PDF)
  • [202402] Unmemorization in Large Language Models via Self-Distillation and Deliberate Imagination (PDF)
  • [202404] Offset Unlearning for Large Language Models (PDF)
  • [202305] KGA: A General Machine Unlearning Framework Based on Knowledge Gap Alignment (PDF)
  • [202309] Forgetting Private Textual Sequences in Language Models via Leave-One-Out Ensemble (PDF)
  • [202306] Training data attribution for diffusion models (PDF, code)
  • [EMNLP 2023] Unlearn What You Want to Forget: Efficient Unlearning for LLMs (PDF)
  • [202212] Privacy Adhering Machine Un-learning in NLP (PDF)
  • [202311] Receler: Reliable Concept Erasing of Text-to-Image Diffusion Models via Lightweight Erasers (PDF)
  • [202310] In-Context Unlearning: Language Models as Few Shot Unlearners (PDF, code)

Bias/Unfairness Alleviation:

  • [ACL 2023] Unlearning Bias in Language Models by Partitioning Gradients (PDF, code)
  • [202401] Unlearning Reveals the Influential Training Data of Language Models (PDF)
  • [NeurIPS 2023 Workshop] FairSISA: Ensemble Post-Processing to Improve Fairness of Unlearning in LLMs (PDF)

Other Surveys:

  • Eight Methods to Evaluate Robust Unlearning in LLMs (PDF)
  • Rethinking Machine Unlearning for Large Language Models (PDF)
  • Digital Forgetting in Large Language Models: A Survey of Unlearning Methods (PDF)
  • Knowledge Unlearning for LLMs: Tasks, Methods, and Challenges (PDF)
  • Copyright Protection in Generative AI: A Technical Perspective (PDF)
  • Machine Unlearning for Traditional Models and Large Language Models: A Short Survey (PDF)
  • Right to be Forgotten in the Era of Large Language Models: Implications, Challenges, and Solutions (PDF)
  • Threats, Attacks, and Defenses in Machine Unlearning: A Survey (PDF)

Contributing:

👍 Contributions to this repository are welcome! We will do our best to keep this list up to date. If you find any errors or missing papers, please don't hesitate to open an issue or submit a pull request.