# Privacy Issues in Large Language Models

This repository is a collection of links to papers and code repositories relevant to implementing LLMs with reduced privacy risks. They correspond to the papers discussed in our survey, available at: https://arxiv.org/abs/2312.06717

This repository will be updated periodically with relevant papers scraped from arXiv; the survey paper itself will be updated somewhat less frequently. Papers that have been added to the repository but not yet to the survey are marked with asterisks (***).

If you have a paper relevant to LLM privacy, please nominate it for inclusion.

Repo last updated 5/30/2024

Paper last updated 5/30/2024

## Table of Contents

- [Citation](#citation)
- [Memorization](#memorization)
- [Privacy Attacks](#privacy-attacks)
- [Private LLMs](#private-llms)
- [Unlearning](#unlearning)
- [Copyright](#copyright)
- [Additional Related Surveys](#additional-related-surveys)
- [Contact Info](#contact-info)

## Citation

```bibtex
@misc{neel2023privacy,
      title={Privacy Issues in Large Language Models: A Survey},
      author={Seth Neel and Peter Chang},
      year={2023},
      eprint={2312.06717},
      archivePrefix={arXiv},
      primaryClass={cs.AI}
}
```

## Memorization

*Image from Carlini 2020*

| Paper Title | Year | Author | Code |
|---|---|---|---|
| Emergent and Predictable Memorization in Large Language Models | 2023 | Biderman et al. | [Code] |
| The Secret Sharer: Evaluating and Testing Unintended Memorization in Neural Networks | 2019 | Carlini et al. | |
| Quantifying Memorization Across Neural Language Models | 2023 | Carlini et al. | |
| Do Localization Methods Actually Localize Memorized Data in LLMs? | 2023 | Chang et al. | [Code] |
| Does Learning Require Memorization? A Short Tale about a Long Tail | 2020 | Feldman et al. | |
| Preventing Verbatim Memorization in Language Models Gives a False Sense of Privacy | 2023 | Ippolito et al. | |
| Measuring Forgetting of Memorized Training Examples | 2023 | Jagielski et al. | |
| Deduplicating Training Data Mitigates Privacy Risks in Language Models | 2022 | Kandpal et al. | |
| How BPE Affects Memorization in Transformers | 2021 | Kharitonov et al. | |
| Deduplicating Training Data Makes Language Models Better | 2022 | Lee et al. | [Code] |
| How much do language models copy from their training data? Evaluating linguistic novelty in text generation using RAVEN | 2021 | McCoy et al. | [Code] |
| Training Production Language Models without Memorizing User Data | 2020 | Ramaswamy et al. | |
| Finding Memo: Extractive Memorization in Constrained Sequence Generation Tasks | 2022 | Raunak et al. | |
| Understanding Unintended Memorization in Federated Learning | 2020 | Thakkar et al. | |
| Investigating the Impact of Pre-trained Word Embeddings on Memorization in Neural Networks | 2020 | Thomas et al. | |
| Memorization Without Overfitting: Analyzing the Training Dynamics of Large Language Models | 2022 | Tirumala et al. | |
| Counterfactual Memorization in Neural Language Models | 2021 | Zhang et al. | |
| Provably Confidential Language Modelling | 2022 | Zhao et al. | [Code] |
| Quantifying and Analyzing Entity-level Memorization in Large Language Models | 2023 | Zhou et al. | |
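
To make the extractability-style definitions used in several of these papers concrete, here is a minimal sketch roughly in the spirit of Carlini et al.'s quantifying-memorization setup: prompt a causal LM with a prefix from a candidate training document and test whether greedy decoding reproduces the true continuation verbatim. The model (`gpt2`), the 50/50 prefix/suffix split, and the example text are illustrative assumptions, not choices taken from the papers above.

```python
# Minimal verbatim-memorization probe: is the suffix of a document
# reproducible from its prefix under greedy decoding?
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")   # placeholder model
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def is_extractable(document: str, prefix_tokens: int = 50, suffix_tokens: int = 50) -> bool:
    """Return True if greedy decoding of the prefix reproduces the true suffix."""
    ids = tokenizer(document, return_tensors="pt").input_ids[0]
    if len(ids) < prefix_tokens + suffix_tokens:
        return False
    prefix = ids[:prefix_tokens].unsqueeze(0)
    true_suffix = ids[prefix_tokens:prefix_tokens + suffix_tokens]
    with torch.no_grad():
        generated = model.generate(
            prefix,
            max_new_tokens=suffix_tokens,
            do_sample=False,                       # greedy decoding
            pad_token_id=tokenizer.eos_token_id,
        )[0][prefix_tokens:]
    return torch.equal(generated, true_suffix)

# Made-up example; in practice one iterates over sampled training documents
# and reports the fraction that are extractable.
print(is_extractable("The quick brown fox jumps over the lazy dog. " * 20))
```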

## Privacy Attacks

*Image from Tindall*

| Paper Title | Year | Author | Code |
|---|---|---|---|
| TMI! Finetuned Models Leak Private Information from their Pretraining Data | 2023 | Abascal et al. | |
| Extracting Training Data from Large Language Models | 2020 | Carlini et al. | [Code] |
| Membership Inference Attacks From First Principles | 2022 | Carlini et al. | [Code] |
| Practical Membership Inference Attacks against Fine-tuned Large Language Models via Self-prompt Calibration | 2023 | Fu et al. | |
| Membership Inference Attacks on Sequence-to-Sequence Models: Is My Data In Your Machine Translation System? | 2020 | Hisamoto et al. | [Code] |
| Membership Inference Attacks on Machine Learning: A Survey | 2021 | Hu et al. | |
| Does BERT Pretrained on Clinical Notes Reveal Sensitive Data? | 2021 | Lehman et al. | [Code] |
| MoPe: Model Perturbation-based Privacy Attacks on Language Models | 2023 | Li et al. | |
| When Machine Learning Meets Privacy: A Survey and Outlook | 2021 | Liu et al. | |
| Data Portraits: Recording Foundation Model Training Data | 2023 | Marone et al. | [Code] |
| Membership Inference Attacks against Language Models via Neighbourhood Comparison | 2023 | Mattern et al. | |
| Did the Neurons Read your Book? Document-level Membership Inference for Large Language Models | 2023 | Meeus et al. | |
| Quantifying Privacy Risks of Masked Language Models Using Membership Inference Attacks | 2022 | Mireshghallah et al. | Contact fmireshg@eng.ucsd.edu |
| Scalable Extraction of Training Data from Production Language Models | 2023 | Nasr et al. | |
| Detecting Pretraining Data from Large Language Models | 2023 | Shi et al. | [Code] |
| Membership Inference Attacks against Machine Learning Models | 2017 | Shokri et al. | |
| Information Leakage in Embedding Models | 2020 | Song and Raghunathan | |
| Auditing Data Provenance in Text-Generation Models | 2019 | Song and Shmatikov | |
| Beyond Memorization: Violating Privacy Via Inference with Large Language Models | 2023 | Staab et al. | [Code] |
| Privacy Risk in Machine Learning: Analyzing the Connection to Overfitting | 2018 | Yeom et al. | |
| Bag of Tricks for Training Data Extraction from Language Models | 2023 | Yu et al. | [Code] |
| Analyzing Information Leakage of Updates to Natural Language Models | 2020 | Zanella-Béguelin et al. | |
| Ethicist: Targeted Training Data Extraction Through Loss Smoothed Soft Prompting and Calibrated Confidence Estimation | 2023 | Zhang et al. | [Code] |

See also the [Google Training Data Extraction Challenge].
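
As a concrete illustration of the simplest attack family above, the sketch below implements a loss-threshold membership-inference test in the spirit of Yeom et al.: texts to which the model assigns unusually low loss are guessed to be training members. The model name and the threshold value are illustrative assumptions; calibrated attacks (e.g. reference models or neighbourhood comparison, as in Carlini et al. 2022 and Mattern et al. 2023) are considerably stronger.

```python
# Minimal loss-threshold membership-inference attack (Yeom et al. style).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")   # placeholder target model
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def nll(text: str) -> float:
    """Average per-token negative log-likelihood of `text` under the model."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        out = model(ids, labels=ids)   # HF shifts labels internally
    return out.loss.item()

def predict_member(text: str, threshold: float = 3.0) -> bool:
    """Guess membership: loss below the (assumed) threshold => likely member."""
    return nll(text) < threshold

print(predict_member("My email address is example@example.com"))
```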

## Private LLMs

*Image from Google AI Blog*

| Paper Title | Year | Author | Code |
|---|---|---|---|
| Deep Learning with Differential Privacy | 2016 | Abadi et al. | |
| Large-Scale Differentially Private BERT | 2021 | Anil et al. | |
| Sanitizing Sentence Embeddings (and Labels) for Local Differential Privacy | 2023 | Du et al. | [Code] |
| DP-Forward: Fine-tuning and Inference on Language Models with Differential Privacy in Forward Pass | 2023 | Du et al. | [Code] |
| An Efficient DP-SGD Mechanism for Large Scale NLP Models | 2022 | Dupuy et al. | |
| The Algorithmic Foundations of Differential Privacy | 2014 | Dwork and Roth | |
| Submix: Practical Private Prediction for Large-Scale Language Models | 2022 | Ginart et al. | |
| Federated Learning for Mobile Keyboard Prediction | 2019 | Hard et al. | |
| Learning and Evaluating a Differentially Private Pre-trained Language Model | 2021 | Hoory et al. | |
| Knowledge Sanitization of Large Language Models | 2023 | Ishibashi and Shimodaira | |
| Differentially Private Language Models Benefit from Public Pre-training | 2020 | Kerrigan et al. | [Code] |
| Large Language Models Can Be Strong Differentially Private Learners | 2022 | Li et al. | |
| Differentially Private Decoding in Large Language Models | 2022 | Majmudar et al. | |
| Communication-Efficient Learning of Deep Networks from Decentralized Data | 2016 | McMahan et al. | |
| Learning Differentially Private Recurrent Language Models | 2018 | McMahan et al. | |
| Selective Differential Privacy for Language Modeling | 2022 | Shi et al. | [Code] |
| Training Production Language Models without Memorizing User Data | 2020 | Ramaswamy et al. | |
| Privacy-Preserving In-Context Learning with Differentially Private Few-Shot Generation | 2023 | Tang et al. | [Code] |
| Understanding Unintended Memorization in Federated Learning | 2020 | Thakkar et al. | |
| Differentially Private Fine-tuning of Language Models | 2022 | Yu et al. | [Code] |
| Provably Confidential Language Modelling | 2022 | Zhao et al. | [Code] |
| ***Can LLMs Keep a Secret? Testing Privacy Implications of Language Models via Contextual Integrity Theory | 2023 | Mireshghallah et al. | [Code] |
| ***Silent Guardian: Protecting Text from Malicious Exploitation by Large Language Models | 2023 | Zhou et al. | |
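
Many of the papers above build on the DP-SGD algorithm of Abadi et al. As a rough illustration, the sketch below performs one DP-SGD step from scratch on a toy model: per-example gradients are clipped to norm C, summed, and perturbed with Gaussian noise before the optimizer step. The toy model, synthetic data, and hyperparameters are illustrative assumptions; practical LLM training would use a library such as Opacus and a privacy accountant to track the resulting (epsilon, delta) guarantee.

```python
# One DP-SGD step, written out explicitly (toy model standing in for an LM).
import torch

torch.manual_seed(0)
model = torch.nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = torch.nn.MSELoss()

C, sigma = 1.0, 1.0                              # clipping norm, noise multiplier
X, y = torch.randn(32, 10), torch.randn(32, 1)   # synthetic batch

def dp_sgd_step(X, y):
    summed = [torch.zeros_like(p) for p in model.parameters()]
    for i in range(len(X)):                      # per-example gradients
        model.zero_grad()
        loss_fn(model(X[i:i+1]), y[i:i+1]).backward()
        norm = torch.sqrt(sum(p.grad.norm() ** 2 for p in model.parameters())).item()
        scale = min(1.0, C / (norm + 1e-12))     # clip to norm at most C
        for s, p in zip(summed, model.parameters()):
            s += p.grad * scale
    for s, p in zip(summed, model.parameters()):
        noisy = s + sigma * C * torch.randn_like(s)   # Gaussian noise
        p.grad = noisy / len(X)                       # noisy average gradient
    optimizer.step()

dp_sgd_step(X, y)
```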

## Unlearning

*Image from Felps 2020*

| Paper Title | Year | Author | Code |
|---|---|---|---|
| Machine Unlearning | 2020 | Bourtoule et al. | [Code] |
| Unlearn What You Want to Forget: Efficient Unlearning for LLMs | 2023 | Chen et al. | |
| Who's Harry Potter? Approximate Unlearning in LLMs | 2023 | Eldan et al. | |
| Amnesiac Machine Learning | 2020 | Graves et al. | |
| Adaptive Machine Unlearning | 2021 | Gupta et al. | [Code] |
| Inexact Unlearning Needs More Careful Evaluations to Avoid a False Sense of Privacy | 2024 | Hayes et al. | |
| Preserving Privacy Through DeMemorization: An Unlearning Technique For Mitigating Memorization Risks In Language Models | 2023 | Kassem et al. | |
| Privacy Adhering Machine Un-learning in NLP | 2022 | Kumar et al. | |
| Knowledge Unlearning for Mitigating Privacy Risks in Language Models | 2022 | Jang et al. | [Code] |
| Unlearnable Algorithms for In-context Learning | 2024 | Muresanu et al. | |
| Descent-to-Delete: Gradient-Based Methods for Machine Unlearning | 2020 | Neel et al. | |
| A Survey of Machine Unlearning | 2022 | Nguyen et al. | |
| Can Sensitive Information Be Deleted From LLMs? Objectives for Defending Against Extraction Attacks | 2023 | Patil et al. | [Code] |
| In-Context Unlearning: Language Models as Few Shot Unlearners | 2023 | Pawelczyk et al. | |
| Large Language Model Unlearning | 2023 | Yao et al. | [Code] |
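
A common baseline in several of the works above (e.g. Jang et al., Yao et al.) is gradient-ascent unlearning: fine-tuning the model to increase its loss on a small forget set, usually while monitoring utility on retained data. The sketch below shows the basic loop; the model, learning rate, epoch count, and forget set are illustrative assumptions, not a recipe taken from the survey.

```python
# Gradient-ascent unlearning on a small forget set (baseline sketch).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")   # placeholder model
model = AutoModelForCausalLM.from_pretrained("gpt2")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

forget_set = ["Jane Doe's phone number is 555-0100."]   # hypothetical examples

model.train()
for epoch in range(3):
    for text in forget_set:
        ids = tokenizer(text, return_tensors="pt").input_ids
        loss = model(ids, labels=ids).loss
        (-loss).backward()          # ascend on the forget-set loss
        optimizer.step()
        optimizer.zero_grad()
# In practice one tracks extraction/memorization metrics on the forget set
# and perplexity on retained data to decide when to stop.
```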

## Copyright

*Custom image*

| Paper Title | Year | Author | Code |
|---|---|---|---|
| Can Copyright be Reduced to Privacy? | 2023 | Elkin-Koren et al. | |
| DeepCreativity: Measuring Creativity with Deep Learning Techniques | 2022 | Franceschelli et al. | |
| Foundation Models and Fair Use | 2023 | Henderson et al. | |
| Preventing Verbatim Memorization in Language Models Gives a False Sense of Privacy | 2023 | Ippolito et al. | |
| Copyright Violations and Large Language Models | 2023 | Karamolegkou et al. | [Code] |
| Formalizing Human Ingenuity: A Quantitative Framework for Copyright Law's Substantial Similarity | 2022 | Scheffler et al. | |
| On Provable Copyright Protection for Generative Models | 2023 | Vyas et al. | |
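
Empirical studies of copyright risk, such as Karamolegkou et al., typically measure how much of a protected text a model reproduces verbatim. The sketch below is a simplified overlap check (longest shared run of words between a generation and a reference text); the example strings are placeholders, and the cited papers use more refined similarity measures such as longest common subsequence and near-duplicate detection.

```python
# Simplified verbatim-overlap check between a model output and a reference text.
from difflib import SequenceMatcher

def longest_shared_span(generated: str, reference: str) -> int:
    """Length, in words, of the longest verbatim span shared by the two texts."""
    a, b = generated.lower().split(), reference.lower().split()
    match = SequenceMatcher(None, a, b).find_longest_match(0, len(a), 0, len(b))
    return match.size

reference = "it was the best of times, it was the worst of times"
generated = "the model wrote: it was the best of times, it was the age of wisdom"
print(longest_shared_span(generated, reference))   # number of consecutive shared words
```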

## Additional Related Surveys

| Paper Title | Year | Author | Code |
|---|---|---|---|
| Membership Inference Attacks on Machine Learning: A Survey | 2022 | Hu et al. | |
| When Machine Learning Meets Privacy: A Survey and Outlook | 2021 | Liu et al. | |
| Rethinking Machine Unlearning for Large Language Models | 2024 | Liu et al. | |
| A Survey of Machine Unlearning | 2022 | Nguyen et al. | |
| ***A Survey of Large Language Models | 2023 | Zhao et al. | |

## Contact Info

Repository maintained by Peter Chang (pchang@hbs.edu)