# Privacy Issues in Large Language Models

This repository is a collection of links to papers and code repositories relevant to implementing LLMs with reduced privacy risks. They correspond to the papers discussed in our survey, available at: https://arxiv.org/abs/2312.06717

This repository will be updated periodically with relevant papers scraped from arXiv; the survey paper itself will be updated somewhat less frequently. Papers that have been added to the repository but not yet to the survey are marked with asterisks (***).

If you have a paper relevant to LLM privacy, please nominate it for inclusion.

Repo last updated 5/30/2024

Paper last updated 5/30/2024

## Table of Contents

- [Citation](#citation)
- [Memorization](#memorization)
- [Privacy Attacks](#privacy-attacks)
- [Private LLMs](#private-llms)
- [Unlearning](#unlearning)
- [Copyright](#copyright)
- [Additional Related Surveys](#additional-related-surveys)
- [Contact Info](#contact-info)

## Citation

```bibtex
@misc{neel2023privacy,
      title={Privacy Issues in Large Language Models: A Survey},
      author={Seth Neel and Peter Chang},
      year={2023},
      eprint={2312.06717},
      archivePrefix={arXiv},
      primaryClass={cs.AI}
}
```

## Memorization

*Image from Carlini 2020*

| Paper Title | Year | Author | Code |
|---|---|---|---|
| Emergent and Predictable Memorization in Large Language Models | 2023 | Biderman et al. | [Code] |
| The Secret Sharer: Evaluating and Testing Unintended Memorization in Neural Networks | 2019 | Carlini et al. | |
| Quantifying Memorization Across Neural Language Models | 2023 | Carlini et al. | |
| Do Localization Methods Actually Localize Memorized Data in LLMs? | 2023 | Chang et al. | [Code] |
| Does Learning Require Memorization? A Short Tale about a Long Tail | 2020 | Feldman et al. | |
| Preventing Verbatim Memorization in Language Models Gives a False Sense of Privacy | 2023 | Ippolito et al. | |
| Measuring Forgetting of Memorized Training Examples | 2023 | Jagielski et al. | |
| Deduplicating Training Data Mitigates Privacy Risks in Language Models | 2022 | Kandpal et al. | |
| How BPE Affects Memorization in Transformers | 2021 | Kharitonov et al. | |
| Deduplicating Training Data Makes Language Models Better | 2022 | Lee et al. | [Code] |
| How much do language models copy from their training data? Evaluating linguistic novelty in text generation using RAVEN | 2021 | McCoy et al. | [Code] |
| Training Production Language Models without Memorizing User Data | 2020 | Ramaswamy et al. | |
| Finding Memo: Extractive Memorization in Constrained Sequence Generation Tasks | 2022 | Raunak et al. | |
| Understanding Unintended Memorization in Federated Learning | 2020 | Thakkar et al. | |
| Investigating the Impact of Pre-trained Word Embeddings on Memorization in Neural Networks | 2020 | Thomas et al. | |
| Memorization Without Overfitting: Analyzing the Training Dynamics of Large Language Models | 2022 | Tirumala et al. | |
| Counterfactual Memorization in Neural Language Models | 2021 | Zhang et al. | |
| Provably Confidential Language Modelling | 2022 | Zhao et al. | [Code] |
| Quantifying and Analyzing Entity-level Memorization in Large Language Models | 2023 | Zhou et al. | |
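
To make the extractability-style definitions used in several of these papers concrete, here is a minimal sketch roughly in the spirit of Carlini et al.'s quantifying-memorization setup: prompt a causal LM with a prefix from a candidate training document and test whether greedy decoding reproduces the true continuation verbatim. The model (`gpt2`), the 50/50 prefix/suffix split, and the example text are illustrative assumptions, not choices taken from the papers above.

```python
# Minimal verbatim-memorization probe: is the suffix of a document
# reproducible from its prefix under greedy decoding?
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")   # placeholder model
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def is_extractable(document: str, prefix_tokens: int = 50, suffix_tokens: int = 50) -> bool:
    """Return True if greedy decoding of the prefix reproduces the true suffix."""
    ids = tokenizer(document, return_tensors="pt").input_ids[0]
    if len(ids) < prefix_tokens + suffix_tokens:
        return False
    prefix = ids[:prefix_tokens].unsqueeze(0)
    true_suffix = ids[prefix_tokens:prefix_tokens + suffix_tokens]
    with torch.no_grad():
        generated = model.generate(
            prefix,
            max_new_tokens=suffix_tokens,
            do_sample=False,                       # greedy decoding
            pad_token_id=tokenizer.eos_token_id,
        )[0][prefix_tokens:]
    return torch.equal(generated, true_suffix)

# Made-up example; in practice one iterates over sampled training documents
# and reports the fraction that are extractable.
print(is_extractable("The quick brown fox jumps over the lazy dog. " * 20))
```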

## Privacy Attacks

*Image from Tindall*

| Paper Title | Year | Author | Code |
|---|---|---|---|
| TMI! Finetuned Models Leak Private Information from their Pretraining Data | 2023 | Abascal et al. | |
| Extracting Training Data from Large Language Models | 2020 | Carlini et al. | [Code] |
| Membership Inference Attacks From First Principles | 2022 | Carlini et al. | [Code] |
| Practical Membership Inference Attacks against Fine-tuned Large Language Models via Self-prompt Calibration | 2023 | Fu et al. | |
| Membership Inference Attacks on Sequence-to-Sequence Models: Is My Data In Your Machine Translation System? | 2020 | Hisamoto et al. | [Code] |
| Membership Inference Attacks on Machine Learning: A Survey | 2021 | Hu et al. | |
| Does BERT Pretrained on Clinical Notes Reveal Sensitive Data? | 2021 | Lehman et al. | [Code] |
| MoPe: Model Perturbation-based Privacy Attacks on Language Models | 2023 | Li et al. | |
| When Machine Learning Meets Privacy: A Survey and Outlook | 2021 | Liu et al. | |
| Data Portraits: Recording Foundation Model Training Data | 2023 | Marone et al. | [Code] |
| Membership Inference Attacks against Language Models via Neighbourhood Comparison | 2023 | Mattern et al. | |
| Did the Neurons Read your Book? Document-level Membership Inference for Large Language Models | 2023 | Meeus et al. | |
| Quantifying Privacy Risks of Masked Language Models Using Membership Inference Attacks | 2022 | Mireshghallah et al. | Contact fmireshg@eng.ucsd.edu |
| Scalable Extraction of Training Data from Production Language Models | 2023 | Nasr et al. | |
| Detecting Pretraining Data from Large Language Models | 2023 | Shi et al. | [Code] |
| Membership Inference Attacks against Machine Learning Models | 2017 | Shokri et al. | |
| Information Leakage in Embedding Models | 2020 | Song and Raghunathan | |
| Auditing Data Provenance in Text-Generation Models | 2019 | Song and Shmatikov | |
| Beyond Memorization: Violating Privacy Via Inference with Large Language Models | 2023 | Staab et al. | [Code] |
| Privacy Risk in Machine Learning: Analyzing the Connection to Overfitting | 2018 | Yeom et al. | |
| Bag of Tricks for Training Data Extraction from Language Models | 2023 | Yu et al. | [Code] |
| Analyzing Information Leakage of Updates to Natural Language Models | 2020 | Zanella-Béguelin et al. | |
| Ethicist: Targeted Training Data Extraction Through Loss Smoothed Soft Prompting and Calibrated Confidence Estimation | 2023 | Zhang et al. | [Code] |

See also the [Google Training Data Extraction Challenge].
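
As a concrete illustration of the simplest attack family above, the sketch below implements a loss-threshold membership-inference test in the spirit of Yeom et al.: texts to which the model assigns unusually low loss are guessed to be training members. The model name and the threshold value are illustrative assumptions; calibrated attacks (e.g. reference models or neighbourhood comparison, as in Carlini et al. 2022 and Mattern et al. 2023) are considerably stronger.

```python
# Minimal loss-threshold membership-inference attack (Yeom et al. style).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")   # placeholder target model
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def nll(text: str) -> float:
    """Average per-token negative log-likelihood of `text` under the model."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        out = model(ids, labels=ids)   # HF shifts labels internally
    return out.loss.item()

def predict_member(text: str, threshold: float = 3.0) -> bool:
    """Guess membership: loss below the (assumed) threshold => likely member."""
    return nll(text) < threshold

print(predict_member("My email address is example@example.com"))
```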

## Private LLMs

*Image from Google AI Blog*

| Paper Title | Year | Author | Code |
|---|---|---|---|
| Deep Learning with Differential Privacy | 2016 | Abadi et al. | |
| Large-Scale Differentially Private BERT | 2021 | Anil et al. | |
| Sanitizing Sentence Embeddings (and Labels) for Local Differential Privacy | 2023 | Du et al. | [Code] |
| DP-Forward: Fine-tuning and Inference on Language Models with Differential Privacy in Forward Pass | 2023 | Du et al. | [Code] |
| An Efficient DP-SGD Mechanism for Large Scale NLP Models | 2022 | Dupuy et al. | |
| The Algorithmic Foundations of Differential Privacy | 2014 | Dwork and Roth | |
| Submix: Practical Private Prediction for Large-Scale Language Models | 2022 | Ginart et al. | |
| Federated Learning for Mobile Keyboard Prediction | 2019 | Hard et al. | |
| Learning and Evaluating a Differentially Private Pre-trained Language Model | 2021 | Hoory et al. | |
| Knowledge Sanitization of Large Language Models | 2023 | Ishibashi and Shimodaira | |
| Differentially Private Language Models Benefit from Public Pre-training | 2020 | Kerrigan et al. | [Code] |
| Large Language Models Can Be Strong Differentially Private Learners | 2022 | Li et al. | |
| Differentially Private Decoding in Large Language Models | 2022 | Majmudar et al. | |
| Communication-Efficient Learning of Deep Networks from Decentralized Data | 2016 | McMahan et al. | |
| Learning Differentially Private Recurrent Language Models | 2018 | McMahan et al. | |
| Selective Differential Privacy for Language Modeling | 2022 | Shi et al. | [Code] |
| Training Production Language Models without Memorizing User Data | 2020 | Ramaswamy et al. | |
| Privacy-Preserving In-Context Learning with Differentially Private Few-Shot Generation | 2023 | Tang et al. | [Code] |
| Understanding Unintended Memorization in Federated Learning | 2020 | Thakkar et al. | |
| Differentially Private Fine-tuning of Language Models | 2022 | Yu et al. | [Code] |
| Provably Confidential Language Modelling | 2022 | Zhao et al. | [Code] |
| ***Can LLMs Keep a Secret? Testing Privacy Implications of Language Models via Contextual Integrity Theory | 2023 | Mireshghallah et al. | [Code] |
| ***Silent Guardian: Protecting Text from Malicious Exploitation by Large Language Models | 2023 | Zhou et al. | |
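
Many of the papers above build on the DP-SGD algorithm of Abadi et al. As a rough illustration, the sketch below performs one DP-SGD step from scratch on a toy model: per-example gradients are clipped to norm C, summed, and perturbed with Gaussian noise before the optimizer step. The toy model, synthetic data, and hyperparameters are illustrative assumptions; practical LLM training would use a library such as Opacus and a privacy accountant to track the resulting (epsilon, delta) guarantee.

```python
# One DP-SGD step, written out explicitly (toy model standing in for an LM).
import torch

torch.manual_seed(0)
model = torch.nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = torch.nn.MSELoss()

C, sigma = 1.0, 1.0                              # clipping norm, noise multiplier
X, y = torch.randn(32, 10), torch.randn(32, 1)   # synthetic batch

def dp_sgd_step(X, y):
    summed = [torch.zeros_like(p) for p in model.parameters()]
    for i in range(len(X)):                      # per-example gradients
        model.zero_grad()
        loss_fn(model(X[i:i+1]), y[i:i+1]).backward()
        norm = torch.sqrt(sum(p.grad.norm() ** 2 for p in model.parameters())).item()
        scale = min(1.0, C / (norm + 1e-12))     # clip to norm at most C
        for s, p in zip(summed, model.parameters()):
            s += p.grad * scale
    for s, p in zip(summed, model.parameters()):
        noisy = s + sigma * C * torch.randn_like(s)   # Gaussian noise
        p.grad = noisy / len(X)                       # noisy average gradient
    optimizer.step()

dp_sgd_step(X, y)
```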

## Unlearning

*Image from Felps 2020*

| Paper Title | Year | Author | Code |
|---|---|---|---|
| Machine Unlearning | 2020 | Bourtoule et al. | [Code] |
| Unlearn What You Want to Forget: Efficient Unlearning for LLMs | 2023 | Chen et al. | |
| Who's Harry Potter? Approximate Unlearning in LLMs | 2023 | Eldan et al. | |
| Amnesiac Machine Learning | 2020 | Graves et al. | |
| Adaptive Machine Unlearning | 2021 | Gupta et al. | [Code] |
| Inexact Unlearning Needs More Careful Evaluations to Avoid a False Sense of Privacy | 2024 | Hayes et al. | |
| Preserving Privacy Through DeMemorization: An Unlearning Technique For Mitigating Memorization Risks In Language Models | 2023 | Kassem et al. | |
| Privacy Adhering Machine Un-learning in NLP | 2022 | Kumar et al. | |
| Knowledge Unlearning for Mitigating Privacy Risks in Language Models | 2022 | Jang et al. | [Code] |
| Unlearnable Algorithms for In-context Learning | 2024 | Muresanu et al. | |
| Descent-to-Delete: Gradient-Based Methods for Machine Unlearning | 2020 | Neel et al. | |
| A Survey of Machine Unlearning | 2022 | Nguyen et al. | |
| Can Sensitive Information Be Deleted From LLMs? Objectives for Defending Against Extraction Attacks | 2023 | Patil et al. | [Code] |
| In-Context Unlearning: Language Models as Few Shot Unlearners | 2023 | Pawelczyk et al. | |
| Large Language Model Unlearning | 2023 | Yao et al. | [Code] |
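
A common baseline in several of the works above (e.g. Jang et al., Yao et al.) is gradient-ascent unlearning: fine-tuning the model to increase its loss on a small forget set, usually while monitoring utility on retained data. The sketch below shows the basic loop; the model, learning rate, epoch count, and forget set are illustrative assumptions, not a recipe taken from the survey.

```python
# Gradient-ascent unlearning on a small forget set (baseline sketch).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")   # placeholder model
model = AutoModelForCausalLM.from_pretrained("gpt2")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

forget_set = ["Jane Doe's phone number is 555-0100."]   # hypothetical examples

model.train()
for epoch in range(3):
    for text in forget_set:
        ids = tokenizer(text, return_tensors="pt").input_ids
        loss = model(ids, labels=ids).loss
        (-loss).backward()          # ascend on the forget-set loss
        optimizer.step()
        optimizer.zero_grad()
# In practice one tracks extraction/memorization metrics on the forget set
# and perplexity on retained data to decide when to stop.
```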

## Copyright

*Custom image*

| Paper Title | Year | Author | Code |
|---|---|---|---|
| Can Copyright be Reduced to Privacy? | 2023 | Elkin-Koren et al. | |
| DeepCreativity: Measuring Creativity with Deep Learning Techniques | 2022 | Franceschelli et al. | |
| Foundation Models and Fair Use | 2023 | Henderson et al. | |
| Preventing Verbatim Memorization in Language Models Gives a False Sense of Privacy | 2023 | Ippolito et al. | |
| Copyright Violations and Large Language Models | 2023 | Karamolegkou et al. | [Code] |
| Formalizing Human Ingenuity: A Quantitative Framework for Copyright Law's Substantial Similarity | 2022 | Scheffler et al. | |
| On Provable Copyright Protection for Generative Models | 2023 | Vyas et al. | |
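
Empirical studies of copyright risk, such as Karamolegkou et al., typically measure how much of a protected text a model reproduces verbatim. The sketch below is a simplified overlap check (longest shared run of words between a generation and a reference text); the example strings are placeholders, and the cited papers use more refined similarity measures such as longest common subsequence and near-duplicate detection.

```python
# Simplified verbatim-overlap check between a model output and a reference text.
from difflib import SequenceMatcher

def longest_shared_span(generated: str, reference: str) -> int:
    """Length, in words, of the longest verbatim span shared by the two texts."""
    a, b = generated.lower().split(), reference.lower().split()
    match = SequenceMatcher(None, a, b).find_longest_match(0, len(a), 0, len(b))
    return match.size

reference = "it was the best of times, it was the worst of times"
generated = "the model wrote: it was the best of times, it was the age of wisdom"
print(longest_shared_span(generated, reference))   # number of consecutive shared words
```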

## Additional Related Surveys

| Paper Title | Year | Author | Code |
|---|---|---|---|
| Membership Inference Attacks on Machine Learning: A Survey | 2022 | Hu et al. | |
| When Machine Learning Meets Privacy: A Survey and Outlook | 2021 | Liu et al. | |
| Rethinking Machine Unlearning for Large Language Models | 2024 | Liu et al. | |
| A Survey of Machine Unlearning | 2022 | Nguyen et al. | |
| ***A Survey of Large Language Models | 2023 | Zhao et al. | |

## Contact Info

Repository maintained by Peter Chang (pchang@hbs.edu)