Model Editing Papers

Must-read papers on model editing with large language models.

🔔 News

2023-07 We release EasyEdit, an easy-to-use framework to edit Large Language Models.
2023-06 We will provide a tutorial on Editing Large Language Models at AACL 2023.
2023-05 We release a new analysis paper: "Editing Large Language Models: Problems, Methods, and Opportunities", based on this repository! We look forward to any comments or discussions on this topic :)
2022-12 We created this repository to maintain a paper list on Model Editing.

🔍 Contents

🌟 Why Model Editing?
Keywords
📜 Papers
🧰 Resources
- Benchmarks and Tasks
- Tools
🎉 Contribution
🚩 Citation

🌟 Why Model Editing?

Model Editing is a compelling field of research that focuses on facilitating efficient modifications to the behavior of models, particularly foundation models. The aim is to implement these changes within a specified scope of interest without negatively affecting the model's performance across a broader range of inputs.
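The definition above can be made concrete with a toy sketch. It is illustrative only and assumes a "model" is just a function from a prompt to an answer. The names `base_model`, `apply_edit`, `edit_success`, and `locality`, as well as the example prompts, are hypothetical and do not come from any paper or tool in this list. The wrapper overrides answers only for prompts inside the edit scope, and the two scores check that the edit took effect and that out-of-scope behavior is unchanged.

```python
from typing import Callable, Dict, List

# A "model" here is simply a function from a prompt to an answer (an assumption
# made for illustration; real models are neural networks).
Model = Callable[[str], str]


def base_model(prompt: str) -> str:
    """Toy stand-in for a pretrained model: a fixed prompt -> answer lookup."""
    knowledge = {
        "What is the capital of France?": "Paris",
        "What is the tallest mountain on Earth?": "Mount Everest",
        # Made-up fact that we will treat as outdated and edit below.
        "Which city hosts the Example Conference?": "Lisbon",
    }
    return knowledge.get(prompt, "unknown")


def apply_edit(model: Model, edits: Dict[str, str]) -> Model:
    """Return a model that answers in-scope prompts from `edits`
    and defers to the original model for everything else."""
    def edited(prompt: str) -> str:
        if prompt in edits:
            return edits[prompt]
        return model(prompt)
    return edited


def edit_success(edited: Model, edits: Dict[str, str]) -> float:
    """Fraction of edited prompts for which the new answer is produced."""
    return sum(edited(p) == target for p, target in edits.items()) / len(edits)


def locality(original: Model, edited: Model, out_of_scope: List[str]) -> float:
    """Fraction of out-of-scope prompts whose answer is unchanged by the edit."""
    return sum(original(p) == edited(p) for p in out_of_scope) / len(out_of_scope)


EDITS = {"Which city hosts the Example Conference?": "Vienna"}
OUT_OF_SCOPE = [
    "What is the capital of France?",
    "What is the tallest mountain on Earth?",
]

edited_model = apply_edit(base_model, EDITS)
print("edit success:", edit_success(edited_model, EDITS))
print("locality:", locality(base_model, edited_model, OUT_OF_SCOPE))
```

An exact-match lookup like this keeps out-of-scope behavior intact by construction, but it does not generalize to paraphrases of the edited prompt. Handling that trade-off is what the methods listed below aim to do.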

Keywords

Model Editing has strong connections with the following topics.

Updating and fixing bugs for large language models
Language models as knowledge base, locating knowledge in large language models
Lifelong learning, unlearning, etc.
Security and privacy for large language models

📜 Papers

This is a collection of research and review papers on Model Editing. Suggestions and pull requests are welcome to help share the latest research progress.

Overview

Editing Large Language Models: Problems, Methods, and Opportunities. [paper]

Methods

Preserve Parameters

Memory-based

Memory-Based Model Editing at Scale (ICML 2022)
Eric Mitchell, Charles Lin, Antoine Bosselut, Christopher D. Manning, Chelsea Finn. [paper] [code] [demo]
Fixing Model Bugs with Natural Language Patches. (EMNLP 2022)
Shikhar Murty, Christopher D. Manning, Scott M. Lundberg, Marco Túlio Ribeiro. [paper] [code]
MemPrompt: Memory-assisted Prompt Editing with User Feedback. (EMNLP 2022)
Aman Madaan, Niket Tandon, Peter Clark, Yiming Yang. [paper] [code] [page] [video]
Large Language Models with Controllable Working Memory.
Daliang Li, Ankit Singh Rawat, Manzil Zaheer, Xin Wang, Michal Lukasik, Andreas Veit, Felix Yu, Sanjiv Kumar. [paper]
Can We Edit Factual Knowledge by In-Context Learning?
Ce Zheng, Lei Li, Qingxiu Dong, Yuxuan Fan, Zhiyong Wu, Jingjing Xu, Baobao Chang. [paper]
Can LMs Learn New Entities from Descriptions? Challenges in Propagating Injected Knowledge
Yasumasa Onoe, Michael J.Q. Zhang, Shankar Padmanabhan, Greg Durrett, Eunsol Choi. [paper]
MQUAKE: Assessing Knowledge Editing in Language Models via Multi-Hop Questions
Zexuan Zhong, Zhengxuan Wu, Christopher D. Manning, Christopher Potts, Danqi Chen. [paper]

Additional Parameters

Calibrating Factual Knowledge in Pretrained Language Models. (EMNLP 2022)
Qingxiu Dong, Damai Dai, Yifan Song, Jingjing Xu, Zhifang Sui, Lei Li. [paper] [code]
Transformer-Patcher: One Mistake worth One Neuron. (ICLR 2023)
Zeyu Huang, Yikang Shen, Xiaofeng Zhang, Jie Zhou, Wenge Rong, Zhang Xiong. [paper] [code]
Aging with GRACE: Lifelong Model Editing with Discrete Key-Value Adaptors.
Thomas Hartvigsen, Swami Sankaranarayanan, Hamid Palangi, Yoon Kim, Marzyeh Ghassemi. [paper] [code]
Neural Knowledge Bank for Pretrained Transformers
Damai Dai, Wenbin Jiang, Qingxiu Dong, Yajuan Lyu, Qiaoqiao She, Zhifang Sui. [paper]

Change LM's representation space

Inspecting and Editing Knowledge Representations in Language Models
Evan Hernandez, Belinda Z. Li, Jacob Andreas. [paper] [code]

Modify Parameters

Finetuning

Plug-and-Play Adaptation for Continuously-updated QA. (ACL 2022 Findings)
Kyungjae Lee, Wookje Han, Seung-won Hwang, Hwaran Lee, Joonsuk Park, Sang-Woo Lee. [paper] [code]
Modifying Memories in Transformer Models.
Chen Zhu, Ankit Singh Rawat, Manzil Zaheer, Srinadh Bhojanapalli, Daliang Li, Felix Yu, Sanjiv Kumar. [paper]

Meta-learning

Editing Factual Knowledge in Language Models. (EMNLP 2021)
Nicola De Cao, Wilker Aziz, Ivan Titov. [paper] [code]
Fast Model Editing at Scale. (ICLR 2022)
Eric Mitchell, Charles Lin, Antoine Bosselut, Chelsea Finn, Christopher D. Manning. [paper] [code] [page]
Editable Neural Networks. (ICLR 2020)
Anton Sinitsin, Vsevolod Plokhotnyuk, Dmitry V. Pyrkin, Sergei Popov, Artem Babenko. [paper] [code]

Locate and edit

Editing a classifier by rewriting its prediction rules. (NeurIPS 2021)
Shibani Santurkar, Dimitris Tsipras, Mahalaxmi Elango, David Bau, Antonio Torralba, Aleksander Madry. [paper] [code]
Language Anisotropic Cross-Lingual Model Editing.
Yang Xu, Yutai Hou, Wanxiang Che. [paper]
Repairing Neural Networks by Leaving the Right Past Behind.
Ryutaro Tanno, Melanie F. Pradier, Aditya Nori, Yingzhen Li. [paper]
Locating and Editing Factual Associations in GPT. (NeurIPS 2022)
Kevin Meng, David Bau, Alex Andonian, Yonatan Belinkov. [paper] [code] [page] [video]
Mass-Editing Memory in a Transformer.
Kevin Meng, Arnab Sen Sharma, Alex Andonian, Yonatan Belinkov, David Bau. [paper] [code] [page] [demo]
Editing models with task arithmetic.
Gabriel Ilharco, Marco Tulio Ribeiro, Mitchell Wortsman, Ludwig Schmidt, Hannaneh Hajishirzi, Ali Farhadi. [paper]
Editing Commonsense Knowledge in GPT.
Anshita Gupta, Debanjan Mondal, Akshay Krishna Sheshadri, Wenlong Zhao, Xiang Lorraine Li, Sarah Wiegreffe, Niket Tandon. [paper]
Do Language Models Have Beliefs? Methods for Detecting, Updating, and Visualizing Model Beliefs.
Peter Hase, Mona Diab, Asli Celikyilmaz, Xian Li, Zornitsa Kozareva, Veselin Stoyanov, Mohit Bansal, Srinivasan Iyer. [paper] [code]
Detecting Edit Failures In Large Language Models: An Improved Specificity Benchmark.
Jason Hoelscher-Obermaier, Julia Persson, Esben Kran, Ioannis Konstas, Fazl Barez. [paper]
Knowledge Neurons in Pretrained Transformers. (ACL 2022)
Damai Dai, Li Dong, Yaru Hao, Zhifang Sui, Baobao Chang, Furu Wei. [paper] [code] [code by EleutherAI]
LEACE: Perfect linear concept erasure in closed form.
Nora Belrose, David Schneider-Joseph, Shauli Ravfogel, Ryan Cotterell, Edward Raff, Stella Biderman. [paper]
Transformer Feed-Forward Layers Are Key-Value Memories. (EMNLP 2021)
Mor Geva, Roei Schuster, Jonathan Berant, Omer Levy. [paper]
Transformer Feed-Forward Layers Build Predictions by Promoting Concepts in the Vocabulary Space. (EMNLP 2022)
Mor Geva, Avi Caciularu, Kevin Ro Wang, Yoav Goldberg. [paper]

More Related Papers

FRUIT: Faithfully Reflecting Updated Information in Text. (NAACL 2022)
Robert L. Logan IV, Alexandre Passos, Sameer Singh, Ming-Wei Chang. [paper] [code]
Entailer: Answering Questions with Faithful and Truthful Chains of Reasoning. (EMNLP 2022)
Oyvind Tafjord, Bhavana Dalvi Mishra, Peter Clark. [paper] [code] [video]
Towards Tracing Factual Knowledge in Language Models Back to the Training Data. (EMNLP 2022)
Ekin Akyürek, Tolga Bolukbasi, Frederick Liu, Binbin Xiong, Ian Tenney, Jacob Andreas, Kelvin Guu. [paper]
Prompting GPT-3 To Be Reliable.
Chenglei Si, Zhe Gan, Zhengyuan Yang, Shuohang Wang, Jianfeng Wang, Jordan Boyd-Graber, Lijuan Wang. [paper]
Patching open-vocabulary models by interpolating weights. (NeurIPS 2022)
Gabriel Ilharco, Mitchell Wortsman, Samir Yitzhak Gadre, Shuran Song, Hannaneh Hajishirzi, Simon Kornblith, Ali Farhadi, Ludwig Schmidt. [paper] [code]
Decouple knowledge from parameters for plug-and-play language modeling. (ACL 2023 Findings)
Xin Cheng, Yankai Lin, Xiuying Chen, Dongyan Zhao, Rui Yan. [paper] [code]
Backpack Language Models
John Hewitt, John Thickstun, Christopher D. Manning, Percy Liang. [paper]

Analysis

Does Localization Inform Editing? Surprising Differences in Causality-Based Localization vs. Knowledge Editing in Language Models.
Peter Hase, Mohit Bansal, Been Kim, Asma Ghandeharioun. [paper] [code]
Dissecting Recall of Factual Associations in Auto-Regressive Language Models
Mor Geva, Jasmijn Bastings, Katja Filippova, Amir Globerson. [paper]
Evaluating the Ripple Effects of Knowledge Editing in Language Models
Roi Cohen, Eden Biran, Ori Yoran, Amir Globerson, Mor Geva. [paper]

🧰 Resources

Benchmarks and Tasks

Edit Type	Benchmarks & Datasets
Fact Knowledge	ZSRE, CounterFact, CounterFact+, ECBD, MQUAKE
Sentiment	Convsent
Bias	Bias in Bios
Toxic Information	RealToxicityPrompts

Tools

EasyEdit: An Easy-to-use Framework to Edit Large Language Models.

FastEdit: Editing large language models within 10 seconds

🎉 Contribution

Contributors

Contributing to this paper list

We may have missed important works in this field, so please contribute to this repo! Thanks in advance for your efforts.

Acknowledgement

We would like to express our gratitude to Longhui Yu for the kind reminder about the missing papers.

zjunlp/ModelEditingPapers