Awesome-LLM-Uncertainty-Reliability-Robustness

Awesome-LLM-Robustness: a curated list of Uncertainty, Reliability and Robustness in Large Language Models

This repository, called UR2-LLMs, contains a collection of resources and papers on Uncertainty, Reliability and Robustness in Large Language Models.

"Large language models have limited reliability, limited understanding, limited range, and hence need human supervision. " - Michael Osborne, Professor of Machine Learning in the Dept. of Engineering Science, University of Oxford, January 25, 2023

You are welcome to share your papers, thoughts, and ideas in this area!

Contents

Resources

Introductory Posts

GPT Is an Unreliable Information Store
Noble Ackerson
[Link]
20 Feb 2023

“Misusing” Large Language Models and the Future of MT
Arle Lommel
[Link]
20 Dec 2022

Large language models: The basics and their applications
Margo Poda
[Link]
9 Feb 2023

Prompt Engineering: Improving Responses & Reliability
Peter Foy
[Link]
19 Mar 2023

OpenAI's Cookbook on Techniques to Improve Reliability
OpenAI
[Github]
18 Mar 2023

GPT/calibration tag
Gwern Branwen
[Link]

Prompt Engineering
Lilian Weng
[Link]

LLM Powered Autonomous Agents
Lilian Weng
[Link]

Reliability in Learn Prompting
[Link]

Building LLM applications for production
Chip Huyen
[Link]
11 Apr 2023

Technical Reports

GPT-4 Technical Report
OpenAI
arXiv 2023. [Paper][Cookbook]
16 Mar 2023

GPT-4 System Card
OpenAI
arXiv 2023. [Paper] [Github]
15 Mar 2023

Tutorial

Uncertainty Estimation for Natural Language Processing
Adam Fisch, Robin Jia, Tal Schuster
COLING 2022. [Website]

Papers

Evaluation & Survey

Wider and Deeper LLM Networks are Fairer LLM Evaluators
Xinghua Zhang, Bowen Yu, Haiyang Yu, Yangyu Lv, Tingwen Liu, Fei Huang, Hongbo Xu, Yongbin Li
arXiv 2023. [Paper][Github]
3 Aug 2023

A Survey on Evaluation of Large Language Models
Yupeng Chang, Xu Wang, Jindong Wang, Yuan Wu, Kaijie Zhu, Hao Chen, Linyi Yang, Xiaoyuan Yi, Cunxiang Wang, Yidong Wang, Wei Ye, Yue Zhang, Yi Chang, Philip S. Yu, Qiang Yang, Xing Xie
arXiv 2023. [Paper][Github]
6 Jul 2023

DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models
Boxin Wang, Weixin Chen, Hengzhi Pei, Chulin Xie, Mintong Kang, Chenhui Zhang, Chejian Xu, Zidi Xiong, Ritik Dutta, Rylan Schaeffer, Sang T. Truong, Simran Arora, Mantas Mazeika, Dan Hendrycks, Zinan Lin, Yu Cheng, Sanmi Koyejo, Dawn Song, Bo Li
arXiv 2023. [Paper] [Github] [Website]
20 Jun 2023

In ChatGPT We Trust? Measuring and Characterizing the Reliability of ChatGPT
Xinyue Shen, Zeyuan Chen, Michael Backes, Yang Zhang
arXiv 2023. [Paper]
18 Apr 2023

Harnessing the Power of LLMs in Practice: A Survey on ChatGPT and Beyond
Jingfeng Yang, Hongye Jin, Ruixiang Tang, Xiaotian Han, Qizhang Feng, Haoming Jiang, Bing Yin, Xia Hu
arXiv 2023. [Paper][Github]
27 Apr 2023

How Robust is GPT-3.5 to Predecessors? A Comprehensive Study on Language Understanding Tasks
Xuanting Chen, Junjie Ye, Can Zu, Nuo Xu, Rui Zheng, Minlong Peng, Jie Zhou, Tao Gui, Qi Zhang, Xuanjing Huang
arXiv 2023. [Paper][Github]
1 Mar 2023

Holistic Evaluation of Language Models
Percy Liang, Rishi Bommasani, Tony Lee, Dimitris Tsipras, Dilara Soylu, Michihiro Yasunaga, Yian Zhang, Deepak Narayanan, Yuhuai Wu, Ananya Kumar, Benjamin Newman, Binhang Yuan, Bobby Yan, Ce Zhang, Christian Cosgrove, Christopher D. Manning, Christopher Ré, Diana Acosta-Navas, Drew A. Hudson, Eric Zelikman, Esin Durmus, Faisal Ladhak, Frieda Rong, Hongyu Ren, Huaxiu Yao, Jue Wang, Keshav Santhanam, Laurel Orr, Lucia Zheng, Mert Yuksekgonul, Mirac Suzgun, Nathan Kim, Neel Guha, Niladri Chatterji, Omar Khattab, Peter Henderson, Qian Huang, Ryan Chi, Sang Michael Xie, Shibani Santurkar, Surya Ganguli, Tatsunori Hashimoto, Thomas Icard, Tianyi Zhang, Vishrav Chaudhary, William Wang, Xuechen Li, Yifan Mai, Yuhui Zhang, Yuta Koreeda
arXiv 2022. [Paper] [Website] [Github] [Blog]
16 Nov 2022

Prompting GPT-3 To Be Reliable
Chenglei Si, Zhe Gan, Zhengyuan Yang, Shuohang Wang, Jianfeng Wang, Jordan Boyd-Graber, Lijuan Wang
ICLR 2023. [Paper] [Github]
17 Oct 2022

Plex: Towards Reliability using Pretrained Large Model Extensions
Dustin Tran, Jeremiah Liu, Michael W. Dusenberry, Du Phan, Mark Collier, Jie Ren, Kehang Han, Zi Wang, Zelda Mariet, Huiyi Hu, Neil Band, Tim G. J. Rudner, Karan Singhal, Zachary Nado, Joost van Amersfoort, Andreas Kirsch, Rodolphe Jenatton, Nithum Thain, Honglin Yuan, Kelly Buchanan, Kevin Murphy, D. Sculley, Yarin Gal, Zoubin Ghahramani, Jasper Snoek, Balaji Lakshminarayanan
arXiv 2022. [Paper]
15 Jul 2022

Language Models (Mostly) Know What They Know
Saurav Kadavath, Tom Conerly, Amanda Askell, Tom Henighan, Dawn Drain, Ethan Perez, Nicholas Schiefer, Zac Hatfield-Dodds, Nova DasSarma, Eli Tran-Johnson, Scott Johnston, Sheer El-Showk, Andy Jones, Nelson Elhage, Tristan Hume, Anna Chen, Yuntao Bai, Sam Bowman, Stanislav Fort, Deep Ganguli, Danny Hernandez, Josh Jacobson, Jackson Kernion, Shauna Kravec, Liane Lovitt, Kamal Ndousse, Catherine Olsson, Sam Ringer, Dario Amodei, Tom Brown, Jack Clark, Nicholas Joseph, Ben Mann, Sam McCandlish, Chris Olah, Jared Kaplan
arXiv 2022. [Paper]
11 Jul 2022

Augmented Language Models: a Survey
Grégoire Mialon, Roberto Dessì, Maria Lomeli, Christoforos Nalmpantis, Ram Pasunuru, Roberta Raileanu, Baptiste Rozière, Timo Schick, Jane Dwivedi-Yu, Asli Celikyilmaz, Edouard Grave, Yann LeCun, Thomas Scialom
arXiv 2023. [Paper]
15 Feb 2023

A Survey of Evaluation Metrics Used for NLG Systems
Ananya B. Sai, Akash Kumar Mohankumar, Mitesh M. Khapra
ACM Computing Surveys, 2022. [Paper]
18 Jan 2022

NL-Augmenter: A Framework for Task-Sensitive Natural Language Augmentation
Kaustubh D. Dhole, et al.
ACL 2021. [Paper][Github]
6 Dec 2021

TextFlint: Unified Multilingual Robustness Evaluation Toolkit for Natural Language Processing
Tao Gui et al.
arXiv 2021. [Paper][Github]
21 Mar 2021

Robustness Gym: Unifying the NLP Evaluation Landscape
Karan Goel, Nazneen Rajani, Jesse Vig, Samson Tan, Jason Wu, Stephan Zheng, Caiming Xiong, Mohit Bansal, Christopher Ré
ACL 2021. [Paper] [Github]
13 Jan 2021

Beyond Accuracy: Behavioral Testing of NLP models with CheckList
Marco Tulio Ribeiro, Tongshuang Wu, Carlos Guestrin, Sameer Singh
ACL 2020. [Paper][Github]
8 May 2020

Uncertainty

Uncertainty Estimation

Quantifying Uncertainty in Natural Language Explanations of Large Language Models
Sree Harsha Tanneru, Chirag Agarwal, Himabindu Lakkaraju
arXiv 2023. [Paper]
6 Nov 2023

Conformal Autoregressive Generation: Beam Search with Coverage Guarantees
Nicolas Deutschmann, Marvin Alberts, María Rodríguez Martínez
arXiv 2023. [Paper]
7 Sep 2023

Quantifying Uncertainty in Answers from any Language Model and Enhancing their Trustworthiness
Jiuhai Chen, Jonas Mueller
arXiv 2023. [Paper]
30 Aug 2023

Uncertainty in Natural Language Generation: From Theory to Applications
Joris Baan, Nico Daheim, Evgenia Ilia, Dennis Ulmer, Haau-Sing Li, Raquel Fernández, Barbara Plank, Rico Sennrich, Chrysoula Zerva, Wilker Aziz
arXiv 2023. [Paper]
28 Jul 2023

Generating with Confidence: Uncertainty Quantification for Black-box Large Language Models
Zhen Lin, Shubhendu Trivedi, Jimeng Sun
arXiv 2023. [Paper] [Github]
30 May 2023

Human Uncertainty in Concept-Based AI Systems
Katherine M. Collins, Matthew Barker, Mateo Espinosa Zarlenga, Naveen Raman, Umang Bhatt, Mateja Jamnik, Ilia Sucholutsky, Adrian Weller, Krishnamurthy Dvijotham
arXiv 2023. [Paper]
22 Mar 2023

Navigating the Grey Area: Expressions of Overconfidence and Uncertainty in Language Models
Kaitlyn Zhou, Dan Jurafsky, Tatsunori Hashimoto
arXiv 2023. [Paper]
25 Feb 2023

DEUP: Direct Epistemic Uncertainty Prediction
Salem Lahlou, Moksh Jain, Hadi Nekoei, Victor Ion Butoi, Paul Bertin, Jarrid Rector-Brooks, Maksym Korablyov, Yoshua Bengio
TMLR 2023. [Paper]
3 Feb 2023

On Compositional Uncertainty Quantification for Seq2seq Graph Parsing
Zi Lin, Du Phan, Panupong Pasupat, Jeremiah Zhe Liu, Jingbo Shang
ICLR 2023. [Paper]
1 Feb 2023

Neural-Symbolic Inference for Robust Autoregressive Graph Parsing via Compositional Uncertainty Quantification
Zi Lin, Jeremiah Liu, Jingbo Shang
EMNLP 2022. [Paper]
16 Jan 2023

Teaching Models to Express Their Uncertainty in Words
Stephanie Lin, Jacob Hilton, Owain Evans
TMLR 2022. [Paper] [Github] [TMLR] [Slide]
28 May 2022

Semantic Uncertainty: Linguistic Invariances for Uncertainty Estimation in Natural Language Generation
Lorenz Kuhn, Yarin Gal, Sebastian Farquhar
ICLR 2023. [Paper]
19 Feb 2023

Cold-Start Data Selection for Few-shot Language Model Fine-tuning: A Prompt-Based Uncertainty Propagation Approach
Yue Yu, Rongzhi Zhang, Ran Xu, Jieyu Zhang, Jiaming Shen, Chao Zhang
arXiv 2022. [Paper][Github]
15 Sep 2022

Fine-Tuning Language Models via Epistemic Neural Networks
Ian Osband, Seyed Mohammad Asghari, Benjamin Van Roy, Nat McAleese, John Aslanides, Geoffrey Irving
arXiv 2022. [Paper][Github]
3 Nov 2022

Uncertainty Quantification with Pre-trained Language Models: A Large-Scale Empirical Analysis
Yuxin Xiao, Paul Pu Liang, Umang Bhatt, Willie Neiswanger, Ruslan Salakhutdinov, Louis-Philippe Morency
EMNLP 2022 (Findings). [Paper][Github]
10 Oct 2022

Uncertainty Estimation for Language Reward Models
Adam Gleave, Geoffrey Irving
arXiv 2022. [Paper]
14 Mar 2022

Uncertainty Estimation and Reduction of Pre-trained Models for Text Regression
Yuxia Wang, Daniel Beck, Timothy Baldwin, Karin Verspoor
TACL 2022. [Paper]
Jun 2022

Uncertainty Estimation in Autoregressive Structured Prediction
Andrey Malinin, Mark Gales
ICLR 2021. [Paper]
18 Feb 2020

Unsupervised Quality Estimation for Neural Machine Translation
Marina Fomicheva, Shuo Sun, Lisa Yankovskaya, Frédéric Blain, Francisco Guzmán, Mark Fishel, Nikolaos Aletras, Vishrav Chaudhary, Lucia Specia
TACL 2020. [Paper][Dataset]
21 May 2020

Analyzing Uncertainty in Neural Machine Translation
Myle Ott, Michael Auli, David Grangier, Marc’Aurelio Ranzato
ICML 2018. [Paper]
2018
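
Several of the black-box approaches above share a simple recipe: sample multiple generations for the same prompt and score how much they disagree. Below is a minimal illustrative sketch of that idea (not any single paper's method; exact-match grouping is a crude stand-in for the semantic clustering used in practice):

```python
# Sampling-based uncertainty for a black-box LLM: sample several answers to the same
# prompt, group them (exact match as a rough proxy for semantic equivalence), and use
# the entropy of the empirical answer distribution as an uncertainty score.
from collections import Counter
import math

def answer_entropy(sampled_answers: list[str]) -> float:
    """Entropy (in nats) of the empirical distribution over distinct answers."""
    counts = Counter(a.strip().lower() for a in sampled_answers)
    n = sum(counts.values())
    return -sum((c / n) * math.log(c / n) for c in counts.values())

if __name__ == "__main__":
    confident = ["Paris", "Paris", "paris", "Paris", "Paris"]
    uncertain = ["1947", "1950", "1953", "1947", "1962"]
    print(f"low-disagreement entropy:  {answer_entropy(confident):.3f}")  # ~0.0
    print(f"high-disagreement entropy: {answer_entropy(uncertain):.3f}")  # ~1.33
```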

Calibration

Batch Calibration: Rethinking Calibration for In-Context Learning and Prompt Engineering
Han Zhou, Xingchen Wan, Lev Proleev, Diana Mincu, Jilin Chen, Katherine Heller, Subhrajit Roy
ICLR 2024. [Paper]
24 Jan 2024

Do Large Language Models Know What They Don't Know?
Zhangyue Yin, Qiushi Sun, Qipeng Guo, Jiawen Wu, Xipeng Qiu, Xuanjing Huang
arXiv 2023. [Paper]
29 May 2023

Just Ask for Calibration: Strategies for Eliciting Calibrated Confidence Scores from Language Models Fine-Tuned with Human Feedback
Katherine Tian, Eric Mitchell, Allan Zhou, Archit Sharma, Rafael Rafailov, Huaxiu Yao, Chelsea Finn, Christopher D. Manning
arXiv 2023. [Paper]
24 May 2023

Towards Reliable Misinformation Mitigation: Generalization, Uncertainty, and GPT-4
Kellin Pelrine, Meilina Reksoprodjo, Caleb Gupta, Joel Christoph, Reihaneh Rabbany
arXiv 2023. [Paper]
24 May 2023

Calibrated Interpretation: Confidence Estimation in Semantic Parsing
Elias Stengel-Eskin, Benjamin Van Durme
arXiv 2022. [Paper] [Github]
14 Nov 2022

Calibrating Sequence Likelihood Improves Conditional Language Generation
Yao Zhao, Misha Khalman, Rishabh Joshi, Shashi Narayan, Mohammad Saleh, Peter J. Liu
ICLR 2023. [Paper]
30 Sep 2022

Calibrated Selective Classification
Adam Fisch, Tommi Jaakkola, Regina Barzilay
TMLR 2022. [Paper]
25 Aug 2022

Reducing conversational agents' overconfidence through linguistic calibration
Sabrina J. Mielke, Arthur Szlam, Emily Dinan, Y-Lan Boureau
NAACL 2022. [Paper]
22 Jun 2022

Re-Examining Calibration: The Case of Question Answering
Chenglei Si, Chen Zhao, Sewon Min, Jordan Boyd-Graber
EMNLP 2022 Findings. [Paper]
25 May 2022

Towards Collaborative Neural-Symbolic Graph Semantic Parsing via Uncertainty
Zi Lin, Jeremiah Liu, Jingbo Shang
ACL Findings 2022. [Paper]
22 May 2022

Uncertainty-aware machine translation evaluation
Taisiya Glushkova, Chrysoula Zerva, Ricardo Rei, André F. T. Martins
EMNLP 2021. [Paper]
13 Sep 2021

Calibrate Before Use: Improving Few-Shot Performance of Language Models
Tony Z. Zhao, Eric Wallace, Shi Feng, Dan Klein, Sameer Singh
ICML 2021. [Paper][Github]
19 Feb 2021

How Can We Know When Language Models Know? On the Calibration of Language Models for Question Answering
Zhengbao Jiang, Jun Araki, Haibo Ding, Graham Neubig
TACL 2021. [Paper][Github]
2 Dec 2020

Calibration of Pre-trained Transformers
Shrey Desai, Greg Durrett
EMNLP 2020. [Paper][Github]
17 May 2020
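
Most of the papers above report some variant of expected calibration error (ECE). Here is a minimal sketch of the standard equal-width-binning estimate, with a toy well-calibrated model as the demo:

```python
# Expected calibration error: bin predictions by confidence and average the gap
# between each bin's accuracy and its mean confidence, weighted by bin size.
import numpy as np

def expected_calibration_error(confidences: np.ndarray,
                               correct: np.ndarray,
                               n_bins: int = 10) -> float:
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap  # weight by the fraction of samples in the bin
    return float(ece)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    conf = rng.uniform(0.5, 1.0, size=1000)
    # Toy model: answers are correct with probability equal to the stated confidence
    # (i.e. well calibrated), so the ECE should be close to zero.
    correct = (rng.uniform(size=1000) < conf).astype(float)
    print(f"ECE of a roughly calibrated toy model: {expected_calibration_error(conf, correct):.3f}")
```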

Ambiguity

Tree of Clarifications: Answering Ambiguous Questions with Retrieval-Augmented Large Language Models
Gangwoo Kim, Sungdong Kim, Byeongguk Jeon, Joonsuk Park, Jaewoo Kang
EMNLP 2023. [Paper][Github]
23 Oct 2023

Selectively Answering Ambiguous Questions
Jeremy R. Cole, Michael J.Q. Zhang, Daniel Gillick, Julian Martin Eisenschlos, Bhuwan Dhingra, Jacob Eisenstein
arXiv 2023. [Paper]
24 May 2023

We're Afraid Language Models Aren't Modeling Ambiguity
Alisa Liu, Zhaofeng Wu, Julian Michael, Alane Suhr, Peter West, Alexander Koller, Swabha Swayamdipta, Noah A. Smith, Yejin Choi
arXiv 2023. [Paper][Github]
24 Apr 2023

Task Ambiguity in Humans and Language Models
Alex Tamkin, Kunal Handa, Avash Shrestha, Noah Goodman
ICLR 2023. [Paper][Github]
20 Dec 2022

CLAM: Selective Clarification for Ambiguous Questions with Generative Language Models
Lorenz Kuhn, Yarin Gal, Sebastian Farquhar
arXiv 2022. [Paper]
15 Dec 2022

How to Approach Ambiguous Queries in Conversational Search: A Survey of Techniques, Approaches, Tools, and Challenges
Kimiya Keyvan, Jimmy Xiangji Huang
ACM Computing Surveys, 2022. [Paper]
7 Dec 2022

Assistance with large language models
Dmitrii Krasheninnikov, Egor Krasheninnikov, David Krueger
NeurIPS MLSW Workshop 2022. [Paper]
5 Dec 2022

Why Did the Chicken Cross the Road? Rephrasing and Analyzing Ambiguous Questions in VQA
Elias Stengel-Eskin, Jimena Guallar-Blasco, Yi Zhou, Benjamin Van Durme
arXiv 2022. [Paper][Github]
14 Nov 2022

Abg-CoQA: Clarifying Ambiguity in Conversational Question Answering
Meiqi Guo, Mingda Zhang, Siva Reddy, Malihe Alikhani
AKBC 2021. [Paper]
22 Jun 2021

Confidence

The Confidence-Competence Gap in Large Language Models: A Cognitive Study
Aniket Kumar Singh, Suman Devkota, Bishal Lamichhane, Uttam Dhakal, Chandra Dhakal
arXiv 2023. [Paper]
28 Sep 2023

Strength in Numbers: Estimating Confidence of Large Language Models by Prompt Agreement
Gwenyth Portillo Wightman, Alexandra Delucia, Mark Dredze
ACL TrustNLP Workshop 2023. [Paper]
1 Jul 2023

What Are the Different Approaches for Detecting Content Generated by LLMs Such As ChatGPT? And How Do They Work and Differ?
Sebastian Raschka
[Link] [GPTZero]
1 Feb 2023

DetectGPT: Zero-Shot Machine-Generated Text Detection using Probability Curvature
Eric Mitchell, Yoonho Lee, Alexander Khazatsky, Christopher D. Manning, Chelsea Finn
arXiv 2023. [Paper][Website]
26 Jan 2023

Confident Adaptive Language Modeling
Tal Schuster, Adam Fisch, Jai Gupta, Mostafa Dehghani, Dara Bahri, Vinh Q. Tran, Yi Tay, Donald Metzler
NeurIPS 2022. [Paper]
25 Oct 2022

Conformal Risk Control
Anastasios N Angelopoulos, Stephen Bates, Adam Fisch, Lihua Lei, Tal Schuster
arXiv 2022. [Paper][Github]
4 Aug 2022
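
The conformal entries above build on split conformal prediction: calibrate a score threshold on held-out data so that prediction sets cover the true answer with probability at least 1 - alpha. Below is a toy sketch of that basic recipe (generic split conformal, not the risk-control extensions in the papers; the beta-distributed calibration scores are made up for the demo):

```python
# Split conformal prediction: pick a threshold from held-out nonconformity scores,
# then keep every candidate answer whose score is within that threshold.
import math
import numpy as np

def conformal_threshold(cal_scores: np.ndarray, alpha: float = 0.1) -> float:
    """Finite-sample (1 - alpha) quantile of calibration nonconformity scores."""
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))  # rank of the quantile, 1-indexed
    return float(np.sort(cal_scores)[min(k, n) - 1])

def prediction_set(candidate_scores: dict[str, float], qhat: float) -> set[str]:
    """Keep every candidate whose nonconformity score is at most the threshold."""
    return {answer for answer, s in candidate_scores.items() if s <= qhat}

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Nonconformity score = 1 - probability the model assigned to the reference answer.
    cal_scores = 1.0 - rng.beta(5, 2, size=500)
    qhat = conformal_threshold(cal_scores, alpha=0.1)
    print(f"calibrated threshold: {qhat:.3f}")
    print(prediction_set({"Paris": 0.05, "Lyon": 0.40, "Marseille": 0.85}, qhat))
```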

Active Learning

A Survey of Active Learning for Natural Language Processing
Zhisong Zhang, Emma Strubell, Eduard Hovy
EMNLP 2022. [Paper][Github]
18 Oct 2022

Active Prompting with Chain-of-Thought for Large Language Models
Shizhe Diao, Pengcheng Wang, Yong Lin, Tong Zhang
arXiv 2023. [Paper][Github]
23 Feb 2023

Low-resource Interactive Active Labeling for Fine-tuning Language Models
Seiji Maekawa, Dan Zhang, Hannah Kim, Sajjadur Rahman, Estevam Hruschka
EMNLP Findings 2022. [Paper]
7 Dec 2022

Can You Label Less by Using Out-of-Domain Data? Active & Transfer Learning with Few-shot Instructions
Rafal Kocielnik, Sara Kangaslahti, Shrimai Prabhumoye, Meena Hari, R. Michael Alvarez, Anima Anandkumar
NeurIPS Workshop 2022. [Paper]
21 Nov 2022

AfroLM: A Self-Active Learning-based Multilingual Pretrained Language Model for 23 African Languages
Bonaventure F. P. Dossou, Atnafu Lambebo Tonja, Oreen Yousuf, Salomey Osei, Abigail Oppong, Iyanuoluwa Shode, Oluwabusayo Olufunke Awoyomi, Chris Chinenye Emezue
EMNLP 2022. [Paper][Github]
7 Nov 2022

Active Learning Helps Pretrained Models Learn the Intended Task
Alex Tamkin, Dat Pham Nguyen, Salil Deshpande, Jesse Mu, Noah Goodman
NeurIPS 2022. [Paper][Github]
31 Oct 2022

Selective Annotation Makes Language Models Better Few-Shot Learners
Hongjin Su, Jungo Kasai, Chen Henry Wu, Weijia Shi, Tianlu Wang, Jiayi Xin, Rui Zhang, Mari Ostendorf, Luke Zettlemoyer, Noah A. Smith, Tao Yu
ICLR 2023. [Paper][Github]
5 Sep 2022

Multi-task Active Learning for Pre-trained Transformer-based Models
Guy Rotman, Roi Reichart
TACL 2022. [Paper] [Github]
10 Aug 2022

AcTune: Uncertainty-Based Active Self-Training for Active Fine-Tuning of Pretrained Language Models
Yue Yu, Lingkai Kong, Jieyu Zhang, Rongzhi Zhang, Chao Zhang
NAACL-HLT 2022. [Paper] [Github]
10 Jul 2022

Towards Computationally Feasible Deep Active Learning
Akim Tsvigun, Artem Shelmanov, Gleb Kuzmin, Leonid Sanochkin, Daniil Larionov, Gleb Gusev, Manvel Avetisian, Leonid Zhukov
NAACL 2022. [Paper] [Github]
7 May 2022

FAMIE: A Fast Active Learning Framework for Multilingual Information Extraction
Minh Van Nguyen, Nghia Trung Ngo, Bonan Min, Thien Huu Nguyen
NAACL 2022. [Paper] [Github]
16 Feb 2022

On the Importance of Effectively Adapting Pretrained Language Models for Active Learning
Katerina Margatina, Loïc Barrault, Nikolaos Aletras
ACL 2022. [Paper]
2 Mar 2022

Limitations of Active Learning With Deep Transformer Language Models
Mike D'Arcy, Doug Downey
arXiv 2022. [Paper]
28 Jan 2022

Active Learning by Acquiring Contrastive Examples
Katerina Margatina, Giorgos Vernikos, Loïc Barrault, Nikolaos Aletras
EMNLP 2021. [Paper][Github]
8 Sep 2021

Revisiting Uncertainty-based Query Strategies for Active Learning with Transformers
Christopher Schröder, Andreas Niekler, Martin Potthast
ACL 2022 Findings. [Paper][Github]
12 Jul 2021

Active Learning for Sequence Tagging with Deep Pre-trained Models and Bayesian Uncertainty Estimates
Artem Shelmanov, Dmitri Puzyrev, Lyubov Kupriyanova, Denis Belyakov, Daniil Larionov, Nikita Khromov, Olga Kozlova, Ekaterina Artemova, Dmitry V. Dylov, Alexander Panchenko
EACL 2021. [Paper]
18 Feb 2021

Fine-tuning BERT for Low-Resource Natural Language Understanding via Active Learning
Daniel Grießhaber, Johannes Maucher, Ngoc Thang Vu
COLING 2020. [Paper]
4 Dec 2020
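
Many of the papers above start from plain uncertainty sampling: score each unlabeled example by predictive entropy and send the most uncertain ones for annotation. A minimal sketch of that acquisition step (the random probabilities are only a stand-in for real model outputs):

```python
# Uncertainty sampling: rank unlabeled examples by predictive entropy and
# return the indices of the top-k most uncertain examples to label next.
import numpy as np

def entropy_acquisition(probs: np.ndarray, k: int) -> np.ndarray:
    """probs: (n_examples, n_classes) class probabilities; returns indices of the k most uncertain examples."""
    entropy = -(probs * np.log(probs + 1e-12)).sum(axis=1)
    return np.argsort(-entropy)[:k]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    logits = rng.normal(size=(8, 3))               # fake model outputs for the demo
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    print("query these examples next:", entropy_acquisition(probs, k=3))
```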

Reliability

Hallucination

awesome hallucination detection

SAC³: Reliable Hallucination Detection in Black-Box Language Models via Semantic-aware Cross-check Consistency
Jiaxin Zhang, Zhuohang Li, Kamalika Das, Bradley A. Malin, Sricharan Kumar
EMNLP 2023. [Paper][Github]
3 Nov 2023

Hallucination Leaderboard
Vectara
[Link]
2 Nov 2023

Beyond Factuality: A Comprehensive Evaluation of Large Language Models as Knowledge Generators
Liang Chen, Yang Deng, Yatao Bian, Zeyu Qin, Bingzhe Wu, Tat-Seng Chua, Kam-Fai Wong
EMNLP 2023. [Paper][Github]
12 Oct 2023

Chain-of-Verification Reduces Hallucination in Large Language Models
Shehzaad Dhuliawala, Mojtaba Komeili, Jing Xu, Roberta Raileanu, Xian Li, Asli Celikyilmaz, Jason Weston
arXiv 2023. [Paper]
20 Sep 2023

Do Language Models Know When They're Hallucinating References?
Ayush Agrawal, Lester Mackey, Adam Tauman Kalai
arXiv 2023. [Paper]
29 May 2023

Self-contradictory Hallucinations of Large Language Models: Evaluation, Detection and Mitigation
Niels Mündler, Jingxuan He, Slobodan Jenko, Martin Vechev
arXiv 2023. [Paper]
25 May 2023

Why Does ChatGPT Fall Short in Providing Truthful Answers?
Shen Zheng, Jie Huang, Kevin Chen-Chuan Chang
arXiv 2023. [Paper]
24 May 2023

How Language Model Hallucinations Can Snowball
Muru Zhang, Ofir Press, William Merrill, Alisa Liu, Noah A. Smith
arXiv 2023. [Paper]
22 May 2023

LM vs LM: Detecting Factual Errors via Cross Examination
Roi Cohen, May Hamri, Mor Geva, Amir Globerson
arXiv 2023. [Paper]
22 May 2023

HaluEval: A Large-Scale Hallucination Evaluation Benchmark for Large Language Models
Junyi Li, Xiaoxue Cheng, Wayne Xin Zhao, Jian-Yun Nie, Ji-Rong Wen
arXiv 2023. [Paper]
19 May 2023

SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models
Potsawee Manakul, Adian Liusie, Mark J. F. Gales
arXiv 2023. [Paper] [Github]
8 Mar 2023
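
The zero-resource detectors above typically check a claim against several resampled generations: supported facts tend to recur, while hallucinated ones do not. Here is a toy sketch of that sampling-and-consistency idea, with a crude token-overlap score standing in for the NLI or LLM-based scorers used in the actual papers:

```python
# Toy consistency check: a claim that keeps reappearing across resampled outputs is
# likely supported; one that the resamples contradict or omit is suspect.
def token_overlap(claim: str, sample: str) -> float:
    claim_tokens = set(claim.lower().split())
    sample_tokens = set(sample.lower().split())
    return len(claim_tokens & sample_tokens) / max(len(claim_tokens), 1)

def inconsistency_score(claim: str, resampled_outputs: list[str]) -> float:
    """1 - mean support across resampled outputs; higher = more likely hallucinated."""
    support = sum(token_overlap(claim, s) for s in resampled_outputs) / len(resampled_outputs)
    return 1.0 - support

if __name__ == "__main__":
    samples = [
        "Marie Curie won Nobel Prizes in Physics and Chemistry.",
        "Curie received the Nobel Prize in Physics in 1903 and in Chemistry in 1911.",
        "She was awarded two Nobel Prizes, in Physics and in Chemistry.",
    ]
    print(inconsistency_score("Marie Curie won Nobel Prizes in Physics and Chemistry.", samples))  # low
    print(inconsistency_score("Marie Curie won the Fields Medal in 1931.", samples))               # high
```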

Check Your Facts and Try Again: Improving Large Language Models with External Knowledge and Automated Feedback
Baolin Peng, Michel Galley, Pengcheng He, Hao Cheng, Yujia Xie, Yu Hu, Qiuyuan Huang, Lars Liden, Zhou Yu, Weizhu Chen, Jianfeng Gao
arXiv 2023. [Paper]
23 Feb 2023

RHO (ρ): Reducing Hallucination in Open-domain Dialogues with Knowledge Grounding
Ziwei Ji, Zihan Liu, Nayeon Lee, Tiezheng Yu, Bryan Wilie, Min Zeng, Pascale Fung
arXiv 2022. [Paper]
3 Dec 2022

FaithDial: A Faithful Benchmark for Information-Seeking Dialogue
Nouha Dziri, Ehsan Kamalloo, Sivan Milton, Osmar Zaiane, Mo Yu, Edoardo M. Ponti, Siva Reddy
TACL 2022. [Paper]
22 Apr 2022

Survey of Hallucination in Natural Language Generation
Ziwei Ji, Nayeon Lee, Rita Frieske, Tiezheng Yu, Dan Su, Yan Xu, Etsuko Ishii, Yejin Bang, Wenliang Dai, Andrea Madotto, Pascale Fung
arXiv 2022. [Paper]
8 Feb 2022

Truthfulness

Inference-Time Intervention: Eliciting Truthful Answers from a Language Model
Kenneth Li, Oam Patel, Fernanda Viégas, Hanspeter Pfister, Martin Wattenberg
arXiv 2023. [Paper] [Github]
6 Jun 2023

The Internal State of an LLM Knows When It's Lying
Amos Azaria, Tom Mitchell
arXiv 2023. [Paper]
26 Apr 2023

TruthfulQA: Measuring How Models Mimic Human Falsehoods
Stephanie Lin, Jacob Hilton, Owain Evans
ACL 2022. [Paper] [Github] [Blog]
8 Sep 2021

Truthful AI: Developing and governing AI that does not lie
Owain Evans, Owen Cotton-Barratt, Lukas Finnveden, Adam Bales, Avital Balwit, Peter Wills, Luca Righetti, William Saunders
arXiv 2021. [Paper] [Blog]
13 Oct 2021

Measuring Reliability of Large Language Models through Semantic Consistency
Harsh Raj, Domenic Rosati, Subhabrata Majumdar
NeurIPS 2022 ML Safety Workshop. [Paper]
10 Nov 2022

Reasoning

REFINER: Reasoning Feedback on Intermediate Representations
Debjit Paul, Mete Ismayilzada, Maxime Peyrard, Beatriz Borges, Antoine Bosselut, Robert West, Boi Faltings
arXiv 2023. [Paper]
4 Apr 2023

OpenICL: An Open-Source Framework for In-context Learning
Zhenyu Wu, YaoXiang Wang, Jiacheng Ye, Jiangtao Feng, Jingjing Xu, Yu Qiao, Zhiyong Wu
arXiv 2023. [Paper] [Github]
6 Mar 2023

Reliable Natural Language Understanding with Large Language Models and Answer Set Programming
Abhiramon Rajasekharan, Yankai Zeng, Parth Padalkar, Gopal Gupta
arXiv 2023. [Paper]
7 Feb 2023

Self-Consistency Improves Chain of Thought Reasoning in Language Models
Xuezhi Wang, Jason Wei, Dale Schuurmans, Quoc Le, Ed Chi, Sharan Narang, Aakanksha Chowdhery, Denny Zhou
ICLR 2023. [Paper]
21 Mar 2022
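
Self-consistency is simple to sketch: sample several chain-of-thought completions at non-zero temperature, extract each final answer, and return the majority vote. A minimal, model-agnostic sketch follows (the sampling function here is a fake stand-in for any LLM API):

```python
# Self-consistency decoding: majority vote over answers extracted from several
# independently sampled chain-of-thought completions.
from collections import Counter
from typing import Callable

def self_consistent_answer(sample_completion: Callable[[], str],
                           extract_answer: Callable[[str], str],
                           num_samples: int = 10) -> str:
    answers = [extract_answer(sample_completion()) for _ in range(num_samples)]
    return Counter(answers).most_common(1)[0][0]

if __name__ == "__main__":
    import random
    # Stand-in for an LLM: a noisy sampler that answers "42" most of the time.
    fake_llm = lambda: f"... reasoning ... Answer: {random.choice(['42', '42', '42', '41'])}"
    extract = lambda text: text.rsplit("Answer:", 1)[-1].strip()
    print(self_consistent_answer(fake_llm, extract, num_samples=20))
```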

Chain of Thought Prompting Elicits Reasoning in Large Language Models
Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Ed Chi, Quoc Le, Denny Zhou
arXiv 2022. [Paper]
28 Jan 2022

STaR: Self-Taught Reasoner Bootstrapping Reasoning With Reasoning
Eric Zelikman, Yuhuai Wu, Noah D. Goodman
NeurIPS 2022. [Paper][Github]
28 Mar 2022

The Unreliability of Explanations in Few-shot Prompting for Textual Reasoning
Xi Ye, Greg Durrett
NeurIPS 2022. [Paper] [Github]
6 May 2022

Rationale-Augmented Ensembles in Language Models
Xuezhi Wang, Jason Wei, Dale Schuurmans, Quoc Le, Ed Chi, Denny Zhou
arXiv 2022. [Paper]
2 Jul 2022

ReAct: Synergizing Reasoning and Acting in Language Models
Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, Yuan Cao
ICLR 2023. [Paper][Github] [Project]
6 Oct 2022

On Second Thought, Let's Not Think Step by Step! Bias and Toxicity in Zero-Shot Reasoning
Omar Shaikh, Hongxin Zhang, William Held, Michael Bernstein, Diyi Yang
arXiv 2022. [Paper]
15 Dec 2022

On the Advance of Making Language Models Better Reasoners
Yifei Li, Zeqi Lin, Shizhuo Zhang, Qiang Fu, Bei Chen, Jian-Guang Lou, Weizhu Chen
arXiv 2022. [Paper][Github]
6 Jun 2022

Ask Me Anything: A simple strategy for prompting language models
Simran Arora, Avanika Narayan, Mayee F. Chen, Laurel Orr, Neel Guha, Kush Bhatia, Ines Chami, Frederic Sala, Christopher Ré
arXiv 2022. [Paper][Github]
5 Oct 2022

MathPrompter: Mathematical Reasoning using Large Language Models
Shima Imani, Liang Du, Harsh Shrivastava
arXiv 2023. [Paper]
4 Mar 2023

Complexity-Based Prompting for Multi-Step Reasoning
Yao Fu, Hao Peng, Ashish Sabharwal, Peter Clark, Tushar Khot
arXiv 2022. [Paper][Github]
3 Oct 2022

Measuring and Narrowing the Compositionality Gap in Language Models
Ofir Press, Muru Zhang, Sewon Min, Ludwig Schmidt, Noah A. Smith, Mike Lewis
arXiv 2022. [Paper][Github]
7 Oct 2022

Interleaving Retrieval with Chain-of-Thought Reasoning for Knowledge-Intensive Multi-Step Questions
Harsh Trivedi, Niranjan Balasubramanian, Tushar Khot, Ashish Sabharwal
arXiv 2023. [Paper][Github]
20 Dec 2022

Prompt tuning, optimization and design

Large Language Models as Optimizers
Chengrun Yang, Xuezhi Wang, Yifeng Lu, Hanxiao Liu, Quoc V. Le, Denny Zhou, Xinyun Chen
arXiv 2023. [Paper]
7 Sep 2023

InstructZero: Efficient Instruction Optimization for Black-Box Large Language Models
Lichang Chen, Jiuhai Chen, Tom Goldstein, Heng Huang, Tianyi Zhou
arXiv 2023. [Paper] [Github]
5 Jun 2023

PromptBoosting: Black-Box Text Classification with Ten Forward Passes
Bairu Hou, Joe O’Connor, Jacob Andreas, Shiyu Chang, Yang Zhang
ICML 2023. [Paper][Github]
23 Jan 2023

GrIPS: Gradient-free, Edit-based Instruction Search for Prompting Large Language Models
Archiki Prasad, Peter Hase, Xiang Zhou, Mohit Bansal
EACL 2023. [Paper][Github]
14 Mar 2022

RLPrompt: Optimizing Discrete Text Prompts with Reinforcement Learning
Mingkai Deng, Jianyu Wang, Cheng-Ping Hsieh, Yihan Wang, Han Guo, Tianmin Shu, Meng Song, Eric P. Xing, Zhiting Hu
EMNLP 2022. [Paper][Github]
25 May 2022

Black-box Prompt Learning for Pre-trained Language Models
Shizhe Diao, Zhichao Huang, Ruijia Xu, Xuechun Li, Yong Lin, Xiao Zhou, Tong Zhang
TMLR 2023. [Paper][Github]
22 Jan 2022

Black-Box Tuning for Language-Model-as-a-Service
Tianxiang Sun, Yunfan Shao, Hong Qian, Xuanjing Huang, Xipeng Qiu
ICML 2022. [Paper][Github]
10 Jan 2022

BBTv2: Towards a Gradient-Free Future with Large Language Models
Tianxiang Sun, Zhengfu He, Hong Qian, Yunhua Zhou, Xuanjing Huang, Xipeng Qiu
EMNLP 2022. [Paper] [Github]
7 Dec 2022

Automatic Chain of Thought Prompting in Large Language Models
Zhuosheng Zhang, Aston Zhang, Mu Li, Alex Smola
ICLR 2023. [Paper][Github]
7 Oct 2022

Automatic Prompt Augmentation and Selection with Chain-of-Thought from Labeled Data
KaShun Shum, Shizhe Diao, Tong Zhang
arXiv 2023. [Paper][Github]
24 Feb 2023

Large Language Models Are Human-Level Prompt Engineers
Yongchao Zhou, Andrei Ioan Muresanu, Ziwen Han, Keiran Paster, Silviu Pitis, Harris Chan, Jimmy Ba
ICLR 2023. [Paper] [Github]
3 Nov 2022

Fantastically Ordered Prompts and Where to Find Them: Overcoming Few-Shot Prompt Order Sensitivity
Yao Lu, Max Bartolo, Alastair Moore, Sebastian Riedel, Pontus Stenetorp
ACL 2022. [Paper]

Active Example Selection for In-Context Learning
Yiming Zhang, Shi Feng, Chenhao Tan
EMNLP 2022. [Paper][Github]
8 Nov 2022

Selective Annotation Makes Language Models Better Few-Shot Learners
Hongjin Su, Jungo Kasai, Chen Henry Wu, Weijia Shi, Tianlu Wang, Jiayi Xin, Rui Zhang, Mari Ostendorf, Luke Zettlemoyer, Noah A. Smith, Tao Yu
ICLR 2023. [Paper][Github]
5 Sep 2022

Learning To Retrieve Prompts for In-Context Learning
Ohad Rubin, Jonathan Herzig, Jonathan Berant
NAACL-HLT 2022. [Paper][Github]
16 Dec 2021

Instruction and RLHF

LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions
Minghao Wu, Abdul Waheed, Chiyu Zhang, Muhammad Abdul-Mageed, Alham Fikri Aji
arXiv 2023. [Paper][Github]
27 Apr 2023

Self-Refine: Iterative Refinement with Self-Feedback
Aman Madaan, Niket Tandon, Prakhar Gupta, Skyler Hallinan, Luyu Gao, Sarah Wiegreffe, Uri Alon, Nouha Dziri, Shrimai Prabhumoye, Yiming Yang, Sean Welleck, Bodhisattwa Prasad Majumder, Shashank Gupta, Amir Yazdanbakhsh, Peter Clark
arXiv 2023. [Paper][Github] [Website]
30 Mar 2023

Is Prompt All You Need? No. A Comprehensive and Broader View of Instruction Learning
Renze Lou, Kai Zhang, Wenpeng Yin
arXiv 2023. [Paper][Github]
18 Mar 2023

Self-Instruct: Aligning Language Models with Self-Generated Instructions
Yizhong Wang, Yeganeh Kordi, Swaroop Mishra, Alisa Liu, Noah A. Smith, Daniel Khashabi, Hannaneh Hajishirzi
arXiv 2022. [Paper] [Github]
20 Dec 2022

Constitutional AI: Harmlessness from AI Feedback
Yuntao Bai, et al. (Anthropic)
arXiv 2022. [Paper]
15 Dec 2022

Discovering Language Model Behaviors with Model-Written Evaluations
Ethan Perez et al.
arXiv 2022. [Paper]
19 Dec 2022

In-Context Instruction Learning
Seonghyeon Ye, Hyeonbin Hwang, Sohee Yang, Hyeongu Yun, Yireun Kim, Minjoon Seo
arXiv 2023. [Paper][Github]
28 Feb 2023

Tools and external APIs

Internet-augmented language models through few-shot prompting for open-domain question answering
Angeliki Lazaridou, Elena Gribovskaya, Wojciech Stokowiec, Nikolai Grigorev
arXiv 2023. [Paper]
10 Mar 2023

Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks
Wenhu Chen, Xueguang Ma, Xinyi Wang, William W. Cohen
arXiv 2022. [Paper][Github]
22 Nov 2022

PAL: Program-aided Language Models
Luyu Gao, Aman Madaan, Shuyan Zhou, Uri Alon, Pengfei Liu, Yiming Yang, Jamie Callan, Graham Neubig
arXiv 2022. [Paper] [Github] [Project]
18 Nov 2022

TALM: Tool Augmented Language Models
Aaron Parisi, Yao Zhao, Noah Fiedel
arXiv 2022. [Paper]
24 May 2022

Toolformer: Language Models Can Teach Themselves to Use Tools
Timo Schick, Jane Dwivedi-Yu, Roberto Dessì, Roberta Raileanu, Maria Lomeli, Luke Zettlemoyer, Nicola Cancedda, Thomas Scialom
arXiv 2023. [Paper]
9 Feb 2023

Fine-tuning

Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes
Cheng-Yu Hsieh, Chun-Liang Li, Chih-Kuan Yeh, Hootan Nakhost, Yasuhisa Fujii, Alexander Ratner, Ranjay Krishna, Chen-Yu Lee, Tomas Pfister
arXiv 2023. [Paper]
3 May 2023

FreeLM: Fine-Tuning-Free Language Model
Xiang Li, Xin Jiang, Xuying Meng, Aixin Sun, Yequan Wang
arXiv 2023. [Paper]
2 May 2023

Robustness

Invariance

Invariant Language Modeling
Maxime Peyrard, Sarvjeet Singh Ghotra, Martin Josifoski, Vidhan Agarwal, Barun Patra, Dean Carignan, Emre Kiciman, Robert West
EMNLP 2022. [Paper][Github]
16 Oct 2021

Towards Robust Personalized Dialogue Generation via Order-Insensitive Representation Regularization
Liang Chen, Hongru Wang, Yang Deng, Wai-Chung Kwan, Kam-Fai Wong
Findings of ACL 2023. [Paper][Github]
22 May 2023

Distribution Shift

Exploring Distributional Shifts in Large Language Models for Code Analysis
Shushan Arakelyan, Rocktim Jyoti Das, Yi Mao, Xiang Ren
arXiv 2023. [Paper]
16 Mar 2023

Out-of-Distribution

Out-of-Distribution Detection and Selective Generation for Conditional Language Models
Jie Ren, Jiaming Luo, Yao Zhao, Kundan Krishna, Mohammad Saleh, Balaji Lakshminarayanan, Peter J. Liu
ICLR 2023. [Paper]
30 Sep 2022

Adaptation and Generalization

On the Domain Adaptation and Generalization of Pretrained Language Models: A Survey
Xu Guo, Han Yu
arXiv 2022. [Paper]
6 Nov 2022

Adversarial

Adversarial Attacks on LLMs
Lilian Weng
[Blog]
25 Oct 2023

PromptBench: Towards Evaluating the Robustness of Large Language Models on Adversarial Prompts
Kaijie Zhu, Jindong Wang, Jiaheng Zhou, Zichen Wang, Hao Chen, Yidong Wang, Linyi Yang, Wei Ye, Neil Zhenqiang Gong, Yue Zhang, Xing Xie
arXiv 2023. [Paper][Github]
7 Jun 2023

On the Robustness of ChatGPT: An Adversarial and Out-of-distribution Perspective
Jindong Wang, Xixu Hu, Wenxin Hou, Hao Chen, Runkai Zheng, Yidong Wang, Linyi Yang, Haojun Huang, Wei Ye, Xiubo Geng, Binxing Jiao, Yue Zhang, Xing Xie
arXiv 2023. [Paper] [Github]
22 Feb 2023

Reliability Testing for Natural Language Processing Systems
Samson Tan, Shafiq Joty, Kathy Baxter, Araz Taeihagh, Gregory A. Bennett, Min-Yen Kan
ACL-IJCNLP 2021. [Paper]
6 May 2021

Attribution

Attributed Question Answering: Evaluation and Modeling for Attributed Large Language Models
Bernd Bohnet, Vinh Q. Tran, Pat Verga, Roee Aharoni, Daniel Andor, Livio Baldini Soares, Massimiliano Ciaramita, Jacob Eisenstein, Kuzman Ganchev, Jonathan Herzig, Kai Hui, Tom Kwiatkowski, Ji Ma, Jianmo Ni, Lierni Sestorain Saralegui, Tal Schuster, William W. Cohen, Michael Collins, Dipanjan Das, Donald Metzler, Slav Petrov, Kellie Webster
arXiv 2022. [Paper]
15 Dec 2022

Causality

Can Large Language Models Infer Causation from Correlation?
Zhijing Jin, Jiarui Liu, Zhiheng Lyu, Spencer Poff, Mrinmaya Sachan, Rada Mihalcea, Mona Diab, Bernhard Schölkopf
arXiv 2023. [Paper] [Github]
9 Jun 2023

Selection-Inference: Exploiting Large Language Models for Interpretable Logical Reasoning
Antonia Creswell, Murray Shanahan, Irina Higgins
ICLR 2023. [Paper]
19 May 2022

Investigating causal understanding in LLMs
Marius Hobbhahn, Tom Lieberum, David Seiler
NeurIPS 2022 Workshop. [Paper][Blog]
3 Oct 2022