/gpt4_bs

Examples of prompts that cause ChatGPT-4 to hallucinate.

Contributors: Stephen Casper (scasper@mit.edu), Luke Bailey (lukebailey@college.harvard.edu), Zachary Marinov (zmarinov@mit.edu), Michael Gerovich (mgerov@mit.edu), Andrew Garber (andrewgarber@college.harvard.edu), Shuvom Sadhuka (ssadhuka@mit.edu), Oam Patel (opatel@college.harvard.edu), Riley Kong (rileyis@mit.edu)

Dataset: We compiled 104 examples of prompts that cause ChatGPT-4 (May 24, 2023 version) to hallucinate untrue content and grouped them into 18 categories. We share the examples here. Read our post about it LINK COMING SOON.

The categories are:

  • Arbitrarily resolving ambiguity in the prompt
  • Being asked to make things up (non-adversarial)
  • BS about fictitious things
  • BS about unremarkable things
  • BS extrapolation from trends
  • BS meanings of theorems
  • BS proofs of true theorems
  • BS uses of unrelated lemmas
  • BS references
  • Common misconceptions
  • Defending BS
  • Deferring to doubt
  • Failing to answer ‘all’
  • Failing to answer ‘none’
  • Imitating untrustworthy people (non-adversarial)
  • Justifying a wrong response
  • Making up outrageous facts
  • Shifts from a common setup

These categories are not, and were not intended to be, a complete taxonomy. There are certainly other ways to make GPT-4 output falsehoods. In general, any type of question that is difficult to answer correctly would qualify, but we focus on categories of failure that we find particularly egregious.

What we hope this is useful for: Our dataset is small and was collected with a just-messing-around methodology. Still, some may find these examples useful for testing hallucination-related behaviors of chatbots. Our taxonomy could also be useful for studying hallucination more systematically. We invite OpenAI to fix these issues, and we invite anyone with additional ideas or examples to send them to us so we can update the dataset :)