Self-Reflection in LLM Agents: Effects on Problem-Solving Performance

Primary language: Python. License: BSD 2-Clause "Simplified" (BSD-2-Clause).

Abstract

In this study, we investigated the effects of self-reflection in large language models (LLMs) on problem-solving performance. We first instructed nine popular LLMs to answer a series of multiple-choice questions to establish a performance baseline. For each incorrectly answered question, we then instructed eight types of self-reflecting LLM agents to reflect on their mistakes and provide themselves with guidance to improve problem-solving. Using this guidance, each self-reflecting agent re-attempted the same questions. Our results indicate that LLM agents are able to significantly improve their problem-solving performance through self-reflection. In addition, we compared the various types of self-reflection to determine their individual contributions to performance.
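
The procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the repository's code: `run_with_reflection`, `answer_fn`, `reflect_fn`, and the `"prompt"`/`"answer"` keys are hypothetical names, the model calls are replaced by stand-in functions, and the way post-reflection accuracy is counted here (baseline correct plus recovered answers) is an assumption.

```python
from typing import Callable, Dict, List


def run_with_reflection(
    questions: List[Dict[str, str]],
    answer_fn: Callable[[str, str], str],
    reflect_fn: Callable[[str, str, str], str],
) -> Dict[str, float]:
    """Return baseline and post-reflection accuracy (hypothetical helper).

    answer_fn(prompt, guidance) returns the chosen option letter.
    reflect_fn(prompt, wrong_answer, correct_answer) returns guidance text.
    """
    if not questions:
        raise ValueError("questions must not be empty")
    baseline_correct = 0
    recovered = 0
    for q in questions:
        first = answer_fn(q["prompt"], "")  # baseline attempt, no guidance
        if first == q["answer"]:
            baseline_correct += 1
            continue
        # Reflect only on incorrectly answered questions, then retry once.
        guidance = reflect_fn(q["prompt"], first, q["answer"])
        second = answer_fn(q["prompt"], guidance)
        if second == q["answer"]:
            recovered += 1
    total = len(questions)
    return {
        "baseline": baseline_correct / total,
        "reflection": (baseline_correct + recovered) / total,
    }


# Stand-in functions so the sketch runs without calling any model.
def fake_answer(prompt: str, guidance: str) -> str:
    return "B" if guidance else "A"


def fake_reflect(prompt: str, wrong: str, correct: str) -> str:
    return "Re-read the question before choosing."


questions = [
    {"prompt": "q1", "answer": "A"},
    {"prompt": "q2", "answer": "B"},
    {"prompt": "q3", "answer": "C"},
    {"prompt": "q4", "answer": "A"},
]
result = run_with_reflection(questions, fake_answer, fake_reflect)
```

In the actual experiment the reflections are redacted before reuse (see the Code section), a step this sketch omits.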

Documents

Code

  1. Solve with Baseline - answers all questions using the baseline agent
  2. Reflect on Solution - self-reflects on incorrectly answered problems given the correct answer
  3. Save Reflections - separates reflections by type, redacts answers, and saves reflection text
  4. Solve with Reflection - re-answers all incorrectly answered questions using the reflections
  5. Plot Accuracy - plots the accuracy for each agent
  6. Plot Accuracy by Model and Agent - plots the accuracy by model and agent
  7. Plot Accuracy by Exam and Agent - plots the accuracy for each model by exam and agent
  8. Analyze Details - performs the McNemar test and creates a table of the results
  9. Analyze Keywords - analyzes the error keywords produced by the self-reflections
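
For readers unfamiliar with the McNemar test used in step 8, the following is a minimal sketch of the exact (binomial) variant applied to paired correct/incorrect outcomes. It uses only the standard library; the function name `mcnemar_exact` is hypothetical, and the repository's script may use a different variant or a statistics library.

```python
from math import comb
from typing import Sequence


def mcnemar_exact(before: Sequence[bool], after: Sequence[bool]) -> float:
    """Two-sided exact McNemar p-value for paired binary outcomes."""
    if len(before) != len(after):
        raise ValueError("paired sequences must have equal length")
    # Discordant pairs: correct -> incorrect (b) and incorrect -> correct (c).
    b = sum(1 for x, y in zip(before, after) if x and not y)
    c = sum(1 for x, y in zip(before, after) if not x and y)
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence of a difference
    k = min(b, c)
    # Under the null hypothesis, discordant pairs split Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Made-up outcomes for eight questions (True means answered correctly).
before = [True, False, False, False, False, False, True, False]
after = [True, True, True, True, True, True, False, False]
p_value = mcnemar_exact(before, after)
```

Only the discordant pairs enter the statistic; questions answered the same way before and after reflection do not affect the p-value.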

Data

  • Details - the low-level details for each question answered in CSV format
  • Dialogs - the dialog for each question answered in JSON format
  • Exams - the exams containing MCQA problems in JSONL format
  • Logs - the log files for each question answered and self-reflection in plain-text format
  • Plots - the data visualizations of the results in PDF format
  • Reflections - the text generated during the self-reflection process, stored as plain-text files
  • Results - the results from the experiment in CSV format
  • Tables - the tabular results of the analysis stored as CSV files
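
As a rough illustration of working with the JSONL exam files, the sketch below parses one JSON object per line. The helper `read_jsonl` and the field names `question`, `choices`, and `answer` are assumptions made for the example; the actual schema of the exam files is not documented here.

```python
import json
from typing import Dict, Iterable, List


def read_jsonl(lines: Iterable[str]) -> List[Dict]:
    """Parse JSONL input: one JSON object per non-blank line."""
    records = []
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines
            records.append(json.loads(line))
    return records


# Inline sample with hypothetical fields, so the sketch needs no files.
sample = [
    '{"question": "2 + 2 = ?", "choices": {"A": "3", "B": "4"}, "answer": "B"}',
    "",
    '{"question": "Capital of France?", "choices": {"A": "Paris", "B": "Rome"}, "answer": "A"}',
]
exam = read_jsonl(sample)
```

Because an open text file iterates line by line, the same function can be passed a file object opened with `encoding="utf-8"`.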