Self-Reflection in LLM Agents: Effects on Problem-Solving Performance

Abstract

In this study, we investigated the effects of self-reflection in large language models (LLMs) on problem-solving performance. We instructed nine popular LLMs to answer a series of multiple-choice questions to provide a performance baseline. For each incorrectly answered question, we then instructed eight types of self-reflecting LLM agents to reflect on their mistakes and provide themselves with guidance to improve problem-solving. Using this guidance, each self-reflecting agent re-attempted the same questions. Our results indicate that LLM agents are able to significantly improve their problem-solving performance through self-reflection. In addition, we compared the various types of self-reflection to determine their individual contributions to performance.

Documents

Research Paper

Code

Solve with Baseline - answers all questions using the baseline agent
Reflect on Solution - self-reflects on incorrectly answered problems given the correct answer
Save Reflections - separates reflections by type, redacts answers, and saves reflection text
Solve with Reflection - re-answers all incorrectly answered questions using the reflections
Plot Accuracy - plots the accuracy for each agent
Plot Accuracy by Model and Agent - plots the accuracy by model and agent
Plot Accuracy by Exam and Agent - plots the accuracy for each model by exam and agent
Analyze Details - performs the McNemar test and creates a table of the results
Analyze Keywords - analyzes the error keywords produced by the self-reflections
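
The McNemar test mentioned under Analyze Details compares paired outcomes, here the same question answered by the baseline agent and by a self-reflecting agent. The sketch below is only an illustration of the exact (binomial) form of the test and is not taken from the repository's code, which may use a different variant or a statistics library. The function name mcnemar_exact and its arguments are hypothetical.

```python
from math import comb


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value from the discordant pair counts.

    b: questions the baseline answered correctly but the reflecting agent missed
    c: questions the baseline missed but the reflecting agent answered correctly
    (Hypothetical helper for illustration; not the repository's implementation.)
    """
    n = b + c
    if n == 0:
        # No discordant pairs: the two agents are indistinguishable.
        return 1.0
    k = min(b, c)
    # Under the null hypothesis each discordant pair is a fair coin flip,
    # so the smaller count follows Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

Only the discordant pairs enter the calculation. Questions that both agents answered correctly, or both answered incorrectly, carry no information about which agent is better.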

Data

Details - the low-level details for each question answered in CSV format
Dialogs - the dialog for each question answered in JSON format
Exams - the exams containing MCQA problems in JSONL format
Logs - the log files for each question answered and self-reflection in plain-text format
Plots - the data visualizations of the results in PDF format
Reflections - the text generated during the self-reflection process stored as plain-text files
Results - the results from the experiment in CSV format
Tables - the tabular results of the analysis stored as CSV files
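
For readers unfamiliar with JSONL, the format used for the exams, each line of the file is a separate JSON object. A minimal loading sketch follows. The helper name load_jsonl is hypothetical, and the sketch makes no assumption about which fields each exam record contains.

```python
import json
from pathlib import Path


def load_jsonl(path):
    """Return one parsed object per non-empty line of a JSONL file.

    Hypothetical helper for illustration; the field names inside each
    record depend on the exam files and are not assumed here.
    """
    records = []
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines, e.g. a trailing newline
                records.append(json.loads(line))
    return records
```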
