noahshinn/reflexion

About the prompt for reflection

Statisticss opened this issue · 4 comments

Thanks for your great work!

I have a question regarding the code. In prompts.py, the prompt for reflection (https://github.com/noahshinn024/reflexion/blob/612e616603650397d4060117de4578658626deb1/hotpotqa_runs/prompts.py#L117C1-L124C15) is:
REFLECT_INSTRUCTION = """You are an advanced reasoning agent that can improve based on self refection. You will be given a previous reasoning trial in which you were given access to an Docstore API environment and a question to answer. You were unsuccessful in answering the question either because you guessed the wrong answer with Finish[], or you used up your set number of reasoning steps. In a few sentences, Diagnose a possible reason for failure and devise a new, concise, high level plan that aims to mitigate the same failure. Use complete sentences.
Here are some examples:
{examples}
Previous trial:
Question: {question}{scratchpad}
Reflection:"""
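To make the layout of this template concrete, here is a minimal sketch of how it renders with Python's `str.format`. The instruction paragraph is shortened, and the example question and scratchpad are hypothetical. The sketch assumes the scratchpad string starts with a newline, so the recorded thoughts and actions land on the lines after the question rather than on the "Question:" line itself.

```python
# Abbreviated copy of the template quoted above: the instruction paragraph is
# shortened, but the placeholder layout ({examples}, {question}{scratchpad}) is kept.
REFLECT_INSTRUCTION = """You are an advanced reasoning agent ... (instruction shortened)
Here are some examples:
{examples}
Previous trial:
Question: {question}{scratchpad}
Reflection:"""


def build_reflection_prompt(examples: str, question: str, scratchpad: str) -> str:
    """Fill the template; question and scratchpad are concatenated as-is."""
    return REFLECT_INSTRUCTION.format(
        examples=examples, question=question, scratchpad=scratchpad
    )


# Hypothetical failed trial. The scratchpad is ASSUMED to begin with "\n",
# so each recorded step appears on its own line after the question.
scratchpad = (
    "\nThought 1: I need to search for the film."
    "\nAction 1: Search[Example Film]"
    "\nObservation 1: Could not find [Example Film]."
    "\nAction 2: Finish[unknown]"
)

prompt = build_reflection_prompt(
    examples="(few-shot reflections go here)",
    question="Who directed Example Film?",
    scratchpad=scratchpad,
)
print(prompt)
```

Under that assumption, the "Previous trial" block reads as the question followed by the full trajectory, which is why nothing else needs to follow "Previous trial:".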

There are a few things in the prompt that confuse me:
(1) It seems that this prompt is designed for unsuccessful trials (the prompt says "You were unsuccessful in answering the question either because ..."). What about successful trials? Does the system also reflect on successful trials and summarize what worked?
(2) I think the {examples} part comes from REFLECTIONS in fewshots.py (https://github.com/noahshinn024/reflexion/blob/612e616603650397d4060117de4578658626deb1/hotpotqa_runs/fewshots.py#L68C1-L106C4). However, REFLECTIONS only shows the Question, Actions, Thoughts, and the corresponding Reflection; it doesn't show the results of those actions. That means the LLM doesn't know whether the actions succeeded. How, then, can the LLM reflect on them?
(3) In the prompt, there is nothing after "Previous trial:", and both {question} and {scratchpad} follow "Question:" (https://github.com/noahshinn024/reflexion/blob/612e616603650397d4060117de4578658626deb1/hotpotqa_runs/prompts.py#L121C1-L122C33). Is this format correct?

Hi @Statisticss , can you tag the specific lines that you are referencing? I'm happy to answer these questions and I want to make sure that we are discussing the same information.

Sure! Please see the updated questions above.

Thanks!

Regarding your questions: reflection prompts are written to identify mistakes. It is the reward model's (evaluator's) job to determine whether a reflection is needed. If a trial is successful, there are no mistakes to correct. The general pipeline for Reflexion is:
trajectory --> reward in {0, 1} --> reflection --> new trajectory.
In our paper, we mentioned that we used the reward from the HotPotQA environment. For (3), the "trial" is the question + answer. Is this what you are asking about?
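The pipeline described above (trajectory, then a reward in {0, 1}, then a reflection only on failure, then a new trajectory) can be sketched as a small loop. This is an illustrative sketch, not the repository's code: `run_trial`, `evaluate`, and `reflect` are hypothetical stand-ins for the agent, the environment reward (HotPotQA in the paper), and the reflection LLM call, and `max_trials` is an assumed budget.

```python
from typing import Callable, List, Tuple


def reflexion_loop(
    run_trial: Callable[[List[str]], str],
    evaluate: Callable[[str], int],
    reflect: Callable[[str], str],
    max_trials: int = 3,
) -> Tuple[str, List[str]]:
    """Run trials until the evaluator returns 1 or the trial budget is used up.

    run_trial, evaluate and reflect are HYPOTHETICAL callables standing in for
    the agent, the environment reward, and the reflection prompt.
    """
    reflections: List[str] = []
    trajectory = ""
    for _ in range(max_trials):
        trajectory = run_trial(reflections)  # new trajectory, conditioned on past reflections
        reward = evaluate(trajectory)        # binary reward in {0, 1}
        if reward == 1:
            break                            # success: nothing to correct, so no reflection
        reflections.append(reflect(trajectory))  # failure: diagnose and plan
    return trajectory, reflections


# Toy stubs: the "agent" answers correctly once it has seen one reflection.
def toy_trial(reflections: List[str]) -> str:
    return "Finish[correct]" if reflections else "Finish[wrong]"


def toy_evaluate(trajectory: str) -> int:
    return int(trajectory == "Finish[correct]")


def toy_reflect(trajectory: str) -> str:
    return f"I answered {trajectory}; I should search before finishing."


final, notes = reflexion_loop(toy_trial, toy_evaluate, toy_reflect)
print(final)       # Finish[correct]
print(len(notes))  # 1
```

Because the reflection step only runs when the reward is 0, a trial that succeeds on the first attempt produces no reflections at all. This matches the answer to question (1).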

That perfectly answers my questions. Thank you!