agential-ai/agential

[Feature Request]: TabMWP for Reflexion


Feature Description

The Reflexion implementation currently only includes prompts for the benchmarks it already supports; it does not yet have prompts or logic for TabMWP.

Add the relevant prompts and logic to the current Reflexion implementation. The agent's current structure is divided into cog/agent, cog/modules, and cog/functional. This task primarily requires modifying cog/prompts, but you may also need to edit and test code in the other relevant modules, cog/functional and cog/agent.
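As a rough orientation, adding a benchmark usually comes down to defining new few-shot example and instruction prompt constants in cog/prompts and wiring them through to the agent. The snippet below is a minimal sketch only: the file path, constant names, and prompt text are placeholders, and the real few-shot examples must be copied verbatim from the relevant paper's repo.

```python
# Hypothetical sketch of new constants in cog/prompts (e.g. cog/prompts/reflexion.py).
# Constant names and prompt text are placeholders, not the actual Agential API;
# the real few-shot examples must be taken from the Reflexion / TabMWP paper repos.

# Few-shot examples for TabMWP (table-based math word problems).
TABMWP_FEWSHOT_EXAMPLES_COT = """Read the following table and answer the question.

Table:
penny | $0.01
nickel | $0.05

Question: How much money does Sara have if she has one penny and one nickel?
Thought: One penny is $0.01 and one nickel is $0.05, so the total is $0.06.
Action: Finish[$0.06]
"""

# Instruction template that wraps the few-shot examples, the question, and the
# agent's accumulated reflections.
REFLEXION_COT_INSTRUCTION_TABMWP = """Solve the table-based math word problem.

{examples}

{reflections}

Question: {question}
{scratchpad}"""
```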

Checklist:

  • Set up your environment via the CONTRIBUTING.md, if needed
  • Make a Pull Request (PR) as soon as possible
  • Familiarize yourself with (ensure you develop a thorough code-level understanding of):
    • Agential's repository structure
    • The Reflexion paper
    • The Reflexion repo code
    • The specified method's implementation in Agential
    • The TabMWP paper
  • Check the method's paper to see if they test on this benchmark (Y/N)
    • Y
      • Read through the method's paper/experiments section to gain a conceptual understanding of how this method is tested on this benchmark.
      • Develop a deep code-level understanding of how this method is tested on this benchmark in the method paper's repo (i.e., what do the prompts/few-shot examples look like? How are these inputs passed into the LLM? Is there any additional pre/post logic around the LLM call? Is there any additional logic added to this method for this specific benchmark?)
      • Add the prompts for the specified benchmark and any additional logic, verifying that the prompts/few-shot examples added are the same as those in the paper's repo and that any additional logic is accounted for in your implementation.
    • N
      • Read through the experiments section of a method's paper that does test on this benchmark, which you can find through the project lifecycle document under the "Methods x Benchmarks" section (or another method paper). This is to gain a conceptual understanding of how methods are normally tested on this benchmark.
      • Develop a deep code-level understanding of how this other method is tested on this benchmark (i.e., what do the prompts/few-shot examples look like? How are these inputs passed into the LLM? Is there any additional pre/post logic around the LLM call? Is there any additional logic added to this method for this specific benchmark?)
      • Add the prompts for the specified benchmark and any additional logic to this method, verifying that the prompts/few-shot examples added match those in the other method's repo and that any logic needed to adapt this benchmark from that method to this issue's method is accounted for in your implementation.
  • Write a short notebook tmp.ipynb in the root directory showcasing the agent run on a sample query from the benchmark (a rough sketch follows this checklist)
    • Add print statements for all calls to the LLM for easier debugging and so I can easily verify the outputs (see Figure 1 below)
  • Write a concise, comprehensive summary of your changes in the PR description
    • Add reference(s)/links in this description specifying where your prompt/few-shot examples/additional logic are from
  • Complete the checklist included in the PR description and ensure this GitHub issue's checklist is completed up to this point
  • Request a review from either team members in the GitHub organization who are working on similar issues/files or @alckasoc
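For reference, tmp.ipynb might look roughly like the sketch below. The agent class name, import paths, constructor arguments, and prompt constants are assumptions based on the repository layout described above, not the exact Agential API; check the existing Reflexion code in cog/agent for the real signatures.

```python
# Hypothetical contents of tmp.ipynb (single cell). Class names, arguments,
# and prompt constants are placeholders; consult the actual Reflexion agent
# in cog/agent for the real interface.
from langchain_openai import ChatOpenAI  # assumed LLM wrapper used by the repo

from agential.cog.agent.reflexion import ReflexionCoTAgent   # assumed import path
from agential.cog.prompts.reflexion import (                 # assumed import path
    TABMWP_FEWSHOT_EXAMPLES_COT,
    REFLEXION_COT_INSTRUCTION_TABMWP,
)

llm = ChatOpenAI(model="gpt-3.5-turbo")
agent = ReflexionCoTAgent(llm=llm)

# A sample TabMWP-style query: a serialized table plus a question.
question = (
    "Table:\n"
    "penny | $0.01\n"
    "nickel | $0.05\n\n"
    "Question: How much money is one penny and one nickel?"
)

out = agent.generate(
    question=question,
    examples=TABMWP_FEWSHOT_EXAMPLES_COT,
    prompt=REFLEXION_COT_INSTRUCTION_TABMWP,
)
print(out)
```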


Figure 1. Printing out the prompt/output. I include the text "PROMPT AGENT" and "PROMPT AGENT OUT" with delimiters to make it easy to tell which function each prompt/output comes from.
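Inside the functional code, the delimiter printing described in Figure 1 could look something like the snippet below. The function name, arguments, and call site are placeholders for whichever function in cog/functional actually builds the prompt and calls the LLM; only the delimiter pattern follows the figure.

```python
# Hypothetical debug printing around an LLM call, mirroring the Figure 1 pattern.
# `_prompt_agent` is a placeholder name, not an existing Agential function.
def _prompt_agent(llm, prompt: str) -> str:
    print("=" * 30 + " PROMPT AGENT " + "=" * 30)
    print(prompt)

    out = llm.invoke(prompt).content  # assumed LangChain-style chat model call

    print("=" * 30 + " PROMPT AGENT OUT " + "=" * 30)
    print(out)
    return out
```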

Check out the project lifecycle document.

Feel free to ask questions on Slack in the respective PR channel if you're confused! Good luck! 😎
