[Feature Request]: Reflexion
alckasoc commented
Feature Description
Implement:
The decision-making benchmarks (ALFWorld, WebShop, and AgentBench) will require more design work. Swapping out the prompts won't suffice.
Run:
- HotpotQA
- TriviaQA
- AmbigNQ
- GSM8K
- SVAMP
- TabMWP
- MBPP
- HumanEval
- ALFWorld
- WebShop
- AgentBench (includes ALFWorld & WebShop)
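
One way to keep track of which benchmarks need the extra design work is a small registry keyed by task type. The sketch below is only illustrative: `TaskType`, `Benchmark`, `BENCHMARKS`, and `needs_design_work` are hypothetical names that do not come from the existing codebase, and it assumes the non-decision-making benchmarks can be covered by prompt changes alone, which this issue implies but does not state.

```python
from dataclasses import dataclass
from enum import Enum


class TaskType(Enum):
    QA = "qa"
    MATH = "math"
    CODE = "code"
    DECISION_MAKING = "decision_making"


@dataclass(frozen=True)
class Benchmark:
    """Hypothetical record describing one benchmark to run."""
    name: str
    task_type: TaskType

    @property
    def prompt_swap_only(self) -> bool:
        # Assumption: only decision-making tasks need more than new prompts.
        return self.task_type is not TaskType.DECISION_MAKING


# The benchmarks listed in this issue, grouped by task type.
BENCHMARKS = [
    Benchmark("HotpotQA", TaskType.QA),
    Benchmark("TriviaQA", TaskType.QA),
    Benchmark("AmbigNQ", TaskType.QA),
    Benchmark("GSM8K", TaskType.MATH),
    Benchmark("SVAMP", TaskType.MATH),
    Benchmark("TabMWP", TaskType.MATH),
    Benchmark("MBPP", TaskType.CODE),
    Benchmark("HumanEval", TaskType.CODE),
    Benchmark("ALFWorld", TaskType.DECISION_MAKING),
    Benchmark("WebShop", TaskType.DECISION_MAKING),
    Benchmark("AgentBench", TaskType.DECISION_MAKING),
]


def needs_design_work(benchmarks: list[Benchmark]) -> list[str]:
    """Return the names of benchmarks that need more than a prompt swap."""
    return [b.name for b in benchmarks if not b.prompt_swap_only]


if __name__ == "__main__":
    print(needs_design_work(BENCHMARKS))
```

A registry like this would let the QA, math, and code benchmarks share one run path while the decision-making ones are routed to whatever environment loop ends up being designed for them.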