agential-ai/agential

[Feature Request]: Reflexion


Feature Description

Implement:

The decision-making benchmarks (ALFWorld, WebShop, and AgentBench) will require more design work; swapping out the prompts alone won't suffice.

Run:

  • HotpotQA
  • TriviaQA
  • AmbigNQ
  • GSM8K
  • SVAMP
  • TabMWP
  • MBPP
  • HumanEval
  • ALFWorld
  • WebShop
  • AgentBench (includes ALFWorld & WebShop)
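
One way to track the split described above is a small registry that flags which benchmarks need extra design work. This is only a sketch: the `Benchmark` class, the `task_type` grouping, and the `split_by_effort` helper are hypothetical names chosen for illustration and are not part of the agential codebase.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Benchmark:
    name: str
    task_type: str           # assumed grouping, for illustration only
    needs_design_work: bool  # True for the decision-making benchmarks


# Benchmarks listed in this issue. Only the decision-making ones are
# flagged as needing more than a prompt swap, per the note above.
BENCHMARKS = [
    Benchmark("HotpotQA", "qa", False),
    Benchmark("TriviaQA", "qa", False),
    Benchmark("AmbigNQ", "qa", False),
    Benchmark("GSM8K", "math", False),
    Benchmark("SVAMP", "math", False),
    Benchmark("TabMWP", "math", False),
    Benchmark("MBPP", "code", False),
    Benchmark("HumanEval", "code", False),
    Benchmark("ALFWorld", "decision-making", True),
    Benchmark("WebShop", "decision-making", True),
    Benchmark("AgentBench", "decision-making", True),
]


def split_by_effort(benchmarks):
    """Return (names not flagged for design work, names flagged for it)."""
    not_flagged = [b.name for b in benchmarks if not b.needs_design_work]
    flagged = [b.name for b in benchmarks if b.needs_design_work]
    return not_flagged, flagged


not_flagged, flagged = split_by_effort(BENCHMARKS)
print("Not flagged:", ", ".join(not_flagged))
print("Needs design work:", ", ".join(flagged))
```

A flat list like this keeps the run checklist and the open design questions in one place. Whether the unflagged benchmarks really need nothing beyond new prompts is an assumption that would have to be confirmed during implementation.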