agential-ai/agential

🔔🧠 Easily experiment with popular language agents across diverse reasoning/decision-making benchmarks!

Python · MIT

Issues

[Feature Request]: LLM-as-a-Judge Evaluation?
#265 opened 14 days ago
[Feature Request]: Evaluation Metrics
#264 opened a month ago
[Feature Request]: Evaluation Harness
#262 opened a month ago
[Feature Request]: Stabilize Output Parsing for Code benchmarks (specifically HEval)
#257 opened 2 months ago
[Feature Request]: Standardized Output Fix for ExpeL, LATS, More Coverage
#255 opened 2 months ago
[Feature Request]: Standardize Base Agent
#254 opened a month ago
[Feature Request]: Isolate Agents in their own Modules
#253 opened a month ago
[Feature Request]: Migrate Prompt creation and LLM Calling Away from LangChain
#251 opened 2 months ago
[Feature Request]: Add simple baselines
#247 opened a month ago
[Feature Request]: Log token usage/costs/time for all agents
#246 opened 2 months ago
[Feature Request]: Implement LATS
#245 opened 2 months ago
[Feature Request]: ExpeL Structured Outputs
#243 opened 3 months ago
[Feature Request]: ExpeL
#242 opened 3 months ago
[Feature Request]: MBPP for ExpeL
#241 opened 3 months ago
[Feature Request]: HumanEval for ExpeL
#240 opened 3 months ago
[Feature Request]: TabMWP for ExpeL
#239 opened 3 months ago
[Feature Request]: SVAMP for ExpeL
#238 opened 3 months ago
[Feature Request]: GSM8K for ExpeL
#237 opened 3 months ago
[Feature Request]: FEVER for ExpeL
#236 opened 3 months ago
[Feature Request]: TriviaQA for ExpeL
#235 opened 3 months ago
[Feature Request]: AmbigNQ for ExpeL
#234 opened 3 months ago
[Feature Request]: HotpotQA for ExpeL
#233 opened 3 months ago
[Feature Request]: Refactor ExpeL
#232 opened 3 months ago
[Feature Request]: Standardize error types in Reflexion with CRITIC/SR
#230 opened 3 months ago
[Feature Request]: Standardize CRITIC/Self-Refine Few-shots for Math
#229 opened 3 months ago
[Feature Request]: Refactor Self-Refine
#226 opened 3 months ago