Issues
Support for GPT-4o
#1529 opened by PrashantDixit0 - 1
TensorFlow fails while no TensorFlow expected to run at all
#1532 opened by artkpv - 1
Schelling point eval doesn't work
#1533 opened by johny-b - 1
Possibility to sell high quality benchmarks
#1437 opened by guliashvili - 1
What is this
#1527 opened by DXv-3 - 1
Getting started example doesn't work - oaieval attempts to update a None type object
#1515 opened by jswang - 4
When installing the project dependencies, I got: "ERROR: Could not build wheels for greenlet, which is required to install pyproject.toml-based projects"
#1513 opened by JuanmaMenendez - 2
Support for Azure OpenAI client
#1469 opened by pkt1583 - 0
Setting completion function args via CLI does not work
#1504 opened by LoryPack - 0
Support multiple completions for ModelbasedClassify
#1484 opened by tom-christie - 0
Eval-running often hangs on last sample
#1384 opened by sjadler2004 - 1
Local run doesn't save logs to disk
#1459 opened by charles-somm - 3
Tagged Release For 2.0.0
#1456 opened by michaelAlvarino - 0
`Failed to open: ../registry/data/social_iqa/few_shot.jsonl` with custom registry
#1394 opened by LoryPack - 3
Internationalization (i18n) support
#1239 opened by wangkunmin - 0
Request to change arithmetical_puzzles prompting
#1448 opened by ArcticBeat05 - 2
Error structure in `utils` after openai package upgrade
#1432 opened by inwaves - 3
oaieval --help errors for me
#1369 opened by sjadler2004 - 1
Do not back off on `openai.BadRequestError`
#1408 opened by johny-b - 0
Improvements to `Match`: case insensitive and strip
#1421 opened by LoryPack - 5
Using different models in evaluating a model-graded eval and in generating the completion
#1393 opened by LoryPack - 0
Evals broken with latest openai package v1.1.1
#1399 opened by ojaffe - 2
Expose run_id to code being run within an eval
#1264 opened by robatwilliams - 1
Having trouble building Evals locally? Try this.
#1340 opened by silverfoxf7 - 1
In the task "balance_chemical_equation", many instances have incorrect labels.
#1386 opened by dongZheX - 5
Multiple evals not found
#1379 opened by SUMEETRM - 0
Should random collection of values be supported?
#1382 opened by assert6 - 0
Context window of completion functions not accounted for
#1377 opened by pskl - 0
Use github.com/apssouza22/chatflow as a conversational layer. It would enable actual API requests to be carried out from natural language inputs.
#1362 opened by GiovanniSmokes - 1
Evaluate the cost of running tests
#1350 opened by onjas-buidl - 1
How to eval output against ideal_answer directly, without having to define the completion_fn?
#1342 opened by liuyaox - 0
Publish latest evals framework to PyPI
#1344 opened by robatwilliams - 0
Find claims from research paper
#1338 opened - 0
Accuracy Score
#1328 opened by jeyarajcs - 1
All evals currently in the repo appear only to have dev samples: is this correct?
#1319 opened by mesotron - 0
Please approve pull request, changes were made.
#1298 opened by nickabooch - 1
oaieval hangs a lot
#1292 opened by shamas- - 2
Meaning of "elsuite" folder name
#1282 opened by siftxxx - 0
Code Evals
#1275 opened by billxbf - 0
closedqa prompt is not adequate for gpt-4-0613
#1228 opened by JasonGross - 1
You should see GPT-4 API access enabled in your account in the next few days.
#1208 opened by verheesj