UKGovernmentBEIS/inspect_ai

Inspect: A framework for large language model evaluations

Python · MIT

Issues

Add more documentation on CLI
#868 opened 7 days ago by mrahtz
1
Typo on 'welcome' / 'getting started' page.
#867 opened 7 days ago by williamgki
0
[Feature Request] Add user_message solver
#864 opened 7 days ago by kaifronsdal
1
Is there a way to have max-connections go beyond 100?
#847 opened 11 days ago by AarushSah
1
Inspect sometimes hangs on extremely large runs
#849 opened 12 days ago by MSchmatzAISI
0
`normalize_number` fails on exponent / fractional input
#848 opened 12 days ago by evanmiller-anthropic
0
Disabling exponential backoff
#837 opened 12 days ago by AarushSah
2
Validation error when reading Eval log
#834 opened 15 days ago by us2547
5
Dataclass Objects in Score Metadata Not Preserved in Binary Log Format
#842 opened 12 days ago by rusheb
3
[Feature request] Flag to disable all API calls
#776 opened 21 days ago by rusheb-apollo
14
Multi-processing w/ s3 logs, unless you use spawn to launch your subprocesses
#840 opened 13 days ago by max-kaufmann
0
`match` scorer doesn't handle answers with percent sign
#838 opened 14 days ago by fastfedora
1
Inspect process uses large amounts of memory w/ DockerSandbox, likely because large exec() outputs are loaded into memory
#836 opened 15 days ago by max-kaufmann
0
[Question] Easy way to use finetuned OpenAI models?
#835 opened 15 days ago by baceolus
1
Question about running code on eval results
#828 opened 17 days ago by AarushSah
5
Questions About Evaluating Local Open-source Models with Tool-using Capabilities via vLLM Serving Using inspect_ai
#825 opened 16 days ago by jc-ryan
2
For certain Model API calls, if they error we don't have the API call logged to the transcript.
#823 opened 18 days ago by max-kaufmann
1
Log a "Sample error detected" if a sample has an error, so that we can see it in the console when we are running an eval
#824 opened 18 days ago by max-kaufmann
0
inspect_ai.log._transcript.eval_events_with_content fails to lookup content in EvalSample.transcript
#694 opened a month ago by bronson-apollo
5
Web browser struggles with large pages
#740 opened a month ago by art-dsit
1
[Question] Custom metric type
#775 opened a month ago by us2547
6
[bug] `eval_set` raises PrerequisiteError when rerun after KeyboardInterrupt
#805 opened 22 days ago by rusheb
0
[bug] task args are not preserved after KeyboardInterrupt of `eval_set`
#804 opened 22 days ago by rusheb-apollo
1
Change in behaviour: fractional scoring when using default `includes` scorer and default reducer
#796 opened 23 days ago by craigwalton-dsit
1
A scorer to compute costs based on model usages (for eval model, grader model, etc.)?
#795 opened 23 days ago by grigory93
0
[Feature request] Debug logging for cache inputs
#786 opened a month ago by rusheb
1
Add ability to use self-signed SSL certs with web browser tooling
#772 opened a month ago by MSchmatzAISI
3
Adding a scorer without rerunning existing scorers.
#770 opened a month ago by sohaibimran7
2
Approvals for Entropy-labs not found in the registry
#747 opened a month ago by mlcocdav
2
Inspect view Bug when copying ID
#717 opened a month ago by jeremy-apollo
2
Failed epoch doesn't stop rest of task's epochs
#746 opened a month ago by craigwalton-dsit
0
plot_results(): Are there any frameworks that allow summarising and visualising inspect logs?
#704 opened a month ago by sohaibimran7
2
Model-based scorers are not able to access full chat histories
#685 opened a month ago by ProtD
3
Additional completion params for ollama API
#731 opened a month ago by ianbulovic
3
[Feature Request] Add standardised way to index samples within task
#702 opened a month ago by skinnerjc
3
Many tests are skipped in CI
#726 opened a month ago by tadamcz
6
Modern dependency management
#699 opened a month ago by tadamcz
7
Support Dockerfile (not just compose file) in call to `sandbox()`
#727 opened a month ago by tadamcz
1
Calling eval() does not load model args from env file
#729 opened a month ago by ianbulovic
1
Allow specifying parallel_tool_calls=False in ToolDef
#720 opened a month ago by bronson-apollo
3
Occasional incorrect grading by choice scorer with chain-of-thought prompting
#721 opened a month ago by lennijusten
3
Improve structure and typing of sandbox environment objects
#692 opened a month ago by tadamcz
7
[Bug] Tool use broken with Google models
#695 opened a month ago by tadamcz
9
[bug] Wrong Dockerfile used for sandboxes (uses working directory instead of task directory)
#691 opened a month ago by tadamcz
5
Semver Dependency Needs Pin
#705 opened a month ago by DamianB-BitFlipper
0
[Question] Scorer execution order enforcement
#693 opened a month ago by us2547
4
Make list_eval_logs recursive=True return the log_dir it read the log from
#686 opened 2 months ago by sohaibimran7
2
`ValueError No model specified` for tutorial example when run in notebook
#683 opened 2 months ago by Lovkush-A
1
support for web browser navigations that open new windows
#681 opened 2 months ago by jjallaire-aisi
0
Error running `inspect score` with a scoring function that uses a task-level sandbox
#679 opened 2 months ago by bienehito
0