Issues
- 1
Add more documentation on CLI
#868 opened by mrahtz - 0
Typo on 'welcome' / 'getting started' page.
#867 opened by williamgki - 1
[Feature Request] Add user_message solver
#864 opened by kaifronsdal - 1
- 0
Inspect sometimes hangs on extremely large runs
#849 opened by MSchmatzAISI - 0
- 2
Disabling exponential backoff
#837 opened by AarushSah - 5
Validation error when reading Eval log
#834 opened by us2547 - 3
- 14
[Feature request] Flag to disable all API calls
#776 opened by rusheb-apollo - 0
Multi-processing w/ s3 logs, unless you use spawn to launch your subprocesses
#840 opened by max-kaufmann - 1
- 0
Inspect process uses large amounts of memory w/ DockerSandbox, likely because large exec() outputs are loaded into memory
#836 opened by max-kaufmann - 1
[Question] Easy way to use finetuned OpenAI models?
#835 opened by baceolus - 5
Question about running code on eval results
#828 opened by AarushSah - 2
Questions About Evaluating Local Open-source Models with Tool-using Capabilities via vLLM Serving Using inspect_ai
#825 opened by jc-ryan - 1
For certain Model API calls, if they error we don't have the API call logged to the transcript.
#823 opened by max-kaufmann - 0
Log a "Sample error detected" if a sample has an error, so that we can see it in the console when we are running an eval
#824 opened by max-kaufmann - 5
inspect_ai.log._transcript.eval_events_with_content fails to lookup content in EvalSample.transcript
#694 opened by bronson-apollo - 1
Web browser struggles with large pages
#740 opened by art-dsit - 6
[Question] Custom metric type
#775 opened by us2547 - 0
- 1
Change in behaviour: fractional scoring when using default `includes` scorer and default reducer
#796 opened by craigwalton-dsit - 0
A scorer to compute costs based on model usages (for eval model, grader model, etc.)?
#795 opened by grigory93 - 1
[Feature request] Debug logging for cache inputs
#786 opened by rusheb - 3
- 2
Approvals for Entropy-labs not found in the registry
#747 opened by mlcocdav - 2
Inspect view Bug when copying ID
#717 opened by jeremy-apollo - 0
- 2
plot_results(): Are there any frameworks that allow summarising and visualising inspect logs?
#704 opened by sohaibimran7 - 3
- 3
Additional completion params for ollama API
#731 opened by ianbulovic - 3
- 6
Many tests are skipped in CI
#726 opened by tadamcz - 7
Modern dependency management
#699 opened by tadamcz - 1
- 3
Occasional incorrect grading by choice scorer with chain-of-thought prompting
#721 opened by lennijusten - 7
- 9
[Bug] Tool use broken with Google models
#695 opened by tadamcz - 5
[bug] Wrong Dockerfile used for sandboxes (uses working directory instead of task directory)
#691 opened by tadamcz - 0
Semver Dependency Needs Pin
#705 opened by DamianB-BitFlipper - 4
[Question] Scorer execution order enforcement
#693 opened by us2547 - 2
Make list_eval_logs recursive=True return the log_dir it read the log from
#686 opened by sohaibimran7 - 1
- 0
Error running `inspect score` with a scoring function that uses a task-level sandbox
#679 opened by bienehito