haesleinhuepf/human-eval-bia
Benchmarking Large Language Models for Bio-Image Analysis Code Generation
Jupyter Notebook · MIT
Issues
- 2
[Fixing Test Cases] Label Images - When should 0 be considered a label?
#130 opened by ian-coccimiglio
- 23
grouping test cases and categorisation
#112 opened by pr4deepr
- 0
New benchmark run from scratch
#128 opened by haesleinhuepf
- 2
[Fixing Test-Cases] Pandas/Numpy use different stdev estimators (mean_std_column)
#124 opened by ian-coccimiglio
- 1
Maximum and Sum Intensity Projection Tests
#116 opened by ian-coccimiglio
- 0
Reference of test case failing with numpy 2.0.2
#114 opened by haesleinhuepf
- 6
Remove small labels: Unclear prompt
#111 opened by tischi
- 6
LLM's using this repo as training data
#89 opened by JoOkuma
- 2
test cases suggestions
#99 opened by pr4deepr
- 1
Are OME-Zarr tests in scope
#98 opened by jluethi
- 0
Benchmark gemini-1.5-pro and gemini ultra
#26 opened by haesleinhuepf
- 12
Improve how we ask the LLMs to perform a task
#79 opened by tischi
- 0
Benchmark llama3.1 405b
#84 opened by haesleinhuepf
- 2
Report about system prompt in the paper
#75 opened by haesleinhuepf
- 1
One model often fails with the same error
#52 opened by haesleinhuepf
- 0
measure execution time of tests
#71 opened by haesleinhuepf
- 0
count number of comments in generated code
#70 opened by haesleinhuepf
- 0
add test-case for cell tracking measuring the speed of a cell and/or number of cells over time
#68 opened by haesleinhuepf
- 1
Add CONTRIBUTING guide and code of conduct
#44 opened by haesleinhuepf
- 1
rename codellama
#64 opened by haesleinhuepf
- 1
Timeout when sampling
#60 opened by haesleinhuepf
- 6
Benchmark against bigger open models
#59 opened by dcfidalgo
- 1
Samples lost due to error when sampling
#61 opened by haesleinhuepf
- 0
Use pytorch and/or tensorflow?
#47 opened by haesleinhuepf
- 3
add test case for neuroimaging: load nifti file
#24 opened by nscherf
- 0
add test for linear intensity profile
#36 opened by haesleinhuepf
- 0
add use case: tiled processing
#23 opened by haesleinhuepf
- 2
add test for circle fitting?
#35 opened by tischi
- 1
add test for radial intensity profile
#34 opened by tischi
- 0
Histogram equalization of an image
#37 opened by haesleinhuepf
- 2
add use case: read zarr file
#21 opened by haesleinhuepf
- 1
UMAP unit-tests don't pass due to timeout
#14 opened by haesleinhuepf
- 6
How to evaluate
#17 opened by tischi
- 0
sample canonical solution
#11 opened by haesleinhuepf