haesleinhuepf/human-eval-bia

Benchmarking Large Language Models for Bio-Image Analysis Code Generation

Jupyter Notebook · MIT

Issues

[Fixing Test Cases] Label Images - When should 0 be considered a label?
#130 opened 16 days ago by ian-coccimiglio
2
grouping test cases and categorisation
#112 opened 21 days ago by pr4deepr
23
New benchmark run from scratch
#128 opened 17 days ago by haesleinhuepf
0
[Fixing Test-Cases] Pandas/Numpy use different stdev estimators (mean_std_column)
#124 opened 17 days ago by ian-coccimiglio
2
[Fixing Test-Cases] Functions with image inputs but checking lists
#120 opened 17 days ago by ian-coccimiglio
0
Guarding against LLMs that would learn our repo "by heart"
#119 opened 18 days ago by tischi
1
Should we represent images using default numpy.asarray()?
#115 opened 19 days ago by ian-coccimiglio
2
Maximum and Sum Intensity Projection Tests
#116 opened 19 days ago by ian-coccimiglio
1
Reference of test case failing with numpy 2.0.2
#114 opened 20 days ago by haesleinhuepf
0
Remove small labels: Unclear prompt
#111 opened 22 days ago by tischi
6
Potential test case: Save voxel size in meta data
#109 opened 22 days ago by haesleinhuepf
0
LLM's using this repo as training data
#89 opened 2 months ago by JoOkuma
6
model names with : cause issues in create_samples.ipynb
#102 opened 25 days ago by haesleinhuepf
0
test cases suggestions
#99 opened a month ago by pr4deepr
2
Test cases from the Icy / Fiji or Java ecosystems?
#74 opened 3 months ago by tinevez
6
What about future models learning from our resource?
#53 opened 5 months ago by tischi
5
Are OME-Zarr tests in scope
#98 opened a month ago by jluethi
1
Curiosity: If test fails, add to prompt the error description?
#97 opened a month ago by ClementCaporal
2
Benchmark gemini-1.5-pro and gemini ultra
#26 opened a month ago by haesleinhuepf
0
Improve how we ask the LLMs to perform a task
#79 opened 3 months ago by tischi
12
Error: name 'Python' is not defined - in llama models
#80 opened a month ago by ian-coccimiglio
3
Check why rather basic test-cases are failing
#76 opened 3 months ago by haesleinhuepf
4
Benchmark llama3.1 405b
#84 opened 2 months ago by haesleinhuepf
0
Add test case: Save image according to ome-ngff standards
#73 opened 3 months ago by ClementCaporal
3
Evaluating LLMs capabilities relative to task-complexity
#77 opened 3 months ago by ian-coccimiglio
7
Report about system prompt in the paper
#75 opened 3 months ago by haesleinhuepf
2
One model often fails with the same error
#52 opened 3 months ago by haesleinhuepf
1
measure execution time of tests
#71 opened 3 months ago by haesleinhuepf
0
count number of comments in generated code
#70 opened 3 months ago by haesleinhuepf
0
add test-case for cell tracking measuring the speed of a cell and/or number of cells over time
#68 opened 3 months ago by haesleinhuepf
0
Add CONTRIBUTING guide and code of conduct
#44 opened 5 months ago by haesleinhuepf
1
rename codellama
#64 opened 5 months ago by haesleinhuepf
1
Timeout when sampling
#60 opened 5 months ago by haesleinhuepf
1
Benchmark against bigger open models
#59 opened 5 months ago by dcfidalgo
6
Samples lost due to error when sampling
#61 opened 5 months ago by haesleinhuepf
1
Mistral benchmarking on blablador currently fails
#55 opened 5 months ago by haesleinhuepf
0
add use case: use aicsimageio to load a file
#22 opened 5 months ago by haesleinhuepf
0
Use pytorch and/or tensorflow?
#47 opened 5 months ago by haesleinhuepf
0
add test case for neuroimaging: load nifti file
#24 opened 5 months ago by nscherf
3
add test for linear intensity profile
#36 opened 5 months ago by haesleinhuepf
0
How to deal with tests that fail due to missing dependencies
#39 opened 5 months ago by tischi
6
add use case: tiled processing
#23 opened 5 months ago by haesleinhuepf
0
add test for circle fitting?
#35 opened 5 months ago by tischi
2
add test for radial intensity profile
#34 opened 5 months ago by tischi
1
Histogram equalization of an image
#37 opened 5 months ago by haesleinhuepf
0
add use case: read zarr file
#21 opened 5 months ago by haesleinhuepf
2
Simple tests involving the file system don't pass
#12 opened 5 months ago by haesleinhuepf
2
UMAP unit-tests don't pass due to timeout
#14 opened 5 months ago by haesleinhuepf
1
How to evaluate
#17 opened 6 months ago by tischi
6
sample canonical solution
#11 opened 6 months ago by haesleinhuepf
0