techandy42/bug_in_the_code_stack
A new benchmark for measuring LLMs' ability to detect bugs in large codebases.
Jupyter Notebook