/bug_in_the_code_stack

A new benchmark for measuring LLMs' capability to detect bugs in large codebases.

Primary Language: Jupyter Notebook
