bug_in_the_code_stack_v2

Can LLMs find bugs that compilers can't? A benchmark for measuring LLMs' capabilities in debugging large amounts of source code.

Primary language: Jupyter Notebook