swe-bench

There are 4 repositories under the swe-bench topic.

  • refact

    AI Agent that handles engineering tasks end-to-end: integrates with developers’ tools, plans, executes, and iterates until it achieves a successful result.

    Language: Rust · 3.3k
  • SE-Agent

    SE-Agent is a self-evolution framework that enables information exchange between reasoning paths through a trajectory-level evolution mechanism, breaking the cognitive limitations of single trajectories.

    Language: Python · 156
  • insights

    We track and analyze the activity and performance of autonomous code agents in the wild.

    Language: TypeScript · 40
  • swe-bench-evaluation

    This project explores how Large Language Models (LLMs) perform on real-world software engineering tasks, inspired by the SWE-Bench benchmark. Using locally hosted models like Llama 3 via Ollama, the tool evaluates code repair capabilities on Python repositories through custom test cases and a lightweight scoring framework.

    Language: TeX