Papers for Repo-Level Code Generation

This repo maintains a list of papers on repo-level code generation.

Feel free to open a pull request to add more.

Papers

  1. 06/2024: Iterative Refinement of Project-Level Code Context for Precise Code Generation with Compiler Feedback
    • Iteratively retrieves context based on compiler feedback
  2. 06/2024: GraphCoder: Enhancing Repository-Level Code Completion via Code Context Graph-based Retrieval and Language Model
    • Slices blocks/lines of code to use as retrieved context
  3. 06/2024: R2C2-Coder: Enhancing and Benchmarking Real-world Repository-level Code Completion Abilities of Code Large Language Models
    • A repo-level completion benchmark, plus a code completion framework powered by context retrieval and prompt assembly.
  4. 06/2024: Enhancing Repository-Level Code Generation with Integrated Contextual Information
    • Designed for statically typed programming languages. Integrates relevant code and type context.
  5. 05/2024: Dataflow-Guided Retrieval Augmentation for Repository-Level Code Completion
    • Accepted at ACL 2024: formalizes code entities and their relations to build a context graph
  6. 03/2024: Repoformer: Selective Retrieval for Repository-Level Code Completion
    • A pre-training approach to tackle repo-level code retrieval
  7. 02/2024: Enhancing LLM-Based Coding Tools through Native Integration of IDE-Derived Static Context
    • Leverages cross-file information from the IDE so the LLM can perform repo-level code generation.
  8. 01/2024: CODEAGENT: Enhancing Code Generation with Tool-Integrated Agent Systems for Real-World Repo-level Coding Challenges (from software engineering)
    • Proposes a benchmark for evaluation.
    • Method: retrieves context with tools rather than by similarity
  9. 12/2023: Context-Aware Code Generation Framework for Code Repositories: Local, Global, and Third-Party Library Awareness (from software engineering)
    • Focus on enhancing the retrieval process (based on GPT-3.5-Turbo)
  10. 11/2023: Guiding Language Models of Code with Global Context using Monitors
    • Maintain a monitor while performing repo-level code generation
  11. 10/2023: CrossCodeEval: A Diverse and Multilingual Benchmark for Cross-File Code Completion
    • A benchmark that requires cross-file reasoning
  12. 09/2023: CodePlan: Repository-level Coding using LLMs and Planning
    • plan first, then execute
  13. 06/2023: RepoFusion: Training Code Models to Understand Your Repository
    • trained to understand the whole repo
  14. 03/2023: RepoCoder: Repository-Level Code Completion Through Iterative Retrieval and Generation (EMNLP 2023)
    • Iteratively retrieves code from the repo based on similarity, using each generated completion to refine the next retrieval
  15. 03/2023: InferFix: End-to-End Program Repair with LLMs (ICSE)
    • Queries a database of past fixes to retrieve similar examples
  16. 06/2022: Repository-Level Prompt Generation for Large Language Models of Code (ICML 2023)
    • Generates prompts from the complete repo, using a classifier to select from a list of prompt proposals
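
Several entries above (for example items 1 and 14) describe a loop that alternates between retrieving repository context and generating code. The following is a minimal sketch of that general idea, not the implementation from any listed paper. It assumes identifier-level Jaccard similarity as the retriever and a caller-supplied `generate` function standing in for the language model; every name in it is hypothetical.

```python
import re
from typing import Callable, List, Set


def tokenize(code: str) -> Set[str]:
    """Extract the set of identifier-like tokens from a code snippet."""
    return set(re.findall(r"[A-Za-z_]\w*", code))


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def retrieve(query: str, chunks: List[str], k: int = 2) -> List[str]:
    """Return the k repository chunks most similar to the query."""
    q = tokenize(query)
    ranked = sorted(chunks, key=lambda c: jaccard(q, tokenize(c)), reverse=True)
    return ranked[:k]


def iterative_completion(
    prefix: str,
    chunks: List[str],
    generate: Callable[[List[str], str], str],  # hypothetical stand-in for an LLM call
    rounds: int = 2,
    k: int = 2,
) -> str:
    """Alternate retrieval and generation for a fixed number of rounds.

    After each round, the draft completion is appended to the query so the
    next retrieval can also match code that resembles the draft.
    """
    query = prefix
    completion = ""
    for _ in range(rounds):
        context = retrieve(query, chunks, k)
        completion = generate(context, prefix)
        query = prefix + "\n" + completion
    return completion
```

In practice `generate` would wrap a model call that places the retrieved chunks in the prompt ahead of `prefix`. The fixed round count is a simplification; a stopping rule based on compiler or test feedback could replace it.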

Survey Papers

  1. 06/2024: A Survey on Large Language Models for Code Generation
  2. 01/2024: If LLM Is the Wizard, Then Code Is the Wand: A Survey on How Code Empowers Large Language Models to Serve as Intelligent Agents

Relevant Agent-based Papers

  1. 08/2023: MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework
  2. 07/2023: Communicative Agents for Software Development