AwesomeLearningAPR

[TOSEM 2023] A Survey of Learning-based Automated Program Repair

MIT License

Awesome Learning-based APR


🔥🔥🔥[2024-05-03] We have released a new paper on LLM4APR: A Systematic Literature Review on Large Language Models for Automated Program Repair.

A collection of academic publications, methodology, metrics and datasets on the subject of automated program repair enhanced with deep/machine learning techniques.

We welcome all researchers to contribute to this repository and to advance knowledge in the learning-based APR field. If you know of any related references, please feel free to contact us through a GitHub issue or pull request.

Citation

Please read and cite our paper: arXiv

@article{zhang2023survey,
  title = {A Survey of Learning-Based Automated Program Repair},
  author = {Zhang, Quanjun and Fang, Chunrong and Ma, Yuxiang and Sun, Weisong and Chen, Zhenyu},
  journal={ACM Transactions on Software Engineering and Methodology},
  volume={33},
  number={2},
  pages={1--69},
  year={2023},
  publisher={ACM New York, NY}
}

A Framework of Deep Learning-based APR

[Figure: dlapr, the framework of deep learning-based APR]

Collected Papers

| Paper Title | Venue | Year | Code Available |
|---|---|---|---|
| Neural Transfer Learning for Repairing Security Vulnerabilities in C Code | TSE | 2022 | yes |
| VulRepair: A T5-based Automated Software Vulnerability Repair | ESEC/FSE | 2022 | yes |
| SeqTrans: Automatic Vulnerability Fix Via Sequence to Sequence Learning | TSE | 2022 | yes |
| Learning to Repair Software Vulnerabilities with Generative Adversarial Networks | NeurIPS | 2018 | no |
| Repairing Security Vulnerabilities Using Pre-trained Programming Language Models | DSNW | 2022 | no |
| SPVF: Security Property Assisted Vulnerability Fixing Via Attention-based Models | ESE | 2022 | no |
| VuRLE: Automatic Vulnerability Detection and Repair by Learning from Examples | ESORICS | 2017 | no |
| SynShine: Improved Fixing of Syntax Errors | TSE | 2022 | yes |
| DeepFix: Fixing Common C Language Errors by Deep Learning | AAAI | 2017 | no |
| Break-It-Fix-It: Unsupervised Learning for Program Repair | ICML | 2021 | yes |
| TFix: Learning to Fix Coding Errors with a Text-to-text Transformer | ICML | 2021 | yes |
| DeepDelta: Learning to Repair Compilation Errors | ESEC/FSE | 2019 | no |
| Syntax and Sensibility: Using Language Models to Detect and Correct Syntax Errors | SANER | 2018 | no |
| Learning Lenient Parsing & Typing Via Indirect Supervision | ESE | 2021 | yes |
| Deep Reinforcement Learning for Syntactic Error Repair in Student Programs | AAAI | 2019 | yes |
| Neuro-symbolic Program Corrector for Introductory Programming Assignments | ICSE | 2018 | no |
| SampleFix: Learning to Generate Functionally Diverse Fixes | ECML | 2021 | no |
| GGF: A Graph-based Method for Programming Language Syntax Error Correction | ICPC | 2020 | no |
| Compilation Error Repair: For the Student Programs, from the Student Programs | ICSE-SEET | 2018 | no |
| Search, Align, and Repair: Data-driven Feedback Generation for Introductory Programming Exercises | PLDI | 2018 | no |
| Dynamic Neural Program Embeddings for Program Repair | ICLR | 2018 | yes |
| Repairing Bugs in Python Assignments Using Large Language Models | arXiv | 2021 | no |
| Verifix: Verified Repair of Programming Assignments | TOSEM | 2022 | yes |
| Generating Concise Patches for Newly Released Programming Assignments | TSE | 2022 | no |
| Automated Correction for Syntax Errors in Programming Assignments Using Recurrent Neural Networks | arXiv | 2016 | no |
| Improving Automatically Generated Code from Codex Via Automated Program Repair | arXiv | 2022 | no |
| Synthesize, Execute and Debug: Learning to Repair for Neural Program Synthesis | NeurIPS | 2020 | no |
| Detecting and Fixing Nonidiomatic Snippets in Python Source Code with Deep Learning | ISC | 2021 | no |
| Getafix: Learning to Fix Bugs Automatically | OOPSLA | 2019 | no |
| A Software-repair Robot Based on Continual Learning | IEEE Software | 2021 | yes |
| Self-supervised Bug Detection and Repair | NeurIPS | 2021 | yes |
| DeepDebug: Fixing Python Bugs Using Stack Traces, Backtranslation, and Code Skeletons | arXiv | 2021 | no |
| Generating Bug-fixes Using Pretrained Transformers | MAPS | 2021 | no |
| Global Relational Models of Source Code | ICLR | 2019 | yes |
| Fix Bugs with Transformer through a Neural-symbolic Edit Grammar | arXiv | 2022 | no |
| Grammar-based Patches Generation for Automated Program Repair | ACL-IJCNLP | 2021 | no |
| Leveraging Causal Inference for Explainable Automatic Program Repair | arXiv | 2022 | no |
| CodeBERT: A Pre-trained Model for Programming and Natural Languages | EMNLP | 2020 | yes |
| CodeT5: Identifier-aware Unified Pre-trained Encoder-decoder Models for Code Understanding and Generation | EMNLP | 2021 | yes |
| GraphCodeBERT: Pre-training Code Representations with Data Flow | ICLR | 2021 | yes |
| Using Transfer Learning for Code-related Tasks | TSE | 2022 | yes |
| SPT-Code: Sequence-to-sequence Pre-training for Learning the Representation of Source Code | ICSE | 2022 | yes |
| CIRCLE: Continual Repair across Programming Languages | ISSTA | 2022 | yes |
| On Multi-modal Learning of Editing Source Code | ASE | 2021 | no |
| Patch Generation with Language Models: Feasibility and Scaling Behavior | ICLR-DL4C | 2022 | no |
| Can We Learn from Developer Mistakes? Learning to Localize and Repair Real Bugs from Real Bug Fixes | arXiv | 2022 | yes |
| CoditT5: Pretraining for Source Code and Natural Language Editing | ASE | 2022 | yes |
| Applying CodeBERT for Automated Program Repair of Java Simple Bugs | MSR | 2021 | yes |
| Towards JavaScript Program Repair with Generative Pre-trained Transformer (GPT-2) | APR | 2022 | yes |
| Can OpenAI's Codex Fix Bugs? An Evaluation on QuixBugs | APR | 2022 | no |
| GLAD: Neural Predicate Synthesis to Repair Omission Faults | arXiv | 2022 | no |
| Less Training, More Repairing Please: Revisiting Automated Program Repair Via Zero-shot Learning | ESEC/FSE | 2022 | no |
| Automatic Patch Generation by Learning Correct Code | POPL | 2016 | no |
| Program Repair with Repeated Learning | TSE | 2022 | no |
| Improving Fault Localization and Program Repair with Deep Semantic Features and Transferred Knowledge | ICSE | 2022 | yes |
| Improving Search-based Automatic Program Repair with Neural Machine Translation | IEEE Access | 2022 | no |
| SituRepair: Incorporating Machine-learning Fault Class Prediction to Inform Situational Multiple Fault Automatic Program Repair | IJCIP | 2022 | no |
| DEAR: A Novel Deep Learning-based Approach for Automated Program Repair | ICSE | 2022 | yes |
| Graphix: A Pre-trained Graph Edit Model for Automated Program Repair | __ | 2021 | no |
| Can We Automatically Fix Bugs by Learning Edit Operations? | SANER | 2022 | yes |
| Language Models Can Prioritize Patches for Practical Program Patching | APR | 2022 | no |
| Predicting Patch Correctness Based on the Similarity of Failing Test Cases | TOSEM | 2022 | yes |
| Defect Identification, Categorization, and Repair: Better Together | arXiv | 2022 | yes |
| SelfAPR: Self-supervised Program Repair with Test Execution Diagnostics | ASE | 2022 | yes |
| Crex: Predicting Patch Correctness in Automated Repair of C Programs through Transfer Learning of Execution Semantics | IST | 2022 | yes |
| Bug-Transformer: Automated Program Repair Using Attention-based Deep Neural Network | JCSC | 2022 | no |
| Katana: Dual Slicing-based Context for Learning Bug Fixes | arXiv | 2022 | no |
| OAPR-HOML'1: Optimal Automated Program Repair Approach Based on Hybrid Improved Grasshopper Optimization and Opposition Learning Based Artificial Neural Network | IJCSDS | 2022 | no |
| Repair Is Nearly Generation: Multilingual Program Repair with LLMs | arXiv | 2022 | no |
| Practical Program Repair in the Era of Large Pre-trained Language Models | arXiv | 2022 | no |
| Context-aware Code Change Embedding for Better Patch Correctness Assessment | TOSEM | 2022 | yes |
| Attention: Not Just Another Dataset for Patch-correctness Checking | arXiv | 2022 | yes |
| Is This Change the Answer to That Problem? Correlating Descriptions of Bug and Code Changes for Evaluating Patch Correctness | ASE | 2022 | yes |
| Patch Correctness Assessment in Automated Program Repair Based on the Impact of Patches on Production and Test Code | ISSTA | 2022 | yes |
| The Best of Both Worlds: Combining Learned Embeddings with Engineered Features for Accurate Prediction of Correct Patches | TOSEM | 2022 | yes |
| M3V: Multi-modal Multi-view Context Embedding for Repair Operator Prediction | CGO | 2022 | no |
| Identifying Incorrect Patches in Program Repair Based on Meaning of Source Code | IEEE Access | 2022 | yes |
| Towards Boosting Patch Execution On-the-fly | ICSE | 2022 | no |
| CURE: Code-aware Neural Machine Translation for Automatic Program Repair | ICSE | 2021 | no |
| A Syntax-guided Edit Decoder for Neural Program Repair | ESEC/FSE | 2021 | yes |
| SequenceR: Sequence-to-sequence Learning for End-to-end Program Repair | TSE | 2019 | yes |
| A Controlled Experiment of Different Code Representations for Learning-based Program Repair | ESE | 2022 | yes |
| Neural Program Repair with Execution-based Backpropagation | ICSE | 2022 | yes |
| A Bidirectional LSTM Language Model for Code Evaluation and Repair | SYM | 2021 | no |
| GrasP: Graph-to-sequence Learning for Automated Program Repair | QRS | 2021 | no |
| Jointly Learning to Repair Code and Generate Commit Message | arXiv | 2021 | no |
| Evaluating Large Language Models Trained on Code | arXiv | 2021 | yes |
| Automated Classification of Overfitting Patches with Statically Extracted Code Features | TSE | 2022 | yes |
| Application of Seq2seq Models on Code Correction | FRAI | 2021 | yes |
| Exploring Plausible Patches Using Source Code Embeddings in JavaScript | APR | 2022 | no |
| Fast and Precise On-the-fly Patch Validation for All | ICSE | 2021 | no |
| How Does Regression Test Selection Affect Program Repair? An Extensive Study on 2 Million Patches | arXiv | 2021 | no |
| CoCoNuT: Combining Context-aware Neural Translation Models Using Ensemble for Program Repair | ISSTA | 2020 | yes |
| DLFix: Context-based Code Transformation Learning for Automated Program Repair | ICSE | 2020 | yes |
| Graph-based, Self-supervised Program Repair from Diagnostic Feedback | ICML | 2020 | yes |
| Human-in-the-loop Automatic Program Repair | ICST | 2020 | yes |
| Hoppity: Learning Graph Transformations to Detect and Fix Bugs in Programs | ICLR | 2020 | no |
| Applying Deep Learning Algorithm to Automatic Bug Localization and Repair | SAC | 2020 | no |
| Evaluating Representation Learning of Code Changes for Predicting Patch Correctness in Program Repair | ASE | 2020 | yes |
| Patching as Translation: The Data and the Metaphor | ASE | 2020 | yes |
| CODIT: Code Editing with Tree-based Neural Models | TSE | 2022 | no |
| Learning to Generate Corrective Patches Using Neural Machine Translation | arXiv | 2018 | no |
| On Learning Meaningful Code Changes Via Neural Machine Translation | ICSE | 2019 | yes |
| Sorting and Transforming Program Repair Ingredients Via Deep Learning Code Similarities | SANER | 2019 | no |
| An Empirical Study on Learning Bug-fixing Patches in the Wild Via Neural Machine Translation | TOSEM | 2019 | yes |
| Automatic Repair and Type Binding of Undeclared Variables Using Neural Networks | __ | 2019 | no |
| ENCORE: Ensemble Learning Using Convolution Neural Machine Translation for Automatic Program Repair | arXiv | 2019 | no |
| Neural Program Repair by Jointly Learning to Localize and Repair | arXiv | 2019 | no |
| Sequence to Sequence Machine Learning for Automatic Program Repair | __ | 2019 | no |
| Learning the Relation between Code Features and Code Transforms with Structured Prediction | arXiv | 2019 | no |
| Learning to Represent Edits | arXiv | 2018 | no |
| Dynamic Neural Program Embedding for Program Repair | arXiv | 2017 | yes |
| Semantic Code Repair Using Neuro-symbolic Transformation Networks | arXiv | 2017 | yes |
| Automated Program Repair Using Genetic Programming and Model Checking | Applied Intelligence | 2016 | no |
|  | arXiv | 2019 | no |

Datasets

  • Some datasets are named after the model or author that introduced them because they have no official name.

| Dataset | Language | #Items | Test Case | #Papers Used |
|---|---|---|---|---|
| Bears | Java | 251 | yes | 2+ papers |
| BFP medium | Java | 65454 | no | 9+ papers |
| BFP small | Java | 58350 | no | 9+ papers |
| BigFix | Java | 1.824 M | no | 2+ papers |
| Bugs2Fix | Java | 92849 | no | 2+ papers |
| Bugs.jar | Java | 1158 | yes | 3+ papers |
| Code-Change-Data | Java | 44372 | no | 1+ papers |
| CodeXGlue | Java | 122 K | no | 1+ papers |
| CodRep | Java | 58069 | no | 2+ papers |
| CPatMiner | Java | 44 K | no | 1+ papers |
| DeepRepair | Java | 374 | no | 1+ papers |
| Defects4J | Java | 835 | yes | 11+ papers |
| Function-SStuBs4J | Java | 21047 | no | 1+ papers |
| IntroClassJava | Java | 998 | yes | 2+ papers |
| Java-med | Java | 7454 | no | 1+ papers |
| ManySStuBs4J large | Java | 63923 | no | 1+ papers |
| ManySStuBs4J small | Java | 10231 | no | 2+ papers |
| MegaDiff | Java | 663029 | no | 1+ papers |
| Ponta | Java | 624 | no | 1+ papers |
| Pull-Request-Data | Java | 10666 | no | 2+ papers |
| Ratchet | Java | 35 K | no | 1+ papers |
| Recoder | Java | 103585 | no | 1+ papers |
| TRANSFER | Java | 408091 | no | 1+ papers |
| Mesbah | Java | 4.8 M | no | 1+ papers |
| AOJ | C | 2482 | no | 1+ papers |
| Big-Vul | C | 3745 | no | 1+ papers |
| Code4Bench | C | 25 K | yes | 1+ papers |
| CodeHunt | C | 195 K | yes | 1+ papers |
| CVEFixes | C | 8482 |  | 2+ papers |
| DeepFix | C | 6971 | yes | 6+ papers |
| ManyBugs | C | 185 | yes | 3+ papers |
| Prophet | C | 69 | yes | 2+ papers |
| Prutor | C | 6971 | yes | 2+ papers |
| BugAID | JS | 105133 | no | 4+ papers |
| BugsJS | JS | 453 | yes | 1+ papers |
| HOPPITY | JS | 363 K | no | 1+ papers |
| KATANA | JS | 114 K | no | 1+ papers |
| REPTORY | JS | 407 K | no | 1+ papers |
| TFix | JS | 100 K | no | 1+ papers |
| ETH Py150 | Python | 150 K | no | 3+ papers |
| GitHub-Python | Python | 3 M | no | 1+ papers |
| Mester | Python | 13 K | no | 1+ papers |
| PyPIBug | Python | 2374 | no | 2+ papers |
| SSB-9M | Python | 9 M | no | 1+ papers |
| VUDENC | Python | 10 K | no | 1+ papers |
| Chhatbar | Python | 286 | yes | 1+ papers |
| SPoC | C++ | 18356 | yes | 1+ papers |
| QuixBugs | Java, Python | 40 | yes | 11+ papers |
| DeepDebug | Java, Python | 523 | no | 2+ papers |
| MSR20 | C, C++ | 188 K | no | 1+ papers |
| CoCoNuT | Java, C, JS, Python | 24 M | yes | 4+ papers |
| CodeFlaw | C, Python | 3902 | yes | 3+ papers |
| ENCORE | Java, C++, JS, Python | 9.2 M | no | 1+ papers |

Evaluation Metrics

| Metrics | Formula/Description |
|---|---|
| Accuracy | The percentage of candidate patches whose predicted token sequence exactly matches the ground truth |
| BLEU | The BLEU score measures how similar the predicted candidate patch is to the ground truth |
| Compilable Patch | A candidate patch that makes the patched buggy program compile successfully |
| Plausible Patch | A compilable patch that fixes the buggy functionality without harming existing functionality (i.e., it passes all available test suites) |
| Correct Patch | A plausible patch that is semantically or syntactically equivalent to the developer patch (i.e., it generalizes to potential test cases beyond the available test suite) |
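
The metrics above can be illustrated with a minimal Python sketch. It is not taken from any paper in this list. The function names (`exact_match_accuracy`, `sentence_bleu`, `classify_patch`) are hypothetical, patches are assumed to be whitespace-tokenized strings, and the BLEU variant shown is an unsmoothed, single-reference, sentence-level simplification; published results typically rely on an established BLEU implementation instead.

```python
import math
from collections import Counter
from typing import List, Sequence


def exact_match_accuracy(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Fraction of predicted patches whose token sequence equals the ground truth."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        return 0.0
    # Assumption: compare whitespace-split tokens, so spacing differences are ignored.
    hits = sum(p.split() == r.split() for p, r in zip(predictions, references))
    return hits / len(predictions)


def _ngrams(tokens: List[str], n: int) -> Counter:
    """Count all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(prediction: str, reference: str, max_n: int = 4) -> float:
    """Unsmoothed single-reference BLEU over whitespace tokens (simplified sketch)."""
    pred, ref = prediction.split(), reference.split()
    if not pred:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        pred_counts, ref_counts = _ngrams(pred, n), _ngrams(ref, n)
        total = sum(pred_counts.values())
        # Clipped counts: an n-gram is credited at most as often as it occurs in the reference.
        overlap = sum(min(count, ref_counts[gram]) for gram, count in pred_counts.items())
        if total == 0 or overlap == 0:
            return 0.0  # without smoothing, any empty n-gram order yields a score of zero
        log_precision_sum += math.log(overlap / total)
    # The brevity penalty lowers the score of predictions shorter than the reference.
    brevity_penalty = 1.0 if len(pred) > len(ref) else math.exp(1 - len(ref) / len(pred))
    return brevity_penalty * math.exp(log_precision_sum / max_n)


def classify_patch(compiles: bool, passes_all_tests: bool, equivalent_to_developer_patch: bool) -> str:
    """Return the strongest category a candidate patch reaches; each level presupposes the previous one."""
    if not compiles:
        return "non-compilable"
    if not passes_all_tests:
        return "compilable"
    if not equivalent_to_developer_patch:
        return "plausible"
    return "correct"
```

For example, `exact_match_accuracy(["a = b + 1 ;", "return x ;"], ["a = b + 1 ;", "return y ;"])` returns `0.5`, and `classify_patch(True, True, False)` returns `"plausible"`. The nesting in `classify_patch` mirrors the table: a correct patch is also plausible, and a plausible patch is also compilable. Deciding the equivalence flag is usually a manual or tool-assisted step that this sketch takes as given.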