Awesome Learning-based APR

🔥🔥🔥[2024-05-03] We have released a new paper about LLM4APR, A Systematic Literature Review on Large Language Models for Automated Program Repair. Please refer to and

A collection of academic publications, methodology, metrics and datasets on the subject of automated program repair enhanced with deep/machine learning techniques.

We welcome all researchers to contribute to this repository and help advance knowledge in the learning-based APR field. If you have any related references, please contact us via a GitHub issue or pull request.

Citation

Please read and cite our paper:

@article{zhang2023survey,
  title = {A Survey of Learning-Based Automated Program Repair},
  author = {Zhang, Quanjun and Fang, Chunrong and Ma, Yuxiang and Sun, Weisong and Chen, Zhenyu},
  journal={ACM Transactions on Software Engineering and Methodology},
  volume={33},
  number={2},
  pages={1--69},
  year={2023},
  publisher={ACM New York, NY}
}

A Framework of Deep Learning-based APR

Collected Papers

Paper Title	Venue	Year	Code Available
Neural Transfer Learning for Repairing Security Vulnerabilities in C Code	TSE	2022	yes
Vulrepair: A T5-based Automated Software Vulnerability Repair	ESEC/FSE	2022	yes
Seqtrans: Automatic Vulnerability Fix Via Sequence to Sequence Learning	TSE	2022	yes
Learning to Repair Software Vulnerabilities with Generative Adversarial Networks	NeurIPS	2018	no
Repairing Security Vulnerabilities Using Pre-trained Programming Language Models	DSNW	2022	no
Spvf: Security Property Assisted Vulnerability Fixing Via Attention-based Models	ESE	2022	no
Vurle: Automatic Vulnerability Detection and Repair by Learning from Examples	ESORICS	2017	no
Synshine: Improved Fixing of Syntax Errors	TSE	2022	yes
Deepfix: Fixing Common C Language Errors by Deep Learning	AAAI	2017	no
Break-it-fix-it: Unsupervised Learning for Program Repair	ICML	2021	yes
Tfix: Learning to Fix Coding Errors with a Text-to-text Transformer	ICML	2021	yes
Deepdelta: Learning to Repair Compilation Errors	ESEC/FSE	2019	no
Syntax and Sensibility: Using Language Models to Detect and Correct Syntax Errors	SANER	2018	no
Learning Lenient Parsing & Typing Via Indirect Supervision	ESE	2021	yes
Deep Reinforcement Learning for Syntactic Error Repair in Student Programs	AAAI	2019	yes
Neuro-symbolic Program Corrector for Introductory Programming Assignments	ICSE	2018	no
Samplefix: Learning to Generate Functionally Diverse Fixes	ECML	2021	no
Ggf: A graph-based method for programming language syntax error correction	ICPC	2020	no
Compilation Error Repair: For the Student Programs, from the Student Programs	ICSE-SEET	2018	no
Search, Align, and Repair: Data-driven Feedback Generation for Introductory Programming Exercises	PLDI	2018	no
Dynamic Neural Program Embeddings for Program Repair	ICLR	2018	yes
Repairing Bugs in Python Assignments Using Large Language Models	arxiv	2021	no
Verifix: Verified Repair of Programming Assignments	TOSEM	2022	yes
Generating Concise Patches for Newly Released Programming Assignments	TSE	2022	no
Automated correction for syntax errors in programming assignments using recurrent neural networks	arxiv	2016	no
Improving Automatically Generated Code from Codex Via Automated Program Repair	arxiv	2022	no
Synthesize, Execute and Debug: Learning to Repair for Neural Program Synthesis	NeurIPS	2020	no
Detecting and Fixing Nonidiomatic Snippets in Python Source Code with Deep Learning	ISC	2021	no
Getafix: Learning to Fix Bugs Automatically	OOPSLA	2019	no
A Software-repair Robot Based on Continual Learning	IEEE Software	2021	yes
Self-supervised Bug Detection and Repair	NeurIPS	2021	yes
Deepdebug: Fixing Python Bugs Using Stack Traces, Backtranslation, and Code Skeletons	arxiv	2021	no
Generating Bug-fixes Using Pretrained Transformers	MAPS	2021	no
Global Relational Models of Source Code	ICLR	2019	yes
Fix Bugs with Transformer through a Neural-symbolic Edit Grammar	arxiv	2022	no
Grammar-based Patches Generation for Automated Program Repair	ACL-IJCNLP	2021	no
Leveraging Causal Inference for Explainable Automatic Program Repair	arxiv	2022	no
Codebert: A Pre-trained Model for Programming and Natural Languages	EMNLP	2020	yes
Codet5: Identifier-aware Unified Pre-trained Encoder-decoder Models for Code Understanding and Generation	EMNLP	2021	yes
Graphcodebert: Pre-training Code Representations with Data Flow	ICLR	2021	yes
Using Transfer Learning for Code-related Tasks	TSE	2022	yes
Spt-code: Sequence-to-sequence Pre-training for Learning the Representation of Source Code	ICSE	2022	yes
Circle: Continual Repair across Programming Languages	ISSTA	2022	yes
On Multi-modal Learning of Editing Source Code	ASE	2021	no
Patch Generation with Language Models: Feasibility and Scaling Behavior	ICLR-DL4C	2022	no
Can We Learn from Developer Mistakes? Learning to Localize and Repair Real Bugs from Real Bug Fixes	arxiv	2022	yes
Coditt5: Pretraining for Source Code and Natural Language Editing	ASE	2022	yes
Applying Codebert for Automated Program Repair of Java Simple Bugs	MSR	2021	yes
Towards Javascript Program Repair with Generative Pre-trained Transformer (gpt-2)	APR	2022	yes
Can Openai's Codex Fix Bugs? An Evaluation on Quixbugs	APR	2022	no
Glad: Neural Predicate Synthesis to Repair Omission Faults	arxiv	2022	no
Less Training, More Repairing Please: Revisiting Automated Program Repair Via Zero-shot Learning	ESEC/FSE	2022	no
Automatic Patch Generation by Learning Correct Code	POPL	2016	no
Program Repair with Repeated Learning	TSE	2022	no
Improving Fault Localization and Program Repair with Deep Semantic Features and Transferred Knowledge	ICSE	2022	yes
Improving Search-based Automatic Program Repair with Neural Machine Translation	IEEE Access	2022	no
Siturepair: Incorporating Machine-learning Fault Class Prediction to Inform Situational Multiple Fault Automatic Program Repair	IJCIP	2022	no
Dear: A Novel Deep Learning-based Approach for Automated Program Repair	ICSE	2022	yes
Graphix: A Pre-trained Graph Edit Model for Automated Program Repair	__	2021	no
Can We Automatically Fix Bugs by Learning Edit Operations?	SANER	2022	yes
Language Models Can Prioritize Patches for Practical Program Patching	APR	2022	no
Predicting Patch Correctness Based on the Similarity of Failing Test Cases	TOSEM	2022	yes
Defect Identification, Categorization, and Repair: Better Together	arxiv	2022	yes
Selfapr: Self-supervised Program Repair with Test Execution Diagnostics	ASE	2022	yes
Crex: Predicting Patch Correctness in Automated Repair of C Programs through Transfer Learning of Execution Semantics	IST	2022	yes
Bug-transformer: Automated Program Repair Using Attention-based Deep Neural Network	JCSC	2022	no
Katana: Dual Slicing-based Context for Learning Bug Fixes	arxiv	2022	no
Oapr-homl'1: Optimal Automated Program Repair Approach Based on Hybrid Improved Grasshopper Optimization and Opposition Learning Based Artificial Neural Network	IJCSDS	2022	no
Repair Is Nearly Generation: Multilingual Program Repair with Llms	arxiv	2022	no
Practical Program Repair in the Era of Large Pre-trained Language Models	arxiv	2022	no
Context-aware Code Change Embedding for Better Patch Correctness Assessment	TOSEM	2022	yes
Attention: Not Just Another Dataset for Patch-correctness Checking	arxiv	2022	yes
Is This Change the Answer to That Problem? Correlating Descriptions of Bug and Code Changes for Evaluating Patch Correctness	ASE	2022	yes
Patch Correctness Assessment in Automated Program Repair Based on the Impact of Patches on Production and Test Code	ISSTA	2022	yes
The Best of Both Worlds: Combining Learned Embeddings with Engineered Features for Accurate Prediction of Correct Patches	TOSEM	2022	yes
M3v: Multi-modal Multi-view Context Embedding for Repair Operator Prediction	CGO	2022	no
Identifying Incorrect Patches in Program Repair Based on Meaning of Source Code	IEEE Access	2022	yes
Towards Boosting Patch Execution On-the-fly	ICSE	2022	no
Cure: Code-aware Neural Machine Translation for Automatic Program Repair	ICSE	2021	no
A Syntax-guided Edit Decoder for Neural Program Repair	ESEC/FSE	2021	yes
Sequencer: Sequence-to-sequence Learning for End-to-end Program Repair	TSE	2019	yes
A Controlled Experiment of Different Code Representations for Learning-based Program Repair	ESE	2022	yes
Neural Program Repair with Execution-based Backpropagation	ICSE	2022	yes
A Bidirectional Lstm Language Model for Code Evaluation and Repair	SYM	2021	no
Grasp: Graph-to-sequence Learning for Automated Program Repair	QRS	2021	no
Jointly Learning to Repair Code and Generate Commit Message	arxiv	2021	no
Evaluating Large Language Models Trained on Code	arxiv	2021	yes
Automated Classification of Overfitting Patches with Statically Extracted Code Features	TSE	2022	yes
Application of Seq2seq Models on Code Correction	FRAI	2021	yes
Exploring Plausible Patches using Source Code Embeddings in Javascript	APR	2022	no
Fast and Precise On-the-fly Patch Validation for All	ICSE	2021	no
How Does Regression Test Selection Affect Program Repair? An Extensive Study on 2 Million Patches	arxiv	2021	no
Coconut: Combining Context-aware Neural Translation Models Using Ensemble for Program Repair	ISSTA	2020	yes
Dlfix: Context-based Code Transformation Learning for Automated Program Repair	ICSE	2020	yes
Graph-based, Self-supervised Program Repair from Diagnostic Feedback	ICML	2020	yes
Human-in-the-loop Automatic Program Repair	ICST	2020	yes
Hoppity: Learning Graph Transformations to Detect and Fix Bugs in Programs	ICLR	2020	no
Applying Deep Learning Algorithm to Automatic Bug Localization and Repair	SAC	2020	no
Evaluating Representation Learning of Code Changes for Predicting Patch Correctness in Program Repair	ASE	2020	yes
Patching As Translation: The Data and the Metaphor	ASE	2020	yes
Codit: Code Editing with Tree-based Neural Models	TSE	2022	no
Learning to Generate Corrective Patches Using Neural Machine Translation	arxiv	2018	no
On Learning Meaningful Code Changes Via Neural Machine Translation	ICSE	2019	yes
Sorting and Transforming Program Repair Ingredients Via Deep Learning Code Similarities	SANER	2019	no
An Empirical Study on Learning Bug-fixing Patches in the Wild Via Neural Machine Translation	TOSEM	2019	yes
Automatic Repair and Type Binding of Undeclared Variables Using Neural Networks	__	2019	no
Encore: Ensemble Learning Using Convolution Neural Machine Translation for Automatic Program Repair	arxiv	2019	no
Neural Program Repair by Jointly Learning to Localize and Repair	arxiv	2019	no
Sequence to Sequence Machine Learning for Automatic Program Repair	__	2019	no
Learning the Relation between Code Features and Code Transforms with Structured Prediction	arxiv	2019	no
Learning to Represent Edits	arxiv	2018	no
Dynamic Neural Program Embedding for Program Repair	arxiv	2017	yes
Semantic Code Repair Using Neuro-symbolic Transformation Networks	arxiv	2017	yes
Automated Program Repair Using Genetic Programming and Model Checking	Applied Intelligence	2016	no
	arxiv	2019	no

Datasets

Some datasets are named after the model or author that introduced them because they have no official name.

Dataset	Language	#Items	Test Case	#Papers Used
Bears	Java	251	yes	2+ papers
BFP medium	Java	65454	no	9+ papers
BFP small	Java	58350	no	9+ papers
BigFix	Java	1.824 M	no	2+ papers
Bugs2Fix	Java	92849	no	2+ papers
Bugs.jar	Java	1158	yes	3+ papers
Code-Change-Data	Java	44372	no	1+ papers
CodeXGlue	Java	122 K	no	1+ papers
CodRep	Java	58069	no	2+ papers
CPatMiner	Java	44 K	no	1+ papers
DeepRepair	Java	374	no	1+ papers
Defects4J	Java	835	yes	11+ papers
Function-SStuBs4J	Java	21047	no	1+ papers
IntroClassJava	Java	998	yes	2+ papers
Java-med	Java	7454	no	1+ papers
ManySStuBs4J large	Java	63923	no	1+ papers
ManySStuBs4J small	Java	10231	no	2+ papers
MegaDiff	Java	663029	no	1+ papers
Ponta	Java	624	no	1+ papers
Pull-Request-Data	Java	10666	no	2+ papers
Ratchet	Java	35 K	no	1+ papers
Recoder	Java	103585	no	1+ papers
TRANSFER	Java	408091	no	1+ papers
Mesbah	Java	4.8 M	no	1+ papers
AOJ	C	2482	no	1+ papers
Big-Vul	C	3745	no	1+ papers
Code4Bench	C	25 K	yes	1+ papers
CodeHunt	C	195 K	yes	1+ papers
CVEFixes	C	8482		2+ papers
DeepFix	C	6971	yes	6+ papers
ManyBugs	C	185	yes	3+ papers
Prophet	C	69	yes	2+ papers
Prutor	C	6971	yes	2+ papers
BugAID	JS	105133	no	4+ papers
BugsJS	JS	453	yes	1+ papers
HOPPITY	JS	363 K	no	1+ papers
KATANA	JS	114 K	no	1+ papers
REPTORY	JS	407 K	no	1+ papers
TFix	JS	100 K	no	1+ papers
ETH Py150	Python	150 K	no	3+ papers
GitHub-Python	Python	3 M	no	1+ papers
Mester	Python	13 K	no	1+ papers
PyPIBug	Python	2374	no	2+ papers
SSB-9M	Python	9 M	no	1+ papers
VUDENC	Python	10 K	no	1+ papers
Chhatbar	Python	286	yes	1+ papers
SPoC	C++	18356	yes	1+ papers
QuixBugs	Java,Python	40	yes	11+ papers
DeepDebug	Java,Python	523	no	2+ papers
MSR20	C,C++	188 K	no	1+ papers
CoCoNut	Java,C,JS,Python	24 M	yes	4+ papers
CodeFlaw	C,Python	3902	yes	3+ papers
ENCORE	Java,C++,JS,Python	9.2 M	no	1+ papers

Evaluation Metrics

Metrics	Formula/Description
Accuracy	Accuracy measures the percentage of candidate patches in which the sequence predicted by the model equals the ground truth
BLEU	BLEU score measures how similar the predicted candidate patch and the ground truth are
Compilable Patch	Such a candidate patch makes the patched buggy program compile successfully
Plausible Patch	Such a compilable patch fixes the buggy functionality without harming existing functionality (i.e., passing all available test suites)
Correct Patch	Such a plausible patch is semantically or syntactically equivalent to the developer patch (i.e., it generalizes to potential test suites beyond the available ones)
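
The metrics above can be illustrated with a minimal Python sketch. It is not taken from any of the listed papers: the function names, the whitespace tokenization, and the `PatchOutcome` fields are assumptions made for illustration. `clipped_ngram_precision` computes only the modified n-gram precision that BLEU builds on; it omits the brevity penalty and the averaging over n-gram orders, so it is not a full BLEU implementation.

```python
from collections import Counter
from dataclasses import dataclass


def exact_match_accuracy(predictions, references):
    """Share of predicted patches whose token sequence equals the ground truth."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        return 0.0
    # Assumption: compare whitespace-split tokens, so spacing differences are ignored.
    hits = sum(p.split() == r.split() for p, r in zip(predictions, references))
    return hits / len(predictions)


def clipped_ngram_precision(prediction, reference, n=1):
    """Modified n-gram precision, the building block of BLEU (simplified)."""
    pred = prediction.split()
    ref = reference.split()
    pred_ngrams = Counter(tuple(pred[i:i + n]) for i in range(len(pred) - n + 1))
    ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
    total = sum(pred_ngrams.values())
    if total == 0:
        return 0.0
    # Each predicted n-gram is credited at most as often as it occurs in the reference.
    overlap = sum(min(count, ref_ngrams[ng]) for ng, count in pred_ngrams.items())
    return overlap / total


@dataclass
class PatchOutcome:
    """Hypothetical validation results for one candidate patch."""
    compiles: bool
    passes_all_tests: bool
    equivalent_to_developer_patch: bool


def classify_patch(outcome):
    """Map validation results onto the nested patch categories from the table."""
    if not outcome.compiles:
        return "uncompilable"
    if not outcome.passes_all_tests:
        return "compilable"
    if not outcome.equivalent_to_developer_patch:
        return "plausible"
    return "correct"


if __name__ == "__main__":
    preds = ["if ( x > 0 ) return x ;", "return a + b ;"]
    refs = ["if ( x >= 0 ) return x ;", "return a + b ;"]
    print(exact_match_accuracy(preds, refs))
    print(clipped_ngram_precision(preds[0], refs[0], n=1))
    print(classify_patch(PatchOutcome(True, True, False)))
```

The categories are nested: every correct patch is plausible, and every plausible patch is compilable, so `classify_patch` returns the strongest category a patch reaches. In practice, equivalence to the developer patch is usually judged manually, which is why the sketch takes it as an input flag rather than computing it.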