Maintained by WANG Yue (wangyue2714@gmail.com). Last updated on 2021/12/17.
Learning and Evaluating Contextual Embedding of Source Code, [code] ICML 2020 (CuBERT)
CodeBERT: A Pre-Trained Model for Programming and Natural Languages, [code] EMNLP 2020 Findings (CodeBERT)
GraphCodeBERT: Pre-training Code Representations with Data Flow, [code] ICLR 2021 (GraphCodeBERT)
Unified Pre-training for Program Understanding and Generation, [code] NAACL 2021 (PLBART)
Unsupervised Translation of Programming Languages, [code] NeurIPS 2020 (TransCoder)
Exploring Software Naturalness through Neural Language Models, arXiv 2020/06 (C-BERT)
PyMT5: Multi-mode Translation of Natural Language and Python Code with Transformers, EMNLP 2020 (PyMT5)
Contrastive Code Representation Learning, [code] arXiv 2020/07 (ContraCode)
DOBF: A Deobfuscation Pre-Training Objective for Programming Languages, arXiv 2021/02 (DOBF)
Studying the Usage of Text-To-Text Transfer Transformer to Support Code-Related Tasks, [code] ICSE 2021
CodeTrans: Towards Cracking the Language of Silicone’s Code Through Self-Supervised Deep Learning and High Performance Computing, [code] arXiv 2021/04 (CodeTrans)
How could Neural Networks understand Programs?, [code] ICML 2021 (OSCAR)
CoTexT: Multi-task Learning with Code-Text Transformer, arXiv 2021/05 (CoTexT)
Disentangled Code Representation Learning for Multiple Programming Languages, ACL 2021 Findings (CODEDISEN)
SYNCOBERT: Syntax-Guided Multi-Modal Contrastive Pre-Training for Code Representation, arXiv 2021/09 (SYNCOBERT)
TreeBERT: A Tree-Based Pre-Trained Model for Programming Language, UAI 2021
CodeT5: Identifier-aware Unified Pre-trained Encoder-Decoder Models for Code Understanding and Generation, EMNLP 2021 (CodeT5) [code] [blog] [media] [slide] [poster]
Code Completion: Multi-task Learning based Pre-trained Language Model for Code Completion, ASE 2020 (CugLM)
Code Completion: IntelliCode Compose: Code Generation using Transformer, FSE 2020 (IntelliCode Compose)
Code Completion: Improving Code Autocompletion with Transfer Learning, arXiv 2021/05
Program Repair: Generating Bug-Fixes Using Pretrained Transformers, arXiv 2021/04 (DeepCode)
Program Repair: DeepDebug: Fixing Python Bugs Using Stack Traces, Backtranslation, and Code Skeletons, arXiv 2021/05 (DeepDebug)
Program Repair: TFix: Learning to Fix Coding Errors with a Text-to-Text Transformer, ICML 2021
Program Repair: CURE: Code-Aware Neural Machine Translation for Automatic Program Repair, ICSE 2021
Unit Test Generation: Unit Test Case Generation with Transformers and Focal Context, arXiv 2021/05
Code Generation: Evaluating Large Language Models Trained on Code, arXiv 2021/07 (Codex)
Code Generation: Program Synthesis with Large Language Models, arXiv 2021/08
Language-Agnostic Representation Learning of Source Code from Structure and Context, [code] ICLR 2021 (Code Transformer)
GN-Transformer: Fusing AST and Source Code information in Graph Networks, OpenReview 2020/09 (GN-Transformer)
Program Repair: Hoppity: Learning Graph Transformations to Detect and Fix Bugs in Programs, ICLR 2020 (Hoppity)
CodeXGLUE: A Machine Learning Benchmark Dataset for Code Understanding and Generation, [code] arXiv 2021/02
Project CodeNet: A Large-Scale AI for Code Dataset for Learning a Diversity of Coding Tasks, [code] arXiv 2021/05
Measuring Coding Challenge Competence With APPS, arXiv 2021/05