Awesome-LLM-for-SE&Sec-Papers

A curated list of papers on Large Language Models for Software Engineering and Security.

1. SE/PL Papers

1.1 Software Testing

Code-Aware Prompting: A study of Coverage guided Test Generation in Regression Setting using LLM. FSE 2024. [pdf]
Are Prompt Engineering and TODO Comments Friends or Foes? An Evaluation on GitHub Copilot. ICSE 2024. [pdf]
Enhancing Exploratory Testing by Large Language Model and Knowledge Graph. ICSE 2024. [pdf]
Fuzz4All: Universal Fuzzing with Large Language Models. ICSE 2024. [pdf]
Large Language Models are Edge-Case Generators: Crafting Unusual Programs for Fuzzing Deep Learning Libraries. ICSE 2024. [pdf]
Make LLM a Testing Expert: Bringing Human-like Interaction to Mobile GUI Testing via Functionality-aware Decisions. ICSE 2024. [pdf]
Testing the Limits: Unusual Text Inputs Generation for Mobile App Crash Detection with Large Language Model. ICSE 2024. [pdf]
Nuances are the Key: Unlocking ChatGPT to Find Failure-Inducing Tests with Differential Prompting. ASE 2023. [pdf]
SMT Solver Validation Empowered by Large Pre-trained Language Models. ASE 2023. [pdf]
Large Language Models are Zero-Shot Fuzzers: Fuzzing Deep-Learning Libraries via Large Language Models. Fuzzing, Deep Learning Library Testing ISSTA 2023. [pdf]
CodaMOSA: Escaping Coverage Plateaus in Test Generation with Pre-trained Large Language Models. Search-based Testing ICSE 2023. [pdf]
Large Language Models are Few-shot Testers: Exploring LLM-based General Bug Reproduction. Unit Test Generation, Bug Reproduction ICSE 2023. [pdf]
Understanding Large Language Model-Based Fuzz Driver Generation. Fuzzing Driver Generation arxiv. [pdf]
Large Language Models are Edge-Case Fuzzers: Testing Deep Learning Libraries via FuzzGPT. Fuzzing, Deep Learning Library Testing arxiv. [pdf]

1.2 Automated Program Repair

A Deep Dive into Large Language Models for Automated Bug Localization and Repair. FSE 2024. [pdf]
Domain Knowledge Matters: Improving Prompts with Fix Templates for Repairing Python Type Errors. ICSE 2024. [pdf]
Automated Program Repair in the Era of Large Pre-trained Language Models. LLM for APR ICSE 2023. [pdf]
An Empirical Study on Fine-tuning Large Language Models of Code for Automated Program Repair. ASE 2023. [pdf] [slides]
Conversational Automated Program Repair. arxiv. [pdf]

1.3 Automated Bug Replay

Prompting Is All You Need: Automated Android Bug Replay with Large Language Models. S2R Identification, UI Encoding ICSE 2024. [pdf]

1.4 Root Cause Analysis and Fault Management

A Quantitative and Qualitative Evaluation of LLM-based Explainable Fault Localization. FSE 2024. [pdf]
Xpert: Empowering Incident Management with Query Recommendations via Large Language Models. ICSE 2024. [pdf]
Automatic Root Cause Analysis via Large Language Models for Cloud Incidents. EuroSys 2024. [pdf]
Recommending Root-Cause and Mitigation Steps for Cloud Incidents using Large Language Models. Fine-Tuning, Maintainer Interview ICSE 2023. [pdf]

1.5 Code Summary

An Empirical Study on Using Large Language Models for Multi-Intent Comment Generation. Code Intent Summary, Retrieval-based Prompting ICSE 2024. [pdf]
Using an LLM to Help With Code Understanding. ICSE 2024. [pdf]

1.6 Code Quality Assurance

Lost in Translation: A Study of Bugs Introduced by Large Language Models while Translating Code. ICSE 2024. [pdf]
Traces of Memorisation in Large Language Models for Code. ICSE 2024. [pdf]
Automated Repair of Programs from Large Language Models. APR for LLM-Generated Program ICSE 2023. [pdf]
Is Your Code Generated by ChatGPT Really Correct? Rigorous Evaluation of Large Language Models for Code Generation. Code Correctness Validation arxiv 2023. [pdf]
Large Language Models of Code Fail at Completing Code with Potential Bugs. arxiv. [pdf]
Is Model Attention Aligned with Human Attention? An Empirical Study on Large Language Models for Code Generation. arxiv. [pdf]
RAPGen: An Approach for Fixing Code Inefficiencies in Zero-Shot. arxiv. [pdf]

1.7 Static Analysis

A Learning-Based Approach to Static Program Slicing. OOPSLA 2024. [pdf]
Enhancing Static Analysis for Practical Bug Detection: An LLM-Integrated Approach. OOPSLA 2024. [pdf]
Demystifying and Detecting Misuses of Deep Learning APIs. ICSE 2024. [pdf]
Assisting Static Analysis with Large Language Models: A ChatGPT Experiment. FSE 2024. [pdf]
Generative Type Inference for Python. ASE 2023. [pdf]

1.8 Code Generation

ClarifyGPT: A Framework for Enhancing LLM-based Code Generation via Requirements Clarification. FSE 2024. [pdf]
DiffCoder: Enhancing Large Language Model on API Invocation via Analogical Code Exercises. FSE 2024. [pdf]
Evaluating Large Language Models in Class-Level Code Generation. ICSE 2024. [pdf]

1.9 Log Analysis

LILAC: Log Parsing using LLMs with Adaptive Parsing Cache. ICSE 2024. [pdf]
LLMParser: An Exploratory Study on Using Large Language Models for Log Parsing. ICSE 2024. [pdf]
UniLog: Automatic Logging via LLM and In-Context Learning. ICSE 2024. [pdf]

1.10 Formalization

Can Large Language Models Transform Natural Language Intent into Formal Method Postconditions? FSE 2024. [pdf]

1.11 Maintenance

Only diff is Not Enough: Generating Commit Messages Leveraging Reasoning and Action of Large Language Model. FSE 2024. [pdf]

2. Security Papers

2.1 Malware

GPThreats-3: Is Automatic Malware Generation a Threat? Malware Generation WOOT 2023. [pdf] [slides]

2.2 Vulnerability Repair

Examining Zero-Shot Vulnerability Repair with Large Language Models. Vulnerability Repair Oakland 2023. [pdf]

2.3 Vulnerability Analysis

LLMs Cannot Reliably Identify and Reason About Security Vulnerabilities (Yet?): A Comprehensive Evaluation, Framework, and Benchmarks. Oakland 2024. [pdf]
Lost at C: A User Study on the Security Implications of Large Language Model Code Assistants. Code Defect Analysis USENIX Security 2023. [pdf]

2.4 Vulnerability Detection

Large Language Models for Test-Free Fault Localization. ICSE 2024. [pdf]
The Hitchhiker's Guide to Program Analysis: A Journey with Large Language Models. Code Defect Analysis arxiv. [pdf]
Transformer-based Vulnerability Detection in Code at EditTime: Zero-shot, Few-shot, or Fine-tuning? Program analysis supported by LLM arxiv. [pdf]

2.5 Fuzzing

LLMIF: Augmented Large Language Model for Fuzzing IoT Devices. Oakland 2024. [pdf]
Large Language Model guided Protocol Fuzzing. NDSS 2024. [pdf]
KernelGPT: Enhanced Kernel Fuzzing via Large Language Models. arxiv. [pdf]

2.6 Cyber Attack

From Chatbots to Phishbots?: Phishing Scam Generation in Commercial Large Language Models. Oakland 2024. [pdf]

Contributing

This list is mainly maintained by Yifan Xia from NESA Lab.

Contributions to this repository are sincerely welcome!

Markdown format

**Paper Name**. Conference Year. `Keywords` [[pdf](pdf_link)] [[code](code_link)]
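
As an illustration, an entry for a paper already on this list might look like the sketch below. The keyword is only an assumed example, and `pdf_link` is left as a placeholder to be replaced with the paper's actual URL; the `[[code](code_link)]` part is presumably omitted when no code is available.

```markdown
**Fuzz4All: Universal Fuzzing with Large Language Models**. ICSE 2024. `Fuzzing` [[pdf](pdf_link)]
```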

Licenses

To the extent possible under law, Anderson-Xia has waived all copyright and related or neighboring rights to this repository.

EvanXiaa/Awesome-LLM_For-SE-Sec-Papers