Awesome-LLM-for-SE&Sec-Papers

A curated list of papers on Large Language Models for Software Engineering and Security.

Contents:

1. SE/PL Papers

1.1 Software Testing

  1. Code-Aware Prompting: A study of Coverage guided Test Generation in Regression Setting using LLM. FSE 2024. [pdf]
  2. Are Prompt Engineering and TODO Comments Friends or Foes? An Evaluation on GitHub Copilot. ICSE 2024. [pdf]
  3. Enhancing Exploratory Testing by Large Language Model and Knowledge Graph. ICSE 2024. [pdf]
  4. Fuzz4All: Universal Fuzzing with Large Language Models. ICSE 2024. [pdf]
  5. Large Language Models are Edge-Case Generators: Crafting Unusual Programs for Fuzzing Deep Learning Libraries. ICSE 2024. [pdf]
  6. Make LLM a Testing Expert: Bringing Human-like Interaction to Mobile GUI Testing via Functionality-aware Decisions. ICSE 2024. [pdf]
  7. Testing the Limits: Unusual Text Inputs Generation for Mobile App Crash Detection with Large Language Model. ICSE 2024. [pdf]
  8. Nuances are the Key: Unlocking ChatGPT to Find Failure-Inducing Tests with Differential Prompting. ASE 2023. [pdf]
  9. SMT Solver Validation Empowered by Large Pre-trained Language Models. ASE 2023. [pdf]
  10. Large Language Models are Zero-Shot Fuzzers: Fuzzing Deep-Learning Libraries via Large Language Models. Fuzzing, Deep Learning Library Testing ISSTA 2023. [pdf]
  11. CodaMOSA: Escaping Coverage Plateaus in Test Generation with Pre-trained Large Language Models. Search-based Testing ICSE 2023. [pdf]
  12. Large Language Models are Few-shot Testers: Exploring LLM-based General Bug Reproduction. Unit Test Generation, Bug Reproduction ICSE 2023. [pdf]
  13. Understanding Large Language Model-Based Fuzz Driver Generation. Fuzzing Driver Generation arxiv. [pdf]
  14. Large Language Models are Edge-Case Fuzzers: Testing Deep Learning Libraries via FuzzGPT. Fuzzing, Deep Learning Library Testing arxiv. [pdf]

1.2 Automated Program Repair

  1. A Deep Dive into Large Language Models for Automated Bug Localization and Repair. FSE 2024. [pdf]
  2. Domain Knowledge Matters: Improving Prompts with Fix Templates for Repairing Python Type Errors. ICSE 2024. [pdf]
  3. Automated Program Repair in the Era of Large Pre-trained Language Models. LLM for APR ICSE 2023. [pdf]
  4. An Empirical Study on Fine-tuning Large Language Models of Code for Automated Program Repair. ASE 2023. [pdf] [slides]
  5. Conversational Automated Program Repair. arxiv. [pdf]

1.3 Automated Bug Replay

  1. Prompting Is All You Need: Automated Android Bug Replay with Large Language Models. S2R Identification, UI Encoding ICSE 2024. [pdf]

1.4 Root Cause Analysis and Fault Management

  1. A Quantitative and Qualitative Evaluation of LLM-based Explainable Fault Localization. FSE 2024. [pdf]
  2. Xpert: Empowering Incident Management with Query Recommendations via Large Language Models. ICSE 2024. [pdf]
  3. Automatic Root Cause Analysis via Large Language Models for Cloud Incidents. EuroSys 2024. [pdf]
  4. Recommending Root-Cause and Mitigation Steps for Cloud Incidents using Large Language Models. Fine-Tuning, Maintainer Interview ICSE 2023. [pdf]

1.5 Code Summary

  1. An Empirical Study on Using Large Language Models for Multi-Intent Comment Generation. Code Intent Summary, Retrieval-based Prompting ICSE 2024. [pdf]
  2. Using an LLM to Help With Code Understanding. ICSE 2024. [pdf]

1.6 Code Quality Assurance

  1. Lost in Translation: A Study of Bugs Introduced by Large Language Models while Translating Code. ICSE 2024. [pdf]
  2. Traces of Memorisation in Large Language Models for Code. ICSE 2024. [pdf]
  3. Automated Repair of Programs from Large Language Models. APR for LLM-Generated Program ICSE 2023. [pdf]
  4. Is Your Code Generated by ChatGPT Really Correct? Rigorous Evaluation of Large Language Models for Code Generation. Code Correctness Validation arxiv 2023. [pdf]
  5. Large Language Models of Code Fail at Completing Code with Potential Bugs. arxiv. [pdf]
  6. Is Model Attention Aligned with Human Attention? An Empirical Study on Large Language Models for Code Generation. arxiv. [pdf]
  7. RAPGen: An Approach for Fixing Code Inefficiencies in Zero-Shot. arxiv. [pdf]

1.7 Static Analysis

  1. A Learning-Based Approach to Static Program Slicing. OOPSLA 2024. [pdf]
  2. Enhancing Static Analysis for Practical Bug Detection: An LLM-Integrated Approach. OOPSLA 2024. [pdf]
  3. Demystifying and Detecting Misuses of Deep Learning APIs. ICSE 2024. [pdf]
  4. Assisting Static Analysis with Large Language Models: A ChatGPT Experiment. FSE 2024. [pdf]
  5. Generative Type Inference for Python. ASE 2023. [pdf]

1.8 Code Generation

  1. ClarifyGPT: A Framework for Enhancing LLM-based Code Generation via Requirements Clarification. FSE 2024. [pdf]
  2. DiffCoder: Enhancing Large Language Model on API Invocation via Analogical Code Exercises. FSE 2024. [pdf]
  3. Evaluating Large Language Models in Class-Level Code Generation. ICSE 2024. [pdf]

1.9 Log Analysis

  1. LILAC: Log Parsing using LLMs with Adaptive Parsing Cache. ICSE 2024. [pdf]
  2. LLMParser: An Exploratory Study on Using Large Language Models for Log Parsing. ICSE 2024. [pdf]
  3. UniLog: Automatic Logging via LLM and In-Context Learning. ICSE 2024. [pdf]

1.10 Formalization

  1. Can Large Language Models Transform Natural Language Intent into Formal Method Postconditions? FSE 2024. [pdf]

1.11 Maintenance

  1. Only diff is Not Enough: Generating Commit Messages Leveraging Reasoning and Action of Large Language Model. FSE 2024. [pdf]

2. Security Papers

2.1 Malware

  1. GPThreats-3: Is Automatic Malware Generation a Threat? Malware Generation WOOT 2023. [pdf] [slides]

2.2 Vulnerability Repair

  1. Examining Zero-Shot Vulnerability Repair with Large Language Models. Vulnerability Repair Oakland 2023. [pdf]

2.3 Vulnerability Analysis

  1. LLMs Cannot Reliably Identify and Reason About Security Vulnerabilities (Yet?): A Comprehensive Evaluation, Framework, and Benchmarks. Oakland 2024. [pdf]
  2. Lost at C: A User Study on the Security Implications of Large Language Model Code Assistants. Code Defect Analysis USENIX Security 2023. [pdf]

2.4 Vulnerability Detection

  1. Large Language Models for Test-Free Fault Localization. ICSE 2024. [pdf]
  2. The Hitchhiker's Guide to Program Analysis: A Journey with Large Language Models. Code Defect Analysis arxiv. [pdf]
  3. Transformer-based Vulnerability Detection in Code at EditTime: Zero-shot, Few-shot, or Fine-tuning? Program analysis supported by LLM arxiv. [pdf]

2.5 Fuzzing

  1. LLMIF: Augmented Large Language Model for Fuzzing IoT Devices. Oakland 2024. [pdf]
  2. Large Language Model guided Protocol Fuzzing. NDSS 2024. [pdf]
  3. KernelGPT: Enhanced Kernel Fuzzing via Large Language Models. arxiv. [pdf]

2.6 Cyber Attack

  1. From Chatbots to Phishbots?: Phishing Scam Generation in Commercial Large Language Models. Oakland 2024. [pdf]

Contributing

This list is mainly maintained by Yifan Xia from NESA Lab.

Contributions to this repository are sincerely welcome!

Markdown format

**Paper Name**. Conference Year. `Keywords` [[pdf](pdf_link)] [[code](code_link)]

Licenses

CC0

To the extent possible under law, Anderson-Xia has waived all copyright and related or neighboring rights to this repository.