When LLMs Meet Cybersecurity: A Systematic Literature Review

🔥 Updates

📆 [2024-06-14] We have updated the related papers up to May 31st, with 37 new papers added (2024.03.20-2024.05.31).


🌈 Introduction

We are excited to present "When LLMs Meet Cybersecurity: A Systematic Literature Review," a comprehensive overview of LLM applications in cybersecurity.

We seek to address three key questions:

  • RQ1: How to construct cybersecurity-oriented domain LLMs?
  • RQ2: What are the potential applications of LLMs in cybersecurity?
  • RQ3: What are the existing challenges and further research directions for the application of LLMs in cybersecurity?

🚩 Features

(2024.03.20) Our study analyzes over 180 works, spanning 25 LLMs and more than 10 downstream scenarios.

🌟 Literature

RQ1: How to construct cybersecurity-oriented domain LLMs?

Cybersecurity Evaluation Benchmarks

  1. CyberMetric: A Benchmark Dataset for Evaluating Large Language Models Knowledge in Cybersecurity [paper] 2024.02.12

  2. SecEval: A Comprehensive Benchmark for Evaluating Cybersecurity Knowledge of Foundation Models [paper] 2023

  3. SecQA: A Concise Question-Answering Dataset for Evaluating Large Language Models in Computer Security [paper] 2023.12.26

  4. SecurityEval Dataset: Mining Vulnerability Examples to Evaluate Machine Learning-Based Code Generation Techniques [paper] 2022.11.09

  5. Can LLMs Patch Security Issues? [paper] 2024.02.19

  6. DebugBench: Evaluating Debugging Capability of Large Language Models [paper] 2024.01.11

  7. An Empirical Study of NetOps Capability of Pre-Trained Large Language Models [paper] 2023.09.19

  8. OpsEval: A Comprehensive IT Operations Benchmark Suite for Large Language Models [paper] 2024.02.16

  9. Purple Llama CyberSecEval: A Secure Coding Benchmark for Language Models [paper] 2023.12.07

  10. LLMSecEval: A Dataset of Natural Language Prompts for Security Evaluations [paper] 2023.03.16

  11. Can LLMs Understand Computer Networks? Towards a Virtual System Administrator [paper] 2024.04.22

  12. Assessing Cybersecurity Vulnerabilities in Code Large Language Models [paper] 2024.04.29

  13. SECURE: Benchmarking Generative Large Language Models for Cybersecurity Advisory [paper] 2024.05.30

Fine-tuned Domain LLMs for Cybersecurity

  1. Finetuning Large Language Models for Vulnerability Detection [paper] 2024.02.29

  2. SecureFalcon: The Next Cyber Reasoning System for Cyber Security [paper] 2023.07.13

  3. Large Language Models for Test-Free Fault Localization [paper] 2023.10.03

  4. RepairLLaMA: Efficient Representations and Fine-Tuned Adapters for Program Repair [paper] 2024.03.11

  5. Efficient Avoidance of Vulnerabilities in Auto-completed Smart Contract Code Using Vulnerability-constrained Decoding [paper] 2023.10.06

  6. Instruction Tuning for Secure Code Generation [paper] 2024.02.14

  7. Nova+: Generative Language Models for Binaries [paper] 2023.11.27

  8. Owl: A Large Language Model for IT Operations [paper] 2023.09.17

  9. HackMentor: Fine-tuning Large Language Models for Cybersecurity [paper] 2023.09

  10. Assessing LLMs in Malicious Code Deobfuscation of Real-world Malware Campaigns [paper] 2024.04.30

RQ2: What are the potential applications of LLMs in cybersecurity?

Threat Intelligence

  1. LOCALINTEL: Generating Organizational Threat Intelligence from Global and Local Cyber Knowledge [paper] 2024.01.18

  2. AGIR: Automating Cyber Threat Intelligence Reporting with Natural Language Generation [paper] 2023.10.04

  3. On the Uses of Large Language Models to Interpret Ambiguous Cyberattack Descriptions [paper] 2023.08.22

  4. Advancing TTP Analysis: Harnessing the Power of Encoder-Only and Decoder-Only Language Models with Retrieval Augmented Generation [paper] 2024.01.12

  5. An Empirical Study on Using Large Language Models to Analyze Software Supply Chain Security Failures [paper] 2023.08.09

  6. ChatGPT, Llama, can you write my report? An experiment on assisted digital forensics reports written using (Local) Large Language Models [paper] 2023.12.22

  7. Time for aCTIon: Automated Analysis of Cyber Threat Intelligence in the Wild [paper] 2023.07.14

  8. Cupid: Leveraging ChatGPT for More Accurate Duplicate Bug Report Detection [paper] 2023.08.27

  9. HW-V2W-Map: Hardware Vulnerability to Weakness Mapping Framework for Root Cause Analysis with GPT-assisted Mitigation Suggestion [paper] 2023.12.21

  10. Cyber Sentinel: Exploring Conversational Agents in Streamlining Security Tasks with GPT-4 [paper] 2023.09.28

  11. Evaluation of LLM Chatbots for OSINT-based Cyber Threat Awareness [paper] 2024.03.13

  12. Crimson: Empowering Strategic Reasoning in Cybersecurity through Large Language Models [paper] 2024.03.01

  13. SEvenLLM: Benchmarking, Eliciting, and Enhancing Abilities of Large Language Models in Cyber Threat Intelligence [paper] 2024.05.06

  14. AttacKG+: Boosting Attack Knowledge Graph Construction with Large Language Models [paper] 2024.05.08

Fuzzing

  1. Augmenting Greybox Fuzzing with Generative AI [paper] 2023.06.11

  2. How well does LLM generate security tests? [paper] 2023.10.03

  3. Fuzz4All: Universal Fuzzing with Large Language Models [paper] 2024.01.15

  4. CODAMOSA: Escaping Coverage Plateaus in Test Generation with Pre-trained Large Language Models [paper] 2023.07.26

  5. Understanding Large Language Model Based Fuzz Driver Generation [paper] 2023.07.24

  6. Large Language Models Are Zero-Shot Fuzzers: Fuzzing Deep-Learning Libraries via Large Language Models [paper] 2023.06.07

  7. Large Language Models are Edge-Case Fuzzers: Testing Deep Learning Libraries via FuzzGPT [paper] 2023.04.04

  8. Large Language Model Guided Protocol Fuzzing [paper] 2024.02.26

  9. Fuzzing BusyBox: Leveraging LLM and Crash Reuse for Embedded Bug Unearthing [paper] 2024.03.06

  10. When Fuzzing Meets LLMs: Challenges and Opportunities [paper] 2024.04.25

Vulnerability Detection

  1. Evaluation of ChatGPT Model for Vulnerability Detection [paper] 2023.04.12

  2. Detecting Software Vulnerabilities Using Language Models [paper] 2023.02.23

  3. Software Vulnerability Detection using Large Language Models [paper] 2023.09.02

  4. Understanding the Effectiveness of Large Language Models in Detecting Security Vulnerabilities [paper] 2023.11.16

  5. Software Vulnerability and Functionality Assessment using LLMs [paper] 2024.03.13

  6. Finetuning Large Language Models for Vulnerability Detection [paper] 2024.03.01

  7. The Hitchhiker's Guide to Program Analysis: A Journey with Large Language Models [paper] 2023.11.15

  8. DefectHunter: A Novel LLM-Driven Boosted-Conformer-based Code Vulnerability Detection Mechanism [paper] 2023.09.27

  9. Prompt-Enhanced Software Vulnerability Detection Using ChatGPT [paper] 2023.08.24

  10. Using ChatGPT as a Static Application Security Testing Tool [paper] 2023.08.28

  11. LLbezpeky: Leveraging Large Language Models for Vulnerability Detection [paper] 2024.01.13

  12. Large Language Model-Powered Smart Contract Vulnerability Detection: New Perspectives [paper] 2023.10.16

  13. Software Vulnerability Detection with GPT and In-Context Learning [paper] 2024.01.08

  14. GPTScan: Detecting Logic Vulnerabilities in Smart Contracts by Combining GPT with Program Analysis [paper] 2023.12.25

  15. VulLibGen: Identifying Vulnerable Third-Party Libraries via Generative Pre-Trained Model [paper] 2023.08.09

  16. LLM4Vuln: A Unified Evaluation Framework for Decoupling and Enhancing LLMs' Vulnerability Reasoning [paper] 2024.01.29

  17. Large Language Models for Test-Free Fault Localization [paper] 2023.10.03

  18. Multi-role Consensus through LLMs Discussions for Vulnerability Detection [paper] 2024.03.21

  19. How ChatGPT is Solving Vulnerability Management Problem [paper] 2023.11.11

  20. DiverseVul: A New Vulnerable Source Code Dataset for Deep Learning Based Vulnerability Detection [paper] 2023.08.09

  21. The FormAI Dataset: Generative AI in Software Security through the Lens of Formal Verification [paper] 2023.09.02

  22. How Far Have We Gone in Vulnerability Detection Using Large Language Models [paper] 2023.12.22

  23. Large Language Model for Vulnerability Detection and Repair: Literature Review and Roadmap [paper] 2024.04.04

  24. DLAP: A Deep Learning Augmented Large Language Model Prompting Framework for Software Vulnerability Detection [paper] 2024.05.02

  25. Harnessing Large Language Models for Software Vulnerability Detection: A Comprehensive Benchmarking Study [paper] 2024.05.24

  26. LLM-Assisted Static Analysis for Detecting Security Vulnerabilities [paper] 2024.05.27

Insecure Code Generation

  1. Lost at C: A User Study on the Security Implications of Large Language Model Code Assistants [paper] 2023.02.27

  2. Bugs in Large Language Models Generated Code [paper] 2024.03.18

  3. Asleep at the Keyboard? Assessing the Security of GitHub Copilot’s Code Contributions [paper] 2021.12.16

  4. The Effectiveness of Large Language Models (ChatGPT and CodeBERT) for Security-Oriented Code Analysis [paper] 2023.08.29

  5. No Need to Lift a Finger Anymore? Assessing the Quality of Code Generation by ChatGPT [paper] 2023.08.09

  6. Generate and Pray: Using SALLMS to Evaluate the Security of LLM Generated Code [paper] 2023.11.01

  7. Is Your Code Generated by ChatGPT Really Correct? Rigorous Evaluation of Large Language Models for Code Generation [paper] 2023.10.30

  8. Can Large Language Models Identify And Reason About Security Vulnerabilities? Not Yet [paper] 2023.12.19

  9. A Comparative Study of Code Generation using ChatGPT 3.5 across 10 Programming Languages [paper] 2023.08.08

  10. How Secure is Code Generated by ChatGPT? [paper] 2023.04.19

  11. Large Language Models for Code: Security Hardening and Adversarial Testing [paper] 2023.09.29

  12. Pop Quiz! Can a Large Language Model Help With Reverse Engineering? [paper] 2022.02.02

  13. LLM4Decompile: Decompiling Binary Code with Large Language Models [paper] 2024.03.08

  14. Large Language Models for Code Analysis: Do LLMs Really Do Their Job? [paper] 2024.03.05

  15. Understanding Programs by Exploiting (Fuzzing) Test Cases [paper] 2023.01.12

  16. Evaluating and Explaining Large Language Models for Code Using Syntactic Structures [paper] 2023.08.07

  17. Prompt Engineering-assisted Malware Dynamic Analysis Using GPT-4 [paper] 2023.12.13

  18. Using ChatGPT to Analyze Ransomware Messages and to Predict Ransomware Threats [paper] 2023.11.21

  19. Shifting the Lens: Detecting Malware in npm Ecosystem with Large Language Models [paper] 2024.03.18

  20. DebugBench: Evaluating Debugging Capability of Large Language Models [paper] 2024.01.11

  21. Make LLM a Testing Expert: Bringing Human-like Interaction to Mobile GUI Testing via Functionality-aware Decisions [paper] 2023.10.24

  22. FLAG: Finding Line Anomalies (in code) with Generative AI [paper] 2023.07.22

  23. Evolutionary Large Language Models for Hardware Security: A Comparative Survey [paper] 2024.04.25

  24. Do Neutral Prompts Produce Insecure Code? FormAI-v2 Dataset: Labelling Vulnerabilities in Code Generated by Large Language Models [paper] 2024.04.29

  25. LLM Security Guard for Code [paper] 2024.05.03

  26. Code Repair with LLMs gives an Exploration-Exploitation Tradeoff [paper] 2024.05.30

Program Repair

  1. Automatic Program Repair with OpenAI's Codex: Evaluating QuixBugs [paper] 2023.11.06

  2. An Analysis of the Automatic Bug Fixing Performance of ChatGPT [paper] 2023.01.20

  3. AI-powered patching: the future of automated vulnerability fixes [paper] 2024.01.31

  4. Practical Program Repair in the Era of Large Pre-trained Language Models [paper] 2022.10.25

  5. Security Code Review by LLMs: A Deep Dive into Responses [paper] 2024.01.29

  6. Examining Zero-Shot Vulnerability Repair with Large Language Models [paper] 2022.08.15

  7. How Effective Are Neural Networks for Fixing Security Vulnerabilities [paper] 2023.05.29

  8. Can LLMs Patch Security Issues? [paper] 2024.02.19

  9. InferFix: End-to-End Program Repair with LLMs [paper] 2023.03.13

  10. ZeroLeak: Using LLMs for Scalable and Cost Effective Side-Channel Patching [paper] 2023.08.24

  11. DIVAS: An LLM-based End-to-End Framework for SoC Security Analysis and Policy-based Protection [paper] 2023.08.14

  12. Fixing Hardware Security Bugs with Large Language Models [paper] 2023.02.02

  13. A Study of Vulnerability Repair in JavaScript Programs with Large Language Models [paper] 2023.03.19

  14. Enhanced Automated Code Vulnerability Repair using Large Language Models [paper] 2024.01.08

  15. Teaching Large Language Models to Self-Debug [paper] 2023.10.05

  16. Better Patching Using LLM Prompting, via Self-Consistency [paper] 2023.08.16

  17. Copiloting the Copilots: Fusing Large Language Models with Completion Engines for Automated Program Repair [paper] 2023.11.08

  18. LLM-Powered Code Vulnerability Repair with Reinforcement Learning and Semantic Reward [paper] 2024.02.22

  19. ContrastRepair: Enhancing Conversation-Based Automated Program Repair via Contrastive Test Case Pairs [paper] 2024.03.07

  20. When Large Language Models Confront Repository-Level Automatic Program Repair: How Well They Done? [paper] 2024.03.01

  21. Aligning LLMs for FL-free Program Repair [paper] 2024.04.13

  22. Multi-Objective Fine-Tuning for Enhanced Program Repair with LLMs [paper] 2024.04.22

  23. How Far Can We Go with Practical Function-Level Program Repair? [paper] 2024.04.19

  24. Revisiting Unnaturalness for Automated Program Repair in the Era of Large Language Models [paper] 2024.03.23

  25. A Systematic Literature Review on Large Language Models for Automated Program Repair [paper] 2024.05.12

  26. Automated Repair of AI Code with Large Language Models and Formal Verification [paper] 2024.05.14

  27. A Case Study of LLM for Automated Vulnerability Repair: Assessing Impact of Reasoning and Patch Validation Feedback [paper] 2024.05.24

Anomaly Detection

  1. Benchmarking Large Language Models for Log Analysis, Security, and Interpretation [paper] 2023.11.24

  2. Log-based Anomaly Detection based on EVT Theory with feedback [paper] 2023.09.30

  3. LogGPT: Exploring ChatGPT for Log-Based Anomaly Detection [paper] 2023.09.14

  4. LogGPT: Log Anomaly Detection via GPT [paper] 2023.12.11

  5. Interpretable Online Log Analysis Using Large Language Models with Prompt Strategies [paper] 2024.01.26

  6. Lemur: Log Parsing with Entropy Sampling and Chain-of-Thought Merging [paper] 2024.03.02

  7. Web Content Filtering through knowledge distillation of Large Language Models [paper] 2023.05.10

  8. Application of Large Language Models to DDoS Attack Detection [paper] 2024.02.05

  9. An Improved Transformer-based Model for Detecting Phishing, Spam, and Ham: A Large Language Model Approach [paper] 2023.11.12

  10. Evaluating the Performance of ChatGPT for Spam Email Detection [paper] 2024.02.23

  11. Prompted Contextual Vectors for Spear-Phishing Detection [paper] 2024.02.14

  12. Devising and Detecting Phishing: Large Language Models vs. Smaller Human Models [paper] 2023.11.30

  13. Explaining Tree Model Decisions in Natural Language for Network Intrusion Detection [paper] 2023.10.30

  14. Revolutionizing Cyber Threat Detection with Large Language Models: A privacy-preserving BERT-based Lightweight Model for IoT/IIoT Devices [paper] 2024.02.08

  15. HuntGPT: Integrating Machine Learning-Based Anomaly Detection and Explainable AI with Large Language Models (LLMs) [paper] 2023.09.27

  16. ChatGPT for digital forensic investigation: The good, the bad, and the unknown [paper] 2023.07.10

  17. Large Language Models Spot Phishing Emails with Surprising Accuracy: A Comparative Analysis of Performance [paper] 2024.04.23

  18. LLMParser: An Exploratory Study on Using Large Language Models for Log Parsing [paper] 2024.04.27

  19. DoLLM: How Large Language Models Understanding Network Flow Data to Detect Carpet Bombing DDoS [paper] 2024.05.12

  20. Large Language Models in Wireless Application Design: In-Context Learning-enhanced Automatic Network Intrusion Detection [paper] 2024.05.17

LLM Assisted Attack

  1. Identifying and Mitigating the Security Risks of Generative AI [paper] 2023.12.29

  2. Impact of Big Data Analytics and ChatGPT on Cybersecurity [paper] 2023.05.22

  3. From ChatGPT to ThreatGPT: Impact of Generative AI in Cybersecurity and Privacy [paper] 2023.07.03

  4. LLMs Killed the Script Kiddie: How Agents Supported by Large Language Models Change the Landscape of Network Threat Testing [paper] 2023.10.10

  5. Malla: Demystifying Real-world Large Language Model Integrated Malicious Services [paper] 2024.01.06

  6. Evaluating LLMs for Privilege-Escalation Scenarios [paper] 2023.10.23

  7. Using Large Language Models for Cybersecurity Capture-The-Flag Challenges and Certification Questions [paper] 2023.08.21

  8. Exploring the Dark Side of AI: Advanced Phishing Attack Design and Deployment Using ChatGPT [paper] 2023.09.19

  9. From Chatbots to PhishBots? - Preventing Phishing scams created using ChatGPT, Google Bard and Claude [paper] 2024.03.10

  10. From Text to MITRE Techniques: Exploring the Malicious Use of Large Language Models for Generating Cyber Attack Payloads [paper] 2023.05.24

  11. PentestGPT: An LLM-empowered Automatic Penetration Testing Tool [paper] 2023.08.13

  12. AutoAttacker: A Large Language Model Guided System to Implement Automatic Cyber-attacks [paper] 2024.03.02

  13. RatGPT: Turning online LLMs into Proxies for Malware Attacks [paper] 2023.09.07

  14. Getting pwn’d by AI: Penetration Testing with Large Language Models [paper] 2023.08.17

Others

  1. An LLM-based Framework for Fingerprinting Internet-connected Devices [paper] 2023.10.24

  2. Anatomy of an AI-powered malicious social botnet [paper] 2023.07.30

  3. Just-in-Time Security Patch Detection -- LLM At the Rescue for Data Augmentation [paper] 2023.12.12

  4. LLM for SoC Security: A Paradigm Shift [paper] 2023.10.09

  5. Harnessing the Power of LLM to Support Binary Taint Analysis [paper] 2023.10.12

  6. Llama Guard: LLM-based Input-Output Safeguard for Human-AI Conversations [paper] 2023.12.07

  7. LLM in the Shell: Generative Honeypots [paper] 2024.02.09

  8. Employing LLMs for Incident Response Planning and Review [paper] 2024.03.02

  9. Enhancing Network Management Using Code Generated by Large Language Models [paper] (https://arxiv.org/abs/2308.06261) 2023.08.11

  10. Prompting Is All You Need: Automated Android Bug Replay with Large Language Models [paper] 2023.07.18

  11. Is Stack Overflow Obsolete? An Empirical Study of the Characteristics of ChatGPT Answers to Stack Overflow Questions [paper] 2024.02.07

  12. How Far Have We Gone in Stripped Binary Code Understanding Using Large Language Models [paper] 2024.04.16

  13. Act as a Honeytoken Generator! An Investigation into Honeytoken Generation with Large Language Models [paper] 2024.04.24

  14. AppPoet: Large Language Model based Android malware detection via multi-view prompt engineering [paper] 2024.04.29

  15. Large Language Models for Cyber Security: A Systematic Literature Review [paper] 2024.05.08

  16. Critical Infrastructure Protection: Generative AI, Challenges, and Opportunities [paper] 2024.05.08

  17. LLMPot: Automated LLM-based Industrial Protocol and Physical Process Emulation for ICS Honeypots [paper] 2024.05.10

  18. A Comprehensive Overview of Large Language Models (LLMs) for Cyber Defences: Opportunities and Directions [paper] 2024.05.23

RQ3: What are the further research directions for the application of LLMs in cybersecurity?

Further Research: Agent4Cybersecurity

  1. Cybersecurity Issues and Challenges [paper] 2022.08

  2. A Unified Cybersecurity Framework for Complex Environments [paper] 2018.09.26

  3. LLMind: Orchestrating AI and IoT with LLM for Complex Task Execution [paper] 2024.02.20

  4. Out of the Cage: How Stochastic Parrots Win in Cyber Security Environments [paper] 2023.08.28

  5. LLM Agents Can Autonomously Hack Websites [paper] 2024.02.16

  6. Nissist: An Incident Mitigation Copilot based on Troubleshooting Guides [paper] 2024.02.27

  7. TPTU: Large Language Model-based AI Agents for Task Planning and Tool Usage [paper] 2023.11.07

  8. The Rise and Potential of Large Language Model Based Agents: A Survey [paper] 2023.09.19

  9. ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs [paper] 2023.10.03

  10. From Summary to Action: Enhancing Large Language Models for Complex Tasks with Open World APIs [paper] 2024.02.28

  11. If LLM Is the Wizard, Then Code Is the Wand: A Survey on How Code Empowers Large Language Models to Serve as Intelligent Agents [paper] 2024.01.08

  12. TaskWeaver: A Code-First Agent Framework [paper] 2023.12.01

  13. Large Language Models for Networking: Applications, Enabling Techniques, and Challenges [paper] 2023.11.29

  14. R-Judge: Benchmarking Safety Risk Awareness for LLM Agents [paper] 2024.02.18

  15. WIPI: A New Web Threat for LLM-Driven Web Agents [paper] 2024.02.26

  16. InjecAgent: Benchmarking Indirect Prompt Injections in Tool-Integrated Large Language Model Agents [paper] 2024.03.25

  17. LLM Agents can Autonomously Exploit One-day Vulnerabilities [paper] 2024.04.17

  18. Large Language Models for Networking: Workflow, Advances and Challenges [paper] 2024.04.29

  19. Generative AI in Cybersecurity [paper] 2024.05.02

  20. Generative AI and Large Language Models for Cyber Security: All Insights You Need [paper] 2024.05.21

📖 BibTeX

@misc{zhang2024llms,
      title={When LLMs Meet Cybersecurity: A Systematic Literature Review}, 
      author={Jie Zhang and Haoyu Bu and Hui Wen and Yu Chen and Lun Li and Hongsong Zhu},
      year={2024},
      eprint={2405.03644},
      archivePrefix={arXiv},
      primaryClass={cs.CR}
}