Awesome-LLM4Cybersecurity: A repository from tuuzi

When LLMs Meet Cybersecurity: A Systematic Literature Review

🔥 Updates

📆[2024-06-14] We have updated the related papers up to May 31st, with 37 new papers added (2024.03.20-2024.05.31).

When LLMs Meet Cybersecurity: A Systematic Literature Review
🔥 Updates
🌈 Introduction
🚩 Features
🌟 Literature
📖BibTeX

🌈 Introduction

We are excited to present "When LLMs Meet Cybersecurity: A Systematic Literature Review," a comprehensive overview of LLM applications in cybersecurity.

We seek to address three key questions:

RQ1: How to construct cybersecurity-oriented domain LLMs?
RQ2: What are the potential applications of LLMs in cybersecurity?
RQ3: What are the existing challenges and further research directions about the application of LLMs in cybersecurity?

🚩 Features

(2024.03.20) Our study analyzes over 180 works, spanning 25 LLMs and more than 10 downstream scenarios.

🌟 Literature

RQ1: How to construct cybersecurity-oriented domain LLMs?

Cybersecurity Evaluation Benchmarks

CyberMetric: A Benchmark Dataset for Evaluating Large Language Models Knowledge in Cybersecurity [paper] 2024.02.12
SecEval: A Comprehensive Benchmark for Evaluating Cybersecurity Knowledge of Foundation Models [paper] 2023
SecQA: A Concise Question-Answering Dataset for Evaluating Large Language Models in Computer Security [paper] 2023.12.26
SecurityEval Dataset: Mining Vulnerability Examples to Evaluate Machine Learning-Based Code Generation Techniques [paper] 2022.11.09
Can LLMs Patch Security Issues? [paper] 2024.02.19
DebugBench: Evaluating Debugging Capability of Large Language Models [paper] 2024.01.11
An Empirical Study of NetOps Capability of Pre-Trained Large Language Models [paper] 2023.09.19
OpsEval: A Comprehensive IT Operations Benchmark Suite for Large Language Models [paper] 2024.02.16
Purple Llama CyberSecEval: A Secure Coding Benchmark for Language Models [paper] 2023.12.07
LLMSecEval: A Dataset of Natural Language Prompts for Security Evaluations [paper] 2023.03.16
Can LLMs Understand Computer Networks? Towards a Virtual System Administrator [paper] 2024.04.22
Assessing Cybersecurity Vulnerabilities in Code Large Language Models [paper] 2024.04.29
SECURE: Benchmarking Generative Large Language Models for Cybersecurity Advisory [paper] 2024.05.30

Fine-tuned Domain LLMs for Cybersecurity

Finetuning Large Language Models for Vulnerability Detection [paper] 2024.02.29
SecureFalcon: The Next Cyber Reasoning System for Cyber Security [paper] 2023.07.13
Large Language Models for Test-Free Fault Localization [paper] 2023.10.03
RepairLLaMA: Efficient Representations and Fine-Tuned Adapters for Program Repair [paper] 2024.03.11
Efficient Avoidance of Vulnerabilities in Auto-completed Smart Contract Code Using Vulnerability-constrained Decoding [paper] 2023.10.06
Instruction Tuning for Secure Code Generation [paper] 2024.02.14
Nova+: Generative Language Models for Binaries [paper] 2023.11.27
Owl: A Large Language Model for IT Operations [paper] 2023.09.17
HackMentor: Fine-tuning Large Language Models for Cybersecurity [paper] 2023.09
Assessing LLMs in Malicious Code Deobfuscation of Real-world Malware Campaigns [paper] 2024.04.30

RQ2: What are the potential applications of LLMs in cybersecurity?

Threat Intelligence

LOCALINTEL: Generating Organizational Threat Intelligence from Global and Local Cyber Knowledge [paper] 2024.01.18
AGIR: Automating Cyber Threat Intelligence Reporting with Natural Language Generation [paper] 2023.10.04
On the Uses of Large Language Models to Interpret Ambiguous Cyberattack Descriptions [paper] 2023.08.22
Advancing TTP Analysis: Harnessing the Power of Encoder-Only and Decoder-Only Language Models with Retrieval Augmented Generation [paper] 2024.01.12
An Empirical Study on Using Large Language Models to Analyze Software Supply Chain Security Failures [paper] 2023.08.09
ChatGPT, Llama, can you write my report? An experiment on assisted digital forensics reports written using (Local) Large Language Models [paper] 2023.12.22
Time for aCTIon: Automated Analysis of Cyber Threat Intelligence in the Wild [paper] 2023.07.14
Cupid: Leveraging ChatGPT for More Accurate Duplicate Bug Report Detection [paper] 2023.08.27
HW-V2W-Map: Hardware Vulnerability to Weakness Mapping Framework for Root Cause Analysis with GPT-assisted Mitigation Suggestion [paper] 2023.12.21
Cyber Sentinel: Exploring Conversational Agents in Streamlining Security Tasks with GPT-4 [paper] 2023.09.28
Evaluation of LLM Chatbots for OSINT-based Cyber Threat Awareness [paper] 2024.03.13
Crimson: Empowering Strategic Reasoning in Cybersecurity through Large Language Models [paper] 2024.03.01
SEvenLLM: Benchmarking, Eliciting, and Enhancing Abilities of Large Language Models in Cyber Threat Intelligence [paper] 2024.05.06
AttacKG+: Boosting Attack Knowledge Graph Construction with Large Language Models [paper] 2024.05.08

Fuzzing

Augmenting Greybox Fuzzing with Generative AI [paper] 2023.06.11
How well does LLM generate security tests? [paper] 2023.10.03
Fuzz4All: Universal Fuzzing with Large Language Models [paper] 2024.01.15
CODAMOSA: Escaping Coverage Plateaus in Test Generation with Pre-trained Large Language Models [paper] 2023.07.26
Understanding Large Language Model Based Fuzz Driver Generation [paper] 2023.07.24
Large Language Models Are Zero-Shot Fuzzers: Fuzzing Deep-Learning Libraries via Large Language Models [paper] 2023.06.07
Large Language Models are Edge-Case Fuzzers: Testing Deep Learning Libraries via FuzzGPT [paper] 2023.04.04
Large Language Model Guided Protocol Fuzzing [paper] 2024.02.26
Fuzzing BusyBox: Leveraging LLM and Crash Reuse for Embedded Bug Unearthing [paper] 2024.03.06
When Fuzzing Meets LLMs: Challenges and Opportunities [paper] 2024.04.25

Vulnerability Detection

Evaluation of ChatGPT Model for Vulnerability Detection [paper] 2023.04.12
Detecting Software Vulnerabilities Using Language Models [paper] 2023.02.23
Software Vulnerability Detection using Large Language Models [paper] 2023.09.02
Understanding the Effectiveness of Large Language Models in Detecting Security Vulnerabilities [paper] 2023.11.16
Software Vulnerability and Functionality Assessment using LLMs [paper] 2024.03.13
Finetuning Large Language Models for Vulnerability Detection [paper] 2024.03.01
The Hitchhiker's Guide to Program Analysis: A Journey with Large Language Models [paper] 2023.11.15
DefectHunter: A Novel LLM-Driven Boosted-Conformer-based Code Vulnerability Detection Mechanism [paper] 2023.09.27
Prompt-Enhanced Software Vulnerability Detection Using ChatGPT [paper] 2023.08.24
Using ChatGPT as a Static Application Security Testing Tool [paper] 2023.08.28
LLbezpeky: Leveraging Large Language Models for Vulnerability Detection [paper] 2024.01.13
Large Language Model-Powered Smart Contract Vulnerability Detection: New Perspectives [paper] 2023.10.16
Software Vulnerability Detection with GPT and In-Context Learning [paper] 2024.01.08
GPTScan: Detecting Logic Vulnerabilities in Smart Contracts by Combining GPT with Program Analysis [paper] 2023.12.25
VulLibGen: Identifying Vulnerable Third-Party Libraries via Generative Pre-Trained Model [paper] 2023.08.09
LLM4Vuln: A Unified Evaluation Framework for Decoupling and Enhancing LLMs' Vulnerability Reasoning [paper] 2024.01.29
Large Language Models for Test-Free Fault Localization [paper] 2023.10.03
Multi-role Consensus through LLMs Discussions for Vulnerability Detection [paper] 2024.03.21
How ChatGPT is Solving Vulnerability Management Problem [paper] 2023.11.11
DiverseVul: A New Vulnerable Source Code Dataset for Deep Learning Based Vulnerability Detection [paper] 2023.08.09
The FormAI Dataset: Generative AI in Software Security through the Lens of Formal Verification [paper] 2023.09.02
How Far Have We Gone in Vulnerability Detection Using Large Language Models [paper] 2023.12.22
Large Language Model for Vulnerability Detection and Repair: Literature Review and Roadmap [paper] 2024.04.04
DLAP: A Deep Learning Augmented Large Language Model Prompting Framework for Software Vulnerability Detection [paper] 2024.05.02
Harnessing Large Language Models for Software Vulnerability Detection: A Comprehensive Benchmarking Study [paper] 2024.05.24
LLM-Assisted Static Analysis for Detecting Security Vulnerabilities [paper] 2024.05.27

Insecure Code Generation

Lost at C: A User Study on the Security Implications of Large Language Model Code Assistants [paper] 2023.02.27
Bugs in Large Language Models Generated Code [paper] 2024.03.18
Asleep at the Keyboard? Assessing the Security of GitHub Copilot’s Code Contributions [paper] 2021.12.16
The Effectiveness of Large Language Models (ChatGPT and CodeBERT) for Security-Oriented Code Analysis [paper] 2023.08.29
No Need to Lift a Finger Anymore? Assessing the Quality of Code Generation by ChatGPT [paper] 2023.08.09
Generate and Pray: Using SALLMS to Evaluate the Security of LLM Generated Code [paper] 2023.11.01
Is Your Code Generated by ChatGPT Really Correct? Rigorous Evaluation of Large Language Models for Code Generation [paper] 2023.10.30
Can Large Language Models Identify And Reason About Security Vulnerabilities? Not Yet [paper] 2023.12.19
A Comparative Study of Code Generation using ChatGPT 3.5 across 10 Programming Languages [paper] 2023.08.08
How Secure is Code Generated by ChatGPT? [paper] 2023.04.19
Large Language Models for Code: Security Hardening and Adversarial Testing [paper] 2023.09.29
Pop Quiz! Can a Large Language Model Help With Reverse Engineering? [paper] 2022.02.02
LLM4Decompile: Decompiling Binary Code with Large Language Models [paper] 2024.03.08
Large Language Models for Code Analysis: Do LLMs Really Do Their Job? [paper] 2024.03.05
Understanding Programs by Exploiting (Fuzzing) Test Cases [paper] 2023.01.12
Evaluating and Explaining Large Language Models for Code Using Syntactic Structures [paper] 2023.08.07
Prompt Engineering-assisted Malware Dynamic Analysis Using GPT-4 [paper] 2023.12.13
Using ChatGPT to Analyze Ransomware Messages and to Predict Ransomware Threats [paper] 2023.11.21
Shifting the Lens: Detecting Malware in npm Ecosystem with Large Language Models [paper] 2024.03.18
DebugBench: Evaluating Debugging Capability of Large Language Models [paper] 2024.01.11
Make LLM a Testing Expert: Bringing Human-like Interaction to Mobile GUI Testing via Functionality-aware Decisions [paper] 2023.10.24
FLAG: Finding Line Anomalies (in code) with Generative AI [paper] 2023.07.22
Evolutionary Large Language Models for Hardware Security: A Comparative Survey [paper] 2024.04.25
Do Neutral Prompts Produce Insecure Code? FormAI-v2 Dataset: Labelling Vulnerabilities in Code Generated by Large Language Models [paper] 2024.04.29
LLM Security Guard for Code [paper] 2024.05.03
Code Repair with LLMs gives an Exploration-Exploitation Tradeoff [paper] 2024.05.30

Program Repair

Automatic Program Repair with OpenAI's Codex: Evaluating QuixBugs [paper] 2023.11.06
An Analysis of the Automatic Bug Fixing Performance of ChatGPT [paper] 2023.01.20
AI-powered patching: the future of automated vulnerability fixes [paper] 2024.01.31
Practical Program Repair in the Era of Large Pre-trained Language Models [paper] 2022.10.25
Security Code Review by LLMs: A Deep Dive into Responses [paper] 2024.01.29
Examining Zero-Shot Vulnerability Repair with Large Language Models [paper] 2022.08.15
How Effective Are Neural Networks for Fixing Security Vulnerabilities [paper] 2023.05.29
Can LLMs Patch Security Issues? [paper] 2024.02.19
InferFix: End-to-End Program Repair with LLMs [paper] 2023.03.13
ZeroLeak: Using LLMs for Scalable and Cost Effective Side-Channel Patching [paper] 2023.08.24
DIVAS: An LLM-based End-to-End Framework for SoC Security Analysis and Policy-based Protection [paper] 2023.08.14
Fixing Hardware Security Bugs with Large Language Models [paper] 2023.02.02
A Study of Vulnerability Repair in JavaScript Programs with Large Language Models [paper] 2023.03.19
Enhanced Automated Code Vulnerability Repair using Large Language Models [paper] 2024.01.08
Teaching Large Language Models to Self-Debug [paper] 2023.10.05
Better Patching Using LLM Prompting, via Self-Consistency [paper] 2023.08.16
Copiloting the Copilots: Fusing Large Language Models with Completion Engines for Automated Program Repair [paper] 2023.11.08
LLM-Powered Code Vulnerability Repair with Reinforcement Learning and Semantic Reward [paper] 2024.02.22
ContrastRepair: Enhancing Conversation-Based Automated Program Repair via Contrastive Test Case Pairs [paper] 2024.03.07
When Large Language Models Confront Repository-Level Automatic Program Repair: How Well They Done? [paper] 2023.03.01
Aligning LLMs for FL-free Program Repair [paper] 2024.04.13
Multi-Objective Fine-Tuning for Enhanced Program Repair with LLMs [paper] 2024.04.22
How Far Can We Go with Practical Function-Level Program Repair? [paper] 2024.04.19
Revisiting Unnaturalness for Automated Program Repair in the Era of Large Language Models [paper] 2024.03.23
A Systematic Literature Review on Large Language Models for Automated Program Repair [paper] 2024.05.12
Automated Repair of AI Code with Large Language Models and Formal Verification [paper] 2024.05.14
A Case Study of LLM for Automated Vulnerability Repair: Assessing Impact of Reasoning and Patch Validation Feedback [paper] 2024.05.24

Anomaly Detection

Benchmarking Large Language Models for Log Analysis, Security, and Interpretation [paper] 2023.11.24
Log-based Anomaly Detection based on EVT Theory with feedback [paper] 2023.09.30
LogGPT: Exploring ChatGPT for Log-Based Anomaly Detection [paper] 2023.09.14
LogGPT: Log Anomaly Detection via GPT [paper] 2023.12.11
Interpretable Online Log Analysis Using Large Language Models with Prompt Strategies [paper] 2024.01.26
Lemur: Log Parsing with Entropy Sampling and Chain-of-Thought Merging [paper] 2024.03.02
Web Content Filtering through knowledge distillation of Large Language Models [paper] 2023.05.10
Application of Large Language Models to DDoS Attack Detection [paper] 2024.02.05
An Improved Transformer-based Model for Detecting Phishing, Spam, and Ham: A Large Language Model Approach [paper] 2023.11.12
Evaluating the Performance of ChatGPT for Spam Email Detection [paper] 2024.02.23
Prompted Contextual Vectors for Spear-Phishing Detection [paper] 2024.02.14
Devising and Detecting Phishing: Large Language Models vs. Smaller Human Models [paper] 2023.11.30
Explaining Tree Model Decisions in Natural Language for Network Intrusion Detection [paper] 2023.10.30
Revolutionizing Cyber Threat Detection with Large Language Models: A privacy-preserving BERT-based Lightweight Model for IoT/IIoT Devices [paper] 2024.02.08
HuntGPT: Integrating Machine Learning-Based Anomaly Detection and Explainable AI with Large Language Models (LLMs) [paper] 2023.09.27
ChatGPT for digital forensic investigation: The good, the bad, and the unknown [paper] 2023.07.10
Large Language Models Spot Phishing Emails with Surprising Accuracy: A Comparative Analysis of Performance [paper] 2024.04.23
LLMParser: An Exploratory Study on Using Large Language Models for Log Parsing [paper] 2024.04.27
DoLLM: How Large Language Models Understanding Network Flow Data to Detect Carpet Bombing DDoS [paper] 2024.05.12
Large Language Models in Wireless Application Design: In-Context Learning-enhanced Automatic Network Intrusion Detection [paper] 2024.05.17

LLM-Assisted Attacks

Identifying and Mitigating the Security Risks of Generative AI [paper] 2023.12.29
Impact of Big Data Analytics and ChatGPT on Cybersecurity [paper] 2023.05.22
From ChatGPT to ThreatGPT: Impact of Generative AI in Cybersecurity and Privacy [paper] 2023.07.03
LLMs Killed the Script Kiddie: How Agents Supported by Large Language Models Change the Landscape of Network Threat Testing [paper] 2023.10.10
Malla: Demystifying Real-world Large Language Model Integrated Malicious Services [paper] 2024.01.06
Evaluating LLMs for Privilege-Escalation Scenarios [paper] 2023.10.23
Using Large Language Models for Cybersecurity Capture-The-Flag Challenges and Certification Questions [paper] 2023.08.21
Exploring the Dark Side of AI: Advanced Phishing Attack Design and Deployment Using ChatGPT [paper] 2023.09.19
From Chatbots to PhishBots? - Preventing Phishing scams created using ChatGPT, Google Bard and Claude [paper] 2024.03.10
From Text to MITRE Techniques: Exploring the Malicious Use of Large Language Models for Generating Cyber Attack Payloads [paper] 2023.05.24
PentestGPT: An LLM-empowered Automatic Penetration Testing Tool [paper] 2023.08.13
AutoAttacker: A Large Language Model Guided System to Implement Automatic Cyber-attacks [paper] 2024.03.02
RatGPT: Turning online LLMs into Proxies for Malware Attacks [paper] 2023.09.07
Getting pwn’d by AI: Penetration Testing with Large Language Models [paper] 2023.08.17

Others

An LLM-based Framework for Fingerprinting Internet-connected Devices [paper] 2023.10.24
Anatomy of an AI-powered malicious social botnet [paper] 2023.07.30
Just-in-Time Security Patch Detection -- LLM At the Rescue for Data Augmentation [paper] 2023.12.12
LLM for SoC Security: A Paradigm Shift [paper] 2023.10.09
Harnessing the Power of LLM to Support Binary Taint Analysis [paper] 2023.10.12
Llama Guard: LLM-based Input-Output Safeguard for Human-AI Conversations [paper] 2023.12.07
LLM in the Shell: Generative Honeypots [paper] 2024.02.09
Employing LLMs for Incident Response Planning and Review [paper] 2024.03.02
Enhancing Network Management Using Code Generated by Large Language Models [paper](https://arxiv.org/abs/2308.06261) 2023.08.11
Prompting Is All You Need: Automated Android Bug Replay with Large Language Models [paper] 2023.07.18
Is Stack Overflow Obsolete? An Empirical Study of the Characteristics of ChatGPT Answers to Stack Overflow Questions [paper] 2024.02.07
How Far Have We Gone in Stripped Binary Code Understanding Using Large Language Models [paper] 2024.04.16
Act as a Honeytoken Generator! An Investigation into Honeytoken Generation with Large Language Models [paper] 2024.04.24
AppPoet: Large Language Model based Android malware detection via multi-view prompt engineering [paper] 2024.04.29
Large Language Models for Cyber Security: A Systematic Literature Review [paper] 2024.05.08
Critical Infrastructure Protection: Generative AI, Challenges, and Opportunities [paper] 2024.05.08
LLMPot: Automated LLM-based Industrial Protocol and Physical Process Emulation for ICS Honeypots [paper] 2024.05.10
A Comprehensive Overview of Large Language Models (LLMs) for Cyber Defences: Opportunities and Directions [paper] 2024.05.23

RQ3: What are further research directions about the application of LLMs in cybersecurity?

Further Research: Agent4Cybersecurity

Cybersecurity Issues and Challenges [paper] 2022.08
A unified cybersecurity framework for complex environments [paper] 2018.09.26
LLMind: Orchestrating AI and IoT with LLM for Complex Task Execution [paper] 2024.02.20
Out of the Cage: How Stochastic Parrots Win in Cyber Security Environments [paper] 2023.08.28
LLM Agents can Autonomously Hack Websites [paper] 2024.02.16
Nissist: An Incident Mitigation Copilot based on Troubleshooting Guides [paper] 2024.02.27
TPTU: Large Language Model-based AI Agents for Task Planning and Tool Usage [paper] 2023.11.07
The Rise and Potential of Large Language Model Based Agents: A Survey [paper] 2023.09.19
ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs [paper] 2023.10.03
From Summary to Action: Enhancing Large Language Models for Complex Tasks with Open World APIs [paper] 2024.02.28
If LLM Is the Wizard, Then Code Is the Wand: A Survey on How Code Empowers Large Language Models to Serve as Intelligent Agents [paper] 2024.01.08
TaskWeaver: A Code-First Agent Framework [paper] 2023.12.01
Large Language Models for Networking: Applications, Enabling Techniques, and Challenges [paper] 2023.11.29
R-Judge: Benchmarking Safety Risk Awareness for LLM Agents [paper] 2024.02.18
WIPI: A New Web Threat for LLM-Driven Web Agents [paper] 2024.02.26
InjecAgent: Benchmarking Indirect Prompt Injections in Tool-Integrated Large Language Model Agents [paper] 2024.03.25
LLM Agents can Autonomously Exploit One-day Vulnerabilities [paper] 2024.04.17
Large Language Models for Networking: Workflow, Advances and Challenges [paper] 2024.04.29
Generative AI in Cybersecurity [paper] 2024.05.02
Generative AI and Large Language Models for Cyber Security: All Insights You Need [paper] 2024.05.21

📖BibTeX

@misc{zhang2024llms,
      title={When LLMs Meet Cybersecurity: A Systematic Literature Review}, 
      author={Jie Zhang and Haoyu Bu and Hui Wen and Yu Chen and Lun Li and Hongsong Zhu},
      year={2024},
      eprint={2405.03644},
      archivePrefix={arXiv},
      primaryClass={cs.CR}
}
