This toolkit accompanies the paper "How About Bug-Triggering Paths? - Understanding and Characterizing Learning-Based Vulnerability Detectors".
It contains implementations of all learning-based vulnerability detection methods used in the paper.
For popular vulnerability detection methods, we build a unified framework: you can adapt it to different datasets and train different models just by modifying the parameter settings.
Method | Paper |
---|---|
code2seq | code2seq: Generating sequences from structured representations of code |
code2vec | code2vec: Learning distributed representations of code |
DeepWukong | DeepWukong: Statically Detecting Software Vulnerabilities Using Deep Graph Neural Network |
μVulDeePecker | μVulDeePecker: A Deep Learning-Based System for Multiclass Vulnerability Detection |
VulDeePecker | VulDeePecker: A Deep Learning-Based System for Vulnerability Detection |
SySeVr | SySeVR: A Framework for Using Deep Learning to Detect Software Vulnerabilities |
ReVeal | Deep Learning based Vulnerability Detection: Are We There Yet? |
Vgdetector | Static Detection of Control-Flow-Related Vulnerabilities Using Graph Embedding |
token embedding | Automated Vulnerability Detection in Source Code Using Deep Representation Learning |
IVDetect | Vulnerability Detection with Fine-Grained Interpretations |
VulDeeLocator | VulDeeLocator: A Deep Learning-based Fine-grained Vulnerability Detector |
ICVH | Information-theoretic Source Code Vulnerability Highlighting |
VELVET | VELVET: a noVel Ensemble Learning approach to automatically locate VulnErable sTatements |
The framework contains four directories (config/, models/, preprocessing/, utils/) and the entry script train.py:
config: All parameter settings.
models: Model implementations, all written with PyTorch Lightning.
preprocessing: Data preprocessing for some of the methods.
utils: Commonly used utility functions.
train.py: Entry point for training all models.
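
To illustrate how a single entry point can switch between models and datasets purely through parameter settings, here is a minimal sketch. It is not the toolkit's actual `train.py`: the names `DEFAULT_CONFIGS`, `build_config` and `main`, the command-line flags, and all default values are hypothetical, and the real settings live under config/.

```python
import argparse
from copy import deepcopy

# Hypothetical per-model defaults; the real values are defined under config/.
DEFAULT_CONFIGS = {
    "deepwukong": {"dataset": "CWE119", "batch_size": 64, "learning_rate": 0.002},
    "vuldeepecker": {"dataset": "CWE119", "batch_size": 64, "learning_rate": 0.001},
}


def build_config(model_name, overrides=None):
    """Return the defaults for one model with user overrides applied on top."""
    if model_name not in DEFAULT_CONFIGS:
        raise ValueError(f"unknown model: {model_name}")
    config = deepcopy(DEFAULT_CONFIGS[model_name])
    config.update(overrides or {})
    config["model"] = model_name
    return config


def parse_args(argv=None):
    """Parse a model name plus optional overrides (hypothetical flags)."""
    parser = argparse.ArgumentParser(description="Config-driven training sketch")
    parser.add_argument("model", choices=sorted(DEFAULT_CONFIGS))
    parser.add_argument("--dataset", default=None)
    parser.add_argument("--batch-size", type=int, default=None)
    return parser.parse_args(argv)


def main(argv=None):
    args = parse_args(argv)
    candidates = {"dataset": args.dataset, "batch_size": args.batch_size}
    # Only options the user actually passed override the defaults.
    overrides = {k: v for k, v in candidates.items() if v is not None}
    config = build_config(args.model, overrides)
    # A real entry point would now build the model and start training.
    return config
```

Calling `main(["deepwukong", "--dataset", "CWE20"])` returns the DeepWukong defaults with only the dataset replaced, which is the sense in which switching datasets or models needs nothing beyond a parameter change.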
Part of the work uses Joern for program analysis and to extract program slices.
We implemented Joern-based slicing methods for DeepWukong, VulDeePecker and SySeVR. They rely on an old version of Joern that can generate the PDG's node.csv and edge.csv.
joern_slicer: Slicing methods. To use this part of the code, you need a Joern version that can generate CSV files for the PDG.
You can find it here: old Joern
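
As a rough illustration of how slices can be derived from the PDG files, the sketch below loads a node table and an edge table and computes a backward slice by following dependence edges in reverse. It is not the code in joern_slicer. The tab-separated layout, the column names (`key`, `code`, `start`, `end`, `type`) and the edge labels `REACHES` and `CONTROLS` are assumptions about the CSV output and should be checked against the files your Joern version produces; the sample data is made up.

```python
import csv
import io
from collections import defaultdict, deque

# Assumed labels for data-dependence and control-dependence edges.
PDG_EDGE_TYPES = {"REACHES", "CONTROLS"}


def load_pdg(node_file, edge_file):
    """Read node and edge tables; return node code and PDG predecessors."""
    nodes = {row["key"]: row["code"]
             for row in csv.DictReader(node_file, delimiter="\t")}
    predecessors = defaultdict(set)
    for row in csv.DictReader(edge_file, delimiter="\t"):
        if row["type"] in PDG_EDGE_TYPES:  # skip non-PDG edges
            predecessors[row["end"]].add(row["start"])
    return nodes, predecessors


def backward_slice(predecessors, criterion):
    """Collect every node the criterion transitively depends on (BFS)."""
    seen = {criterion}
    queue = deque([criterion])
    while queue:
        current = queue.popleft()
        for pred in predecessors.get(current, ()):
            if pred not in seen:
                seen.add(pred)
                queue.append(pred)
    return seen


# Made-up sample standing in for node.csv and edge.csv.
SAMPLE_NODES = (
    "key\tcode\n"
    "1\tint n = read()\n"
    "2\tchar buf[8]\n"
    "3\tif (n < 8)\n"
    "4\tbuf[n] = 0\n"
    "5\tlog()\n"
)
SAMPLE_EDGES = (
    "start\tend\ttype\n"
    "1\t3\tREACHES\n"
    "1\t4\tREACHES\n"
    "2\t4\tREACHES\n"
    "3\t4\tCONTROLS\n"
    "4\t5\tFLOWS_TO\n"
)

nodes, predecessors = load_pdg(io.StringIO(SAMPLE_NODES), io.StringIO(SAMPLE_EDGES))
slice_keys = backward_slice(predecessors, "4")
for key in sorted(slice_keys, key=int):
    print(key, nodes[key])
```

With real data, the two `io.StringIO` objects would be replaced by open file handles for node.csv and edge.csv. A forward slice follows the same pattern with a successor map instead of a predecessor map.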