
AMPLE - Implementation

Vulnerability Detection with Graph Simplification and Enhanced Graph Representation Learning

Introduction

Prior studies have demonstrated the effectiveness of Deep Learning (DL) in automated software vulnerability detection. Graph Neural Networks (GNNs) have proven effective in learning graph representations of source code and are commonly adopted by existing DL-based vulnerability detection methods. However, existing methods are still limited by the fact that GNNs inherently struggle to handle connections between long-distance nodes in a code structure graph. Moreover, they do not fully exploit the multiple types of edges in a code structure graph (such as edges representing data flow and control flow). Consequently, despite achieving state-of-the-art performance, existing GNN-based methods tend to fail to capture the global information (i.e., long-range dependencies among nodes) of code graphs.

To mitigate these issues, in this paper we propose a novel vulnerability detection framework with grAph siMplification and enhanced graph rePresentation LEarning, named AMPLE. AMPLE mainly contains two parts: 1) graph simplification, which reduces the distances between nodes by shrinking the node sizes of code structure graphs; 2) enhanced graph representation learning, which involves an edge-aware graph convolutional network module for fusing heterogeneous edge information into node representations and a kernel-scaled representation module for capturing the relations between distant graph nodes. Experiments on three public benchmark datasets show that AMPLE outperforms the state-of-the-art methods by 0.39%-35.32% and 7.64%-199.81% with respect to the accuracy and F1 score metrics, respectively. The results demonstrate the effectiveness of AMPLE in learning the global information of code graphs for vulnerability detection.
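For intuition on the edge-aware part, a relational graph convolution that conditions on edge types can be sketched with DGL's RelGraphConv. This is a generic stand-in for illustration only, not AMPLE's actual edge-aware module:

```python
import dgl
import torch
from dgl.nn.pytorch import RelGraphConv

# Toy code structure graph: 4 nodes, 3 typed edges
# (e.g., type 0 = data-flow edge, type 1 = control-flow edge).
g = dgl.graph(([0, 1, 2], [1, 2, 3]))
etypes = torch.tensor([0, 1, 0])   # one type id per edge
feat = torch.randn(4, 16)          # initial node features

# Relational convolution: each edge type gets its own transformation,
# so heterogeneous edge information is fused into node representations.
conv = RelGraphConv(in_feat=16, out_feat=16, num_rels=2)
h = conv(g, feat, etypes)
print(h.shape)                     # torch.Size([4, 16])
```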

Dataset

To investigate the effectiveness of AMPLE, we adopt three vulnerability datasets from prior work: FFmpeg+Qemu [3], Reveal [2], and Fan et al. [1].

Requirement

Our code is based on Python 3 (>= 3.7). There are a few dependencies required to run the code; the major libraries are listed as follows (an install sketch follows the list):

  • torch (==1.9.0)
  • dgl (==0.7.2)
  • numpy (==1.22.3)
  • sklearn (==0.0)
  • pandas (==1.4.1)
  • tqdm
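Assuming a standard pip workflow, the pinned versions can be installed as follows (sklearn==0.0 is the PyPI shim package; installing scikit-learn directly is the usual equivalent):

```
pip install torch==1.9.0 dgl==0.7.2 numpy==1.22.3 sklearn==0.0 pandas==1.4.1 tqdm
```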

Default settings in AMPLE (a training-loop sketch using these values follows the list):

  • Training configs:
    • batch_size = 64, lr = 0.0001, epoch = 100, patience = 20
    • opt = 'RAdam', weight_decay = 1e-6
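Below is a minimal training-loop sketch using these defaults; the model is a stand-in, not AMPLE's architecture. Note that torch.optim.RAdam only exists in PyTorch >= 1.10, so the pinned torch==1.9.0 presumably relies on a separate RAdam implementation:

```python
import torch
import torch.nn as nn

# Defaults listed above.
BATCH_SIZE = 64
LR = 1e-4
EPOCHS = 100
PATIENCE = 20        # epochs to wait without improvement before stopping
WEIGHT_DECAY = 1e-6

model = nn.Linear(128, 2)  # placeholder binary classifier, not AMPLE's model
optimizer = torch.optim.RAdam(model.parameters(), lr=LR, weight_decay=WEIGHT_DECAY)

best_score, wait = float("-inf"), 0
for epoch in range(EPOCHS):
    # ... one training pass over mini-batches of size BATCH_SIZE ...
    val_score = 0.0  # placeholder validation metric (e.g., F1 on a dev set)
    if val_score > best_score:
        best_score, wait = val_score, 0
    else:
        wait += 1
        if wait >= PATIENCE:  # early stopping
            break
```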

Preprocessing

We use Joern to generate the code structure graphs, and we provide a compiled version of Joern here. It should be noted that the ASTs and graphs generated by different versions of Joern may differ significantly, so if a newer version of Joern is used to generate the code structure graphs, the model may perform differently from the results reported in the paper.

After parsing the functions with Joern, the code for graph construction and simplification is under the data_processing\ folder. data_processing\word2vec.py is used to train the word2vec model. We also provide our trained word2vec model here.
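For illustration, a word2vec model over tokenized code can be trained with gensim as below; gensim and these hyperparameters are assumptions for the sketch, and data_processing\word2vec.py may differ in tokenization and settings:

```python
from gensim.models import Word2Vec

# Toy corpus: each function is a list of code tokens (illustrative only).
corpus = [
    ["int", "main", "(", ")", "{", "return", "0", ";", "}"],
    ["void", "foo", "(", "int", "x", ")", "{", "x", "++", ";", "}"],
]

# Train token embeddings (gensim >= 4.0 API; dimensions are assumed).
model = Word2Vec(sentences=corpus, vector_size=100, window=5, min_count=1, workers=4)
model.save("word2vec.model")
print(model.wv["main"][:5])  # embedding vector for the token "main"
```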

Running the model

The model implementation code is under the AMPLE_code\ folder. The model can be run from AMPLE_code\main.py.
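A typical invocation looks like the following; any dataset- or path-specific command-line arguments are repository-specific and not documented here:

```
python AMPLE_code/main.py
```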

Attention weight

We provide all the attention weights learned by our proposed model AMPLE for the test samples. Each dataset corresponds to a json file under the attention weight\ folder.
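A minimal sketch for inspecting one of these files is shown below; the file name is illustrative and the JSON schema is not specified here:

```python
import json

# Open the json file for the dataset of interest (file name is illustrative).
with open("attention weight/FFmpeg+Qemu.json", encoding="utf-8") as f:
    attn = json.load(f)

# Peek at the top-level structure before assuming a schema.
print(type(attn).__name__)
if isinstance(attn, dict):
    print(list(attn)[:5])   # e.g., test-sample identifiers
elif isinstance(attn, list):
    print(attn[:1])         # first record
```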

Experiment results

PR-AUC, MCC, G-measure & T-test

Table 1. Experiment results for Reveal and AMPLE. "*" denotes statistical significance in comparison to Reveal in terms of accuracy and F1 score (i.e., two-sided t-test with p-value < 0.05).

Dataset       Metrics     Reveal   AMPLE
FFmpeg+Qemu   Accuracy    0.6107   0.6216
              Precision   0.5550   0.5564
              Recall      0.7070   0.8399
              F1 score    0.6219   0.6694 *
              PR-AUC      0.6972   0.7347
              MCC         0.2398   0.2995
              G-measure   0.6264   0.6836
Reveal        Accuracy    0.8177   0.9271 *
              Precision   0.3155   0.5106
              Recall      0.6114   0.4615
              F1 score    0.4162   0.4848 *
              PR-AUC      0.4841   0.5061
              MCC         0.3457   0.4464
              G-measure   0.4392   0.4854
Fan et al.    Accuracy    0.8714   0.9314 *
              Precision   0.1722   0.2998
              Recall      0.3404   0.3458
              F1 score    0.2287   0.3211 *
              PR-AUC      0.2748   0.3383
              MCC         0.1783   0.2860
              G-measure   0.2421   0.3220
Table 2. The p-values of two-sided t-tests between AMPLE and Reveal in terms of accuracy and F1 score.

            FFmpeg+Qemu   Reveal     Fan et al.
Accuracy    0.41          4.67e-12   1.00e-4
F1 score    1.00e-3       4.55e-6    1.41e-9
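For reference, a two-sided t-test of the kind reported above can be computed with SciPy over per-run metric scores; the sketch below uses illustrative numbers, and scipy is an extra dependency not pinned in the requirements:

```python
from scipy import stats

# Illustrative stand-ins for F1 scores from repeated runs of each model.
reveal_f1 = [0.620, 0.615, 0.628, 0.610, 0.624]
ample_f1  = [0.668, 0.672, 0.661, 0.670, 0.676]

# ttest_ind performs a two-sided independent-samples t-test by default.
t_stat, p_value = stats.ttest_ind(ample_f1, reveal_f1)
print(f"t = {t_stat:.3f}, p = {p_value:.2e}")
```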

References

[1] Jiahao Fan, Yi Li, Shaohua Wang, and Tien Nguyen. 2020. A C/C++ Code Vulnerability Dataset with Code Changes and CVE Summaries. In The 2020 International Conference on Mining Software Repositories (MSR). IEEE.

[2] Saikat Chakraborty, Rahul Krishna, Yangruibo Ding, and Baishakhi Ray. 2020. Deep Learning based Vulnerability Detection: Are We There Yet? arXiv preprint arXiv:2009.07235 (2020).

[3] Yaqin Zhou, Shangqing Liu, Jingkai Siow, Xiaoning Du, and Yang Liu. 2019. Devign: Effective vulnerability identification by learning comprehensive program semantics via graph neural networks. In Advances in Neural Information Processing Systems. 10197–10207.

[4] Michael Fu and Chakkrit Tantithamthavorn. 2022. LineVul: A Transformer-based Line-Level Vulnerability Prediction. In The 2022 International Conference on Mining Software Repositories (MSR). IEEE.