
EGFL utilizes the global node information of the bytecode graph to enhance smart contract vulnerability detection.


EGFL

This repository contains the Python implementation of the paper "A vulnerability detection framework with enhanced graph feature learning". The paper proposes EGFL, a deep learning-based framework that uses enhanced graph feature learning to improve the detection of software vulnerabilities, specifically smart contract vulnerabilities.

Datasets

We use a recently released, large-scale dataset (Qian et al., 2023) as our benchmark. It mainly covers six categories of vulnerabilities: reentrancy (RE), timestamp dependence (TD), integer overflow/underflow (OF), delegatecall (DE), block number dependency (BN), and unchecked external call (UC). The benchmark comprises 42,910 real-world smart contracts collected from the Ethereum platform (Thomas et al., 2020) and was built using rigorous data collection and labeling strategies.

Further details on the dataset are available in the Smart-Contract-Dataset repository, which is continually updated.
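
Because the benchmark spans six vulnerability categories, each contract's labels can be represented as a fixed-order vector. The sketch below assumes labels are available as the abbreviations listed above; the input format and the function name encode_labels are hypothetical, and this README does not say whether EGFL trains one multi-label model or one binary detector per category (a per-category detector would simply use a single entry).

```python
from typing import Iterable, List

# The six vulnerability categories covered by the benchmark, in a fixed order.
VULN_TYPES = ["RE", "TD", "OF", "DE", "BN", "UC"]


def encode_labels(present: Iterable[str]) -> List[int]:
    """Return a multi-hot vector with 1 for each category the contract is labeled with.

    Hypothetical helper: assumes labels arrive as the abbreviations above.
    """
    present = set(present)
    unknown = present - set(VULN_TYPES)
    if unknown:
        raise ValueError(f"unknown vulnerability types: {sorted(unknown)}")
    return [1 if v in present else 0 for v in VULN_TYPES]


print(encode_labels(["RE", "UC"]))  # [1, 0, 0, 0, 0, 1]
```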

Required Packages

  • Python 3.7+
  • Keras 2.3.1
  • numpy 1.19.5
  • scikit-learn 1.3.0
  • tensorflow 1.15.0

The construction tools

We use the public tool BinaryCFGExtractor to convert smart contract bytecode into opcodes and the corresponding control flow graph (CFG). This tool is described in the paper "Cross-modality mutual learning for enhancing smart contract vulnerability detection on bytecode", and we provide its source code.
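
To illustrate what the bytecode-to-opcode step involves, here is a minimal disassembler sketch. It is not BinaryCFGExtractor's implementation; it only shows the core rule that PUSH1 to PUSH32 (0x60 to 0x7F) carry 1 to 32 bytes of immediate data. The opcode table is partial (pre-Shanghai), and unlisted bytes are reported as UNKNOWN. Note that compiled Solidity bytecode usually ends with a metadata blob, which a linear sweep like this will decode as meaningless instructions.

```python
from typing import Dict, List, Optional, Tuple

# Partial EVM opcode table (pre-Shanghai), grouped by starting byte.
_GROUPS = {
    0x00: "STOP ADD MUL SUB DIV SDIV MOD SMOD ADDMOD MULMOD EXP SIGNEXTEND",
    0x10: "LT GT SLT SGT EQ ISZERO AND OR XOR NOT BYTE SHL SHR SAR",
    0x20: "SHA3",
    0x30: "ADDRESS BALANCE ORIGIN CALLER CALLVALUE CALLDATALOAD CALLDATASIZE CALLDATACOPY "
          "CODESIZE CODECOPY GASPRICE EXTCODESIZE EXTCODECOPY RETURNDATASIZE RETURNDATACOPY EXTCODEHASH",
    0x40: "BLOCKHASH COINBASE TIMESTAMP NUMBER DIFFICULTY GASLIMIT CHAINID SELFBALANCE",
    0x50: "POP MLOAD MSTORE MSTORE8 SLOAD SSTORE JUMP JUMPI PC MSIZE GAS JUMPDEST",
    0xF0: "CREATE CALL CALLCODE RETURN DELEGATECALL CREATE2",
}

OPCODES: Dict[int, str] = {}
for start, names in _GROUPS.items():
    for i, name in enumerate(names.split()):
        OPCODES[start + i] = name
OPCODES.update({0xFA: "STATICCALL", 0xFD: "REVERT", 0xFE: "INVALID", 0xFF: "SELFDESTRUCT"})
for n in range(1, 33):
    OPCODES[0x5F + n] = f"PUSH{n}"   # 0x60..0x7F
for n in range(1, 17):
    OPCODES[0x7F + n] = f"DUP{n}"    # 0x80..0x8F
    OPCODES[0x8F + n] = f"SWAP{n}"   # 0x90..0x9F
for n in range(5):
    OPCODES[0xA0 + n] = f"LOG{n}"    # 0xA0..0xA4

Instr = Tuple[int, str, Optional[str]]  # (pc, mnemonic, hex immediate or None)


def disassemble(bytecode_hex: str) -> List[Instr]:
    """Linear-sweep disassembly of EVM bytecode given as a hex string."""
    if bytecode_hex.startswith("0x"):
        bytecode_hex = bytecode_hex[2:]
    code = bytes.fromhex(bytecode_hex)
    out: List[Instr] = []
    pc = 0
    while pc < len(code):
        op = code[pc]
        name = OPCODES.get(op, f"UNKNOWN_0x{op:02x}")
        if 0x60 <= op <= 0x7F:
            size = op - 0x5F  # PUSHn carries n immediate bytes
            operand = code[pc + 1 : pc + 1 + size].hex()  # may be truncated at end of code
            out.append((pc, name, "0x" + operand))
            pc += 1 + size
        else:
            out.append((pc, name, None))
            pc += 1
    return out


print(disassemble("6080604052"))
# [(0, 'PUSH1', '0x80'), (2, 'PUSH1', '0x40'), (4, 'MSTORE', None)]
```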

Alternatively, you can construct the CFG of the bytecode with the public tool evm_cfg_builder.
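
The sketch below shows the basic idea behind CFG construction on disassembled EVM code. It is not evm_cfg_builder's API. Blocks start at JUMPDEST and end after a control-transfer opcode. Edges come from fall-through and from static jumps, meaning a PUSH directly before JUMP/JUMPI. Computed jump targets require stack analysis, which real CFG tools perform and this sketch deliberately skips. The input format `(pc, mnemonic, hex immediate or None)` and the name build_cfg are assumptions for illustration.

```python
from typing import Dict, List, Optional, Set, Tuple

Instr = Tuple[int, str, Optional[str]]  # (pc, mnemonic, hex immediate or None)

# Opcodes after which control never falls through to the next instruction.
TERMINATORS = {"STOP", "JUMP", "RETURN", "REVERT", "INVALID", "SELFDESTRUCT"}


def build_cfg(instrs: List[Instr]) -> Tuple[Dict[int, List[Instr]], Set[Tuple[int, int]]]:
    """Split instructions into basic blocks (keyed by start pc) and collect edges.

    Only static jumps (PUSHn <target> immediately before JUMP/JUMPI) are resolved.
    """
    blocks: Dict[int, List[Instr]] = {}
    current: List[Instr] = []
    for ins in instrs:
        name = ins[1]
        if name == "JUMPDEST" and current:  # a jump target starts a new block
            blocks[current[0][0]] = current
            current = []
        current.append(ins)
        if name in TERMINATORS or name == "JUMPI":  # block ends after a control transfer
            blocks[current[0][0]] = current
            current = []
    if current:
        blocks[current[0][0]] = current

    starts = sorted(blocks)
    jumpdests = {pc for pc, name, _ in instrs if name == "JUMPDEST"}
    edges: Set[Tuple[int, int]] = set()
    for i, start in enumerate(starts):
        body = blocks[start]
        last = body[-1][1]
        # Fall-through edge unless the block ends with a terminator.
        if last not in TERMINATORS and i + 1 < len(starts):
            edges.add((start, starts[i + 1]))
        # Static jump edge: the jump target was pushed by the previous instruction.
        if last in ("JUMP", "JUMPI") and len(body) >= 2:
            prev_name, prev_arg = body[-2][1], body[-2][2]
            if prev_name.startswith("PUSH") and prev_arg:
                target = int(prev_arg, 16)
                if target in jumpdests:
                    edges.add((start, target))
    return blocks, edges


# PUSH1 0x01; PUSH1 0x06; JUMPI; STOP; JUMPDEST; STOP
program = [(0, "PUSH1", "0x01"), (2, "PUSH1", "0x06"), (4, "JUMPI", None),
           (5, "STOP", None), (6, "JUMPDEST", None), (7, "STOP", None)]
blocks, edges = build_cfg(program)
print(sorted(blocks), sorted(edges))  # [0, 5, 6] [(0, 5), (0, 6)]
```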

We use graph neural networks to process the CFG and learn its graph features. In particular, we primarily adopt the GCN model to learn the graph features of the bytecode CFG, drawing on related GitHub projects such as GraphExtractor and AME.
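
For readers unfamiliar with GCNs, the NumPy sketch below shows the standard propagation rule ReLU(D^{-1/2}(A+I)D^{-1/2} H W) from Kipf and Welling applied to a small CFG adjacency matrix. It is not EGFL's model. Three choices are assumptions made for illustration: symmetrizing the directed CFG, using one-hot placeholder node features, and mean-pooling node states into a graph vector. The paper's enhanced use of global node information may differ.

```python
import numpy as np


def normalize_adjacency(adj: np.ndarray) -> np.ndarray:
    """Return D^{-1/2} (A + I) D^{-1/2}, the GCN propagation matrix.

    The directed CFG is symmetrized first (an assumption made for this sketch).
    """
    sym = np.maximum(adj, adj.T)                    # treat CFG edges as undirected
    a_hat = sym + np.eye(sym.shape[0])              # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))   # degrees are >= 1 after self-loops
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_layer(a_norm: np.ndarray, h: np.ndarray, w: np.ndarray) -> np.ndarray:
    """One GCN layer: ReLU(A_norm @ H @ W)."""
    return np.maximum(a_norm @ h @ w, 0.0)


def graph_embedding(adj: np.ndarray, x: np.ndarray, weights: list) -> np.ndarray:
    """Apply stacked GCN layers, then mean-pool node states into one graph vector."""
    a_norm = normalize_adjacency(adj)
    h = x
    for w in weights:
        h = gcn_layer(a_norm, h, w)
    return h.mean(axis=0)  # hypothetical readout; EGFL's global aggregation may differ


rng = np.random.default_rng(0)
adj = np.array([[0, 1, 0], [0, 0, 1], [0, 0, 0]], dtype=float)  # CFG edges 0->1->2
x = np.eye(3)  # placeholder node features (e.g. opcode embeddings)
weights = [rng.standard_normal((3, 8)), rng.standard_normal((8, 4))]
print(graph_embedding(adj, x, weights).shape)  # (4,)
```

In a trained model the weight matrices would be learned, and the graph vector would feed a classifier for each vulnerability type.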

The compilation tools

If you cannot collect enough Ethereum bytecode for your training dataset, you can use the Solidity compiler (solc) to compile Solidity source code into Ethereum bytecode. You can also use the Bytecode to Opcode Disassembler to convert the bytecode into opcodes.
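
As a rough sketch of scripting this step, the helper below calls solc with `--bin-runtime` and parses its text output. It assumes solc is on your PATH and that its version matches the contract's `pragma`. The parser assumes solc's usual layout, in which each contract gets a `======= file:Name =======` header followed by its hex bytecode. Both function names are hypothetical. Whether the dataset uses runtime (`--bin-runtime`) or creation (`--bin`) bytecode is not stated here. solc can also print opcodes directly with `--opcodes`.

```python
import re
import subprocess
from typing import Dict, Optional


def parse_solc_bin_output(text: str) -> Dict[str, str]:
    """Map 'file.sol:Contract' to its hex bytecode from solc's text output.

    Contracts with no bytecode (interfaces, abstract contracts) are skipped.
    """
    contracts: Dict[str, str] = {}
    name: Optional[str] = None
    for line in text.splitlines():
        line = line.strip()
        header = re.match(r"^=+ (.+) =+$", line)
        if header:
            name = header.group(1)
        elif name and re.fullmatch(r"[0-9a-fA-F]+", line):
            contracts[name] = line
    return contracts


def compile_runtime_bytecode(sol_path: str) -> Dict[str, str]:
    """Run `solc --bin-runtime <file>` (solc must be installed) and parse the result."""
    result = subprocess.run(
        ["solc", "--bin-runtime", sol_path],
        capture_output=True, text=True, check=True,
    )
    return parse_solc_bin_output(result.stdout)
```

The resulting hex strings can then be passed to a disassembler or CFG extractor.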

Alternatively, you can use Remix, an online Solidity compiler and IDE, to compile Solidity source code into Ethereum bytecode.

Reference

This work has been accepted by the Journal of Systems and Software (JSS). You can cite it as follows:

@article{cheng2024vulnerability,
      title={A vulnerability detection framework with enhanced graph feature learning},
      author={Cheng, Jianxin and Chen, Yizhou and Cao, Yongzhi and Wang, Hanpin},
      journal={Journal of Systems and Software},
      pages={112118},
      year={2024},
      publisher={Elsevier}
}