
Required dependencies and partially modified code for the https://github.com/SySeVR/SySeVR project




Ubuntu 18.04 (Tested)

Step 1

  • Joern 0.3.1
    • JDK 1.7
    • Neo4J 2.1.8 Community Edition
    • Gremlin for Neo4J 2.X
    • Apache Ant build tool
  • Python 2.7

Step 2

  • Python 3.6
  • Tensorflow 1.6
  • Gensim 3.4


Joern 0.3.1

  • JDK 1.7

    1. extract the tarball

      mkdir -p /usr/local/java
      tar -xvf jdk-7u80-linux-x64.tar.gz -C /usr/local/java
    2. set environment variable


      # Add the following to the end of the file
      export JAVA_HOME
      export JRE_HOME
      export PATH
      update-alternatives --install "/usr/bin/java" "java" "/usr/local/java/jdk1.7.0_80/bin/java" 1
      update-alternatives --install "/usr/bin/javac" "javac" "/usr/local/java/jdk1.7.0_80/bin/javac" 1
      update-alternatives --install "/usr/bin/javaws" "javaws" "/usr/local/java/jdk1.7.0_80/bin/javaws" 1
      update-alternatives --set java /usr/local/java/jdk1.7.0_80/bin/java
      update-alternatives --set javac /usr/local/java/jdk1.7.0_80/bin/javac
      update-alternatives --set javaws /usr/local/java/jdk1.7.0_80/bin/javaws
      source /etc/profile
    3. verify

      java -version

      The output should look like this:

      java version "1.7.0_80"
      Java(TM) SE Runtime Environment (build 1.7.0_80-b15)
      Java HotSpot(TM) 64-Bit Server VM (build 24.80-b11, mixed mode)
  • Neo4j 2.1.8 Community Edition

    1. extract the archive

      unzip neo4j-community-2.1.8.zip
      mkdir -p /usr/local/neo4j

      # adjust the source path to wherever the archive was extracted
      mv ./Neo4j/neo4j-community-2.1.8/* /usr/local/neo4j

    2. modify the configuration files
    > **the configuration files are located in /usr/local/neo4j/conf**
    # location of the database directory
    # Let the webserver only listen on the specified IP. Default is localhost (only
    # accept local connections). Uncomment to allow any connection. Please see the
    # security section in the neo4j manual before modifying this.


    # Java Heap Size: by default the Java heap size is dynamically
    # calculated based on available system resources.
    # Uncomment these lines to set specific initial and maximum
    # heap size in MB.
    wrapper.java.maxmemory=10240 #as large as you can
    3. set environment variable


      # Add the following to the end of the file
      export NEO4J_HOME
      export PATH
      source /etc/profile
    4. start and verify

      /usr/local/neo4j/bin/neo4j console
  • Gremlin for Neo4J 2.X


    unzip neo4j-gremlin-plugin-tp2-2.1.5-server-plugin.zip -d $NEO4J_HOME/plugins/gremlin-plugin
  • Apache Ant build tool

    1. extract the archive

      mkdir /usr/local/ant
      unzip -d /usr/local/ant apache-ant-1.8.4-bin.zip
    2. set environment variable


      # Add the following to the end of the file
      export ANT_HOME
      export PATH
      source /etc/profile
    3. verify

      ant -version
  • Joern 0.3.1


  1. extract the tarball and its build dependencies

      tar -xvf 0.3.1.tar.gz
      cd joern-0.3.1
      tar -xvf lib.tar.gz
  2. build the project

      # run from inside joern-0.3.1
      ant

    The executable JAR file is located in joern-0.3.1/bin/joern.jar

  3. set environment variable (optional)


      # Add the following to the end of the file
      export JOERN_HOME


      # Add the following to the end of the file
      alias joern='java -jar $JOERN_HOME/bin/joern.jar'
      source /etc/profile
      source ~/.bashrc
  4. build additional tools (optional)

      cd joern-0.3.1
      ant tools
  5. install python-joern

      apt install python-setuptools python-dev python-pip
      pip2 install py2neo==2.0 -i https://pypi.tuna.tsinghua.edu.cn/simple
      pip2 install py2neo-gremlin -i https://pypi.tuna.tsinghua.edu.cn/simple
      tar -xvf python-joern-0.3.1.tar.gz
      cd python-joern-0.3.1
      python2 setup.py install
  6. install joern-tools

      pip2 install chardet -i https://pypi.tuna.tsinghua.edu.cn/simple
      pip2 install pygraphviz -i https://pypi.tuna.tsinghua.edu.cn/simple
      git clone https://github.com/fabsx00/joern-tools
      cd joern-tools
      python2 setup.py install
  7. verify
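
One way to check the toolchain installed above is a short script. This is a minimal sketch, assuming Python 3 is available (it is required for Step 2). The helper names `parse_java_version`, `java_banner`, and `missing_tools` are illustrative and are not part of Joern or SySeVR, and the default tool list is an assumption based on the dependencies listed in this document.

```python
import re
import shutil
import subprocess


def parse_java_version(banner):
    """Extract (major, minor, patch, update) from a `java -version` banner."""
    match = re.search(r'version "(\d+)\.(\d+)\.(\d+)(?:_(\d+))?"', banner)
    if match is None:
        return None
    # A missing update suffix (no "_80" part) is reported as 0.
    return tuple(int(part) if part else 0 for part in match.groups())


def java_banner():
    """Return the text printed by `java -version` (it is written to stderr)."""
    result = subprocess.run(
        ["java", "-version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return result.stdout


def missing_tools(tools=("java", "ant", "python2", "pip2")):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]
```

With JDK 1.7.0_80 active, `parse_java_version(java_banner())` should give `(1, 7, 0, 80)`, and `missing_tools()` should return an empty list once every tool is on `PATH`.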



Step1 Generating Slices

Work Dir: /home/SySeVR-master/Implementation/source2slice

Code Dir: /home/code

Recommended Memory Size: >=16GB (depending on the size of your code base)

To slice the full NVD and SARD datasets, split them into parts and process each part separately.
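
Splitting a large code directory into parts can be sketched as follows. This is an illustration under assumptions, not part of SySeVR: it groups the top-level entries of the code directory so that each group can be copied to its own directory and sliced in a separate run. The function names and the part size are hypothetical.

```python
import os


def split_into_parts(names, part_size):
    """Split names into consecutive sorted parts of at most part_size items."""
    if part_size <= 0:
        raise ValueError("part_size must be a positive integer")
    ordered = sorted(names)
    return [ordered[i:i + part_size] for i in range(0, len(ordered), part_size)]


def plan_parts(code_dir, part_size):
    """Group the top-level entries of code_dir into parts for separate runs."""
    return split_into_parts(os.listdir(code_dir), part_size)
```

For example, `plan_parts("/home/code", 500)` returns lists of entry names; the value 500 is arbitrary and should be tuned to the available memory.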

  1. install dependencies

    apt install python-igraph
  2. parse the source code

    input: source code

    output: .joernIndex

    rm -rf ./.joernIndex
    java -Xmx16g -jar $JOERN_HOME/bin/joern.jar /home/code

    This creates a Neo4j database directory, .joernIndex, in the current directory.

  3. generate CFG

    input: .joernIndex

    output: cfg_db

    # start neo4j in another terminal
    /usr/local/neo4j/bin/neo4j console
    mkdir cfg_db
    python2 get_cfg_relation.py
  4. generate PDG

    input: cfg_db .joernIndex

    output: pdg_db

    # modify access_db_operate.py
    http.socket_timeout = 999999999 # a big number
    mkdir pdg_db
    python2 complete_PDG.py
  5. generate call graph of functions

    input: pdg_db .joernIndex

    output: dict_call2cfgNodeID_funcID

    mkdir dict_call2cfgNodeID_funcID
    python2 access_db_operate.py
  6. generate four kinds of SyCVs

    input: dict_call2cfgNodeID_funcID

    output: arrayuse_slice_points.pkl, integeroverflow_slice_points_new.pkl, pointuse_slice_points.pkl, sensifun_slice_points.pkl

    # modify points_get.py near line 128
    # change "location" to ",location"
    for i in list_usenodes:
                    if str(i).find(",location")==-1:
                for i in list_usenodes:
                    if ',location' in str(i):
    python2 points_get.py
  7. extract slices

    input: dict_call2cfgNodeID_funcID, arrayuse_slice_points.pkl, integeroverflow_slice_points_new.pkl, pointuse_slice_points.pkl, sensifun_slice_points.pkl

    output: api_slices.txt, arrayuse_slice.txt, integeroverflow_slices.txt, pointeruse_slice.txt

    # modify save-file-path in extract_df.py
    store_filepath = "integeroverflow_slices.txt"
    store_filepath = "arraysuse_slices.txt"
    store_filepath = "pointersuse_slices.txt"
    store_filepath = "api_slices.txt"
    # add to slice_op.py at line 348
    if not os.path.exists(path):
    	i += 1

    If memory is limited, run the four functions in extract_df.py one at a time.

    python2 extract_df.py
  8. get labels of slices

    input: api_slices.txt, arrayuse_slice.txt, integeroverflow_slices.txt, pointeruse_slice.txt

    output: apt_slices_label.pkl, api_slices-vulline.pkl, array_slice_label.pkl, expr_slice_label.pkl, pointer_slice_label.pkl

    # modify make_label.py at line 70
    # wrong format
    _dict_cwe2line = {}
    for _dict in dict:
    	for key in _dict.keys():
    		if _dict[key] not in _dict_cwe2line_target.keys():
    python2 make_label.py
  9. combine labels with slices

    input: apt_slices_label.pkl, api_slices-vulline.pkl, array_slice_label.pkl, expr_slice_label.pkl, pointer_slice_label.pkl, api_slices.txt, arrayuse_slice.txt, integeroverflow_slices.txt, pointeruse_slice.txt

    output: api_slices, arrayuse_slices.txt, integeroverflow_slices.txt, pointersuse_slices.txt

    mkdir slices label_source slice_label
    cp api_slices.txt arrayuse_slice.txt integeroverflow_slices.txt pointeruse_slice.txt ./slices
    cp apt_slices_label.pkl api_slices-vulline.pkl array_slice_label.pkl expr_slice_label.pkl pointer_slice_label.pkl ./label_source
    cd label_source
    mv expr_slice_label.pkl integeroverflow_slices.pkl
    mv apt_slices_label.pkl api_slices.pkl
    mv array_slice_label.pkl arrayuse_slice.pkl
    mv pointer_slice_label.pkl pointeruse_slice.pkl
    python2 data_preprocess.py
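
The copy and rename commands in step 9 can also be scripted. The sketch below is an assumption-based illustration, not part of SySeVR: it uses the file names listed in step 9, copies each label file straight to its new name instead of copying and then renaming, and reports any input file that is missing. The function name `stage_inputs` and the constant names are hypothetical.

```python
import os
import shutil

# File names as listed in step 9 of this document.
SLICE_FILES = [
    "api_slices.txt",
    "arrayuse_slice.txt",
    "integeroverflow_slices.txt",
    "pointeruse_slice.txt",
]
UNRENAMED_LABELS = ["api_slices-vulline.pkl"]
LABEL_RENAMES = {
    "expr_slice_label.pkl": "integeroverflow_slices.pkl",
    "apt_slices_label.pkl": "api_slices.pkl",
    "array_slice_label.pkl": "arrayuse_slice.pkl",
    "pointer_slice_label.pkl": "pointeruse_slice.pkl",
}


def stage_inputs(work_dir):
    """Create the three directories and copy inputs; return missing file names."""
    for name in ("slices", "label_source", "slice_label"):
        os.makedirs(os.path.join(work_dir, name), exist_ok=True)

    # (source name, destination directory, destination name)
    plan = [(name, "slices", name) for name in SLICE_FILES]
    plan += [(name, "label_source", name) for name in UNRENAMED_LABELS]
    plan += [(old, "label_source", new) for old, new in LABEL_RENAMES.items()]

    missing = []
    for source_name, target_dir, target_name in plan:
        source = os.path.join(work_dir, source_name)
        if not os.path.isfile(source):
            missing.append(source_name)
            continue
        shutil.copy(source, os.path.join(work_dir, target_dir, target_name))
    return missing
```

Calling `stage_inputs(".")` from the work directory prepares the same layout as the shell commands; an empty return value means every expected file was found.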

Step2 Data Preprocess


Step3 Deep Learning
