Graph-AST: Graph Representation of the Abstract Syntax Tree

A tool that generates a graph representation of source code, based on the paper: Learning to Represent Programs with Graphs, ICLR 2018.

Note that this tool only generates the graph from the AST. For the Gated Graph Neural Network implementation that processes the graph, please refer to GGNN for graph classification.

Installation

The backbone of this tool is the Abstract Syntax Tree (AST), which is generated with the f-ast tool: fAST: Flattening Abstract Syntax Trees for Efficiency, ICSE 2019. f-ast supports any ANTLR4 grammar, covering over 170 programming languages.

Some benefits of using f-ast:

  • f-ast uses protobuf to store the AST, which makes parsing much faster than with other tools.
  • f-ast is built on srcML and srcSlice, so it can incorporate program slicing information, such as the use-def chain (taken from srcSlice), into the AST. The use-def chain is critical for generating the graph-ast.

A runnable docker image of the tool can be pulled by using this command:

  $ docker pull yijun/fast:latest

Example usages:

To generate an AST representation of a file:

  $ cd sample_files
  $ docker run -v $(pwd):/e -it yijun/fast -p Test.c Test.pb

The Test.pb file is the AST representation in protobuf format. For an example of how to read and traverse the tree, see this link.

Since the goal of this tool is to generate the graph representation of the source code, the next step is to run:

  $ python3 generate_graph Test.pb Test.txt
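The two steps can be chained for any input file. The following is a minimal sketch that only builds and prints the two command lines shown in this README; the helper name build_commands and the convention of deriving the .pb and .txt names from the source file name are assumptions for illustration, not part of the tool.

```python
import shlex
from pathlib import Path


def build_commands(source_file, workdir="."):
    """Return the f-ast command and the graph-generation command as lists.

    Hypothetical helper: output names are derived from the source file stem.
    """
    stem = Path(source_file).stem
    pb_file = f"{stem}.pb"    # protobuf AST written by f-ast
    txt_file = f"{stem}.txt"  # edge list written by generate_graph
    mount = f"{Path(workdir).resolve()}:/e"  # same role as $(pwd):/e
    parse_cmd = ["docker", "run", "-v", mount, "-it", "yijun/fast",
                 "-p", source_file, pb_file]
    graph_cmd = ["python3", "generate_graph", pb_file, txt_file]
    return parse_cmd, graph_cmd


# Print the commands instead of running them, so they can be inspected first.
for cmd in build_commands("Test.c"):
    print(shlex.join(cmd))
```

Printing rather than executing keeps the sketch free of side effects; the printed lines can be pasted into a terminal from inside sample_files.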

Test.txt contains the graph representation. Each edge has the format: source_id,source_node_type edge_type sink_id,sink_node_type. For example, the edge:

22,3 1 21,4

means that the node with id 22 connects to the node with id 21 via an edge of type 1. The node with id 22 has type 3, and the node with id 21 has type 4.
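The edge format can be read back with a few lines of Python. This is a minimal sketch, assuming every non-empty line of Test.txt holds exactly one edge, that the three fields are separated by whitespace, and that all ids and types are integers; the names Edge, parse_edge, and build_adjacency are illustrative and not part of the tool.

```python
from collections import defaultdict
from typing import NamedTuple


class Edge(NamedTuple):
    source_id: int
    source_type: int
    edge_type: int
    sink_id: int
    sink_type: int


def parse_edge(line):
    """Parse one edge line such as '22,3 1 21,4'."""
    source, edge_type, sink = line.split()
    source_id, source_type = source.split(",")
    sink_id, sink_type = sink.split(",")
    return Edge(int(source_id), int(source_type), int(edge_type),
                int(sink_id), int(sink_type))


def build_adjacency(lines):
    """Map each source id to a list of (edge_type, sink_id) pairs."""
    adjacency = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:  # skip blank lines
            continue
        edge = parse_edge(line)
        adjacency[edge.source_id].append((edge.edge_type, edge.sink_id))
    return dict(adjacency)


edge = parse_edge("22,3 1 21,4")
print(edge.source_id, edge.edge_type, edge.sink_id)
```

To load a whole file, pass an open file handle to build_adjacency, for example `with open("Test.txt") as f: graph = build_adjacency(f)`.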

For the list of node types, see this. For the list of edge types, see this.