/T-FSM

[SIGMOD'23] A Task-Based System for Massively Parallel Frequent Subgraph Pattern Mining from a Big Graph

Primary LanguageC++

T-FSM: A Task-Based System for Massively Parallel Frequent Subgraph Pattern Mining from a Big Graph

Serial Verision

Under the root directory of the project, execute the following commands to compile.

cd T-FSM/serial
make

Run serial FSM using the following command. All frequent patterns are saved in grami_results.txt.

./run -file [GRAPHPATH] -freq [F] -maxNodes [MAXNODE]

GRAPHPATH: required, the input graph path

F: required, the user-given support threshold

MAXNODE: optional, parameter to constrain the maximum subgraph size (number of vertices)

Example:

Use the following command:

./run -file ../data/mico.lg -freq 9480

Parallel Version

Under the root directory of the project, execute the following commands to complie. If the mined patterns are large (>=6 vertices), we highly suggest users enable #OPTIMIZED_MATCH option, which consumes slightly more memory. All frequent patterns are printed with #VERBOSE enabled.

cd T-FSM/parallel
make

Run parallel FSM using the following command.

./run -file [GRAPHPATH] -freq [F] -thread [T] -maxNodes [MAXNODE]

GRAPHPATH: required, the input graph path

F: required, the user-given support threshold

T: required, the number of threads, 32 by default

MAXNODE: optional, parameter to constrain the maximum subgraph size (number of vertices)

Example:

Use the following command:

./run -file ../data/mico.lg -freq 9480 -thread 4

Fraction-Score Version

Under the root directory of the project, execute the following commands to complie.

cd T-FSM/fraction-score
make

Run Fraction-Score FSM using the following command.

./run -file [GRAPHPATH] -freq [F] -thread [T] -maxNodes [MAXNODE]

GRAPHPATH: required, the input graph path

F: required, the user-given support threshold, in terms of Fraction-Score metric

T: required, the number of threads, 32 by default

MAXNODE: optional, parameter to constrain the maximum subgraph size (number of vertices)

Input

Input graph starts with 't N M' where N is the number of vertices and M is the number of edges. A vertex and an edge are formatted as 'v VertexID VertexLabel Degree' and 'e VertexID VertexID EdgeLabel' respectively. Note that we require that the vertexID starts from 0 and the range is [0,|V| - 1] where V is the vertex set. If the input graph is not edge-labelled, mark all edge labels as 1. The following is an input sample

t 5 6
v 0 0 2
v 1 1 3
v 2 2 3
v 3 1 2
v 4 2 2
e 0 1 1
e 0 2 1
e 1 2 1
e 1 3 1
e 2 4 1
e 3 4 1