GRAPH MINING

The Graph Mining application uses Apache Spark or Apache Flink to find either the maximum Truss or all specific k-Trusses in a graph.

The input file is expected to consist of one line per edge in the graph. Each line must begin with an integer followed by a separator and another integer. Both these integers must equate to node IDs in the graph.

Building the Projects

Both Projects require Java 7 and Maven.

To build a project, execute "Maven Install" in the main project directory.

We recommend using IntelliJ when working on either project.

Program Modes

triangle:

Calculates all unique triangles in the graph.

truss:

Calculates all k-trusses in the graph, where k is defined by the user. A k-truss is a maximal subgraph in which every edge is part of at least k-2 triangles.

maxtruss:

Calculates all trusses with the maximum k value in the graph. The initial k value for the algorithm is defined by the user.

Apache Spark Verison

Launch Parameters:

  • 0: mode [string] -- The portion of the program to be exectued. Valid inputs: triangle, truss, maxtruss
  • 1: input path [string] -- path to the file containing the graph data, formatted as described above
  • 2: output path [string] -- path where the output should be written to
  • 3: separator [string] -- the separator used between an edge's nodes, as described above
  • 4: partitioning [int] -- number of partitions the data should be split into, should be a multiple of the number of workers available
  • 5: [optional depending on mode] (starting) k value [int] -- k value for the truss mode or initial k value for the maxtruss mode

Example Launch:

spark-submit --conf -Dspark.master=local spark.app.name="graph-mining" --class de.hpi.dbda.graph_mining_spark.GraphMiningSpark target/graph_mining_spark-0.0.1-SNAPSHOT.jar truss ../trussMini.txt ../output/ " " 10 4

Apache Flink Verison

Launch Parameters:

  • 0: mode [string] -- The portion of the program to be exectued. Valid inputs: triangle, truss, maxtruss
  • 1: input path [string] -- path to the file containing the graph data, formatted as described above
  • 2: output path [string] -- path where the output should be written to
  • 3: separator [string] -- the separator used between an edge's nodes, as described above
  • 4: [optional depending on mode] (starting) k value [int] -- k value for the truss mode or initial k value for the maxtruss mode

Example Launch:

$flink run --parallelism 10 --class de.hpi.dbda.graph_mining.GraphMiningFlink target/graph-mining-flink-1.0-SNAPSHOT.jar truss ../trussMini.txt ../output/ " " 4

Output

truss/maxtruss:

The output has one line for each edge that is contained in a truss and looks as follows:

1 2 3 4 5
  • 1: Truss ID
  • 2: ID of the first vertex
  • 3: Degree of the first vertex
  • 4: ID of the second vertex
  • 5: Degree of the second vertex

triangle:

The output has one line for each triangle and looks as follows:

1 2 3 4 5 6 7 8 9 10  11  12

Numbers 1-4 represent an edge, as do numbers 5-8, and numbers 9-12. Each of these sets represents and edge of the triangle.

  • 1/5/9: ID of the first vertex
  • 2/6/10: Degree of the first vertex
  • 3/7/11: ID of the second vertex
  • 4/8/12: Degree of the second vertex