/ligra

Clone of Julian Shun's Ligra Graph processing library.

Primary LanguageC

This code is part of the project "Ligra: A Lightweight Graph Processing
Framework for Shared Memory", presented at Principles and Practice of 
Parallel Programming, 2013 by Julian Shun and Guy Blelloch.

The paper can be found at : http://www.cs.cmu.edu/~jshun/ligra.pdf

This code currently compiles with g++ version 4.8.0 or higher with
support for Cilk+ and also with the Intel icpc compiler. To compile
with g++ using Cilk, define the environment variable CILK. To compile
with icpc, define the environment variable MKLROOT and make sure CILK
is not defined. To compile with g++ with no parallel support, make
sure both CILK and MKLROOT are not defined.

Implementations---Currently, Ligra comes with 7 implementation files:
BFS.C, BC.C (betweenness centrality), Radii.C, Components.C,
BellmanFord.C, PageRank.C and PageRankDelta.C. To compile all of them,
simply run "make" with the appropriate environment variables set as
described above. Currently the results of the computation are not
used, but the code can be easily modified to output the results to a
file.

To develop a new implementation, simply include "ligra.h" in the
implementation files. When finished, one may add it to the ALL
variable in Makefile.

The input format of an unweighted graphs should be in one of two formats.

1) The adjacency graph format from the Problem Based Benchmark Suite
 (http://www.cs.cmu.edu/~pbbs/benchmarks/graphIO.html). The adjacency
 graph format starts with a sequence of offsets one for each vertex,
 followed by a sequence of directed edges ordered by their source
 vertex. The offset for a vertex i refers to the location of the start
 of a contiguous block of out edges for vertex i in the sequence of
 edges. The block continues until the offset of the next vertex, or
 the end if i is the last vertex. All vertices and offsets are 0 based
 and represented in decimal. The specific format is as follows:

AdjacencyGraph
<n>
<m>
<o0>
<o1>
...
<o(n-1)>
<e0>
<e1>
...
<e(m-1)>

This file is represented as plain text.

2) In binary format. This requires three files NAME.config, NAME.adj,
and NAME.idx, where NAME is chosen by the user. The .config file
stores the number of vertices in the graph in text format. The .idx
file stores in binary the offsets for the vertices in the CSR format
(the <o>'s above). The .adj file stores in binary the edge targets in
the CSR format (the <e>'s above).

Weighted graphs: For format (1), the weights are listed at the end of
the file (after <e(m-1)>). Currently for format (2), the weights are
all set to 1.

By default, format (1) is used. To run an input with format (2), pass
the "-b" flag as a command line argument.

By default the offsets are stored as 32-bit integers, and to represent
them as 64-bit integers, compile with the variable LONG defined. By
default the vertex IDs (edge values) are stored as 32-bit integers,
and to represent them as 64-bit integers, compile with the variable
EDGELONG defined.

The readGraph function in IO.h takes three arguments: iFile,
symmetric, and binary. iFile is the name of the input file. symmetric
is a boolean variable which should be true if and only if the input
graph is symmetric (used for optimization purposes so we don't have to
transpose the graph) and binary is a boolean value which should be
true if and only if the input graph is represented as binary (format
(2) above). For an example of how to use the readGraph function,
please take a look at the examples in the parallel_main functions of
BFS.C (for unweighted graphs) and BellmanFord.C (for weighted graphs).

To write a parallel for loop in your code, simply use the parallel_for
construct in place of "for".

Data Structure: 

vertices (i.e. the vertex subset). Various constructors are given in
ligra.h

Functions:

edgeMap: takes as input 6 arguments: a graph G, vertices data
structure V, struct F, threshold argument (optional, default threshold
is |E|/20), an option in {DENSE, DENSE_PARALLEL, DENSE_FORWARD}
(optional, default value is DENSE), and a boolean indiciating whether
to remove duplicates (optional, default does not remove
duplicates). It returns as output a vertices data structure Out (see
section 4 of paper for how Out is computed).

The F struct must contain three boolean functions: update,
updateAtomic and cond.  update and updateAtomic should take two
integer arguments (corresponding to source and destination vertex). In
addition, updateAtomic should be atomic with respect to the
destination vertex. cond takes one argument corresponding to a vertex.

DENSE_PARALLEL corresponds to looping over each vertex's edges in
parallel in edgeMapDense, and DENSE_FORWARD corresponds to the
write-based version of edgeMapDense. These optimizations are described
in Section 4 of the paper.

vertexMap: takes as input 2 arguments: a vertices data structure V and
a function F which is applied to all vertices in V. It does not have a
return value.

vertexFilter: takes as input a vertices data structure V and a boolean
function F which is applied to all vertices in V. It returns a
vertices data structure containing all vertices v in V such that F(v)
returned true.

Example: An example unweighted graph rMatGraph_J_5_100 and weighted
graph rMatGraph_WJ_5_100 are provided. They are symmetric graphs, so
should be called with the "-s" flag. For example "./BFS
rMatGraph_J_5_100 -s" and "./BellmanFord rMatGraph_WJ_5_100 -s". The
rMat graphs along with other graphs can be generated with the graph
generators at www.cs.cmu.edu/~pbbs.