/parconnect

Computes graph connectivity for large graphs

Primary LanguageC++Apache License 2.0Apache-2.0

Parallel Implementation for Computing Graph Connectivity

Apache 2.0 License

This library implements a distributed connectivity algorithm for large graphs. It can compute the connected components in the undirected graphs or weakly connected components in the directed graphs. The algorithm is implemented in C++11 and MPI. The codebase supports the graph generation for building de Bruijn graphs from DNA sequence files, synthetic kronecker graphs and also a parallel edgelist file reader for any generic graph.

The connectivity algorithm implemented by this codebase is described in the following peer-reviewed publication. Please cite this paper, when using our code for academic purposes:

Patrick Flick, Chirag Jain, Tony Pan and Srinivas Aluru. "A parallel connectivity algorithm for de Bruijn graphs in metagenomic applications." Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, ACM, 2015. dx.doi.org/10.1145/2807591.2807619

All the results in this publication were successfully reproduced at the cluster competition in SC16.

Dependencies

  • cmake version >= 2.6
  • g++ (version 4.9+)
  • an MPI implementation supporting MPI-2 or MPI-3.

Code organization

  • external (third-party) softwares are included in the ext/ directory of this project.
  • .cpp files are in the test/ directory of this project. src/ folder contains the implementation of the complete algorithm as header files.

Download and compile

The repository and external submodules can be downloaded using the recursive clone.

git clone --recursive <GITHUB_URL>

Compile using the cmake tool.

mkdir build_directory && cd build_directory
cmake ../parconnect
make -j4

Run

Inside the build directory,

mpirun -np <COUNT OF PROCESSES> ./bin/<EXECUTABLE> <ARGUMENTS>

Explanation of different use-cases and the test files are available for download here.