/hl

Hub Labeling Algorithms Implementation

Primary LanguageC++MIT LicenseMIT

hl - Hub Labeling Algorithms

Hub Labeling is a data structure used to find a distance between two vertices in a graph. It provides excellent query time for real-world graphs such as road networks and social networks. Hub Labeling has two stages: preprocessing and querying. While the query algorithm is simple, the preprocessing is much more involved and there are several preprocessing algorithms in the literature.

There is an algorithm to find O(log n)-approximately optimal Hub Labels. However, it is slow. Practical labeling algorithms find a Hierarchical Hub Labeling. HHL algorithms order vertices and find hub labels which respect the order. In fact given an order one can build the minimum HHL with respect to this order.

Usage

There are several programs in the repository:

  • hhl — find Hierarchical Hub Labeling (and a vertex order) using greedy algorithm
  • akiba — construct Hierarchical Hub Labeling from a vertex order
  • degree — order vertices by their degree
  • lcheck — check labels
  • ghl — find O(log n) approximate Hub Labeling

Build

Just use make to build the project:

$ make
g++ -fopenmp -O3 -I./lib -o akiba akiba.cpp
g++ -fopenmp -O3 -I./lib -o degree degree.cpp
g++ -fopenmp -O3 -I./lib -o hhl hhl.cpp
g++ -fopenmp -O3 -I./lib -o lcheck lcheck.cpp
g++ -fopenmp -O3 -I./lib -o ghl ghl.cpp

HHL

Once you've built the hhl program you can compute HHL for a graph, say email.graph:

$ ./hhl email.graph
Average label size 39.6134
Maximum label size 78

real    0m1.165s

The hhl program contains implementations of two greedy HHL algorithms: path-greedy and label-greedy (as defined in [1]). The default choice is path-greedy. To use label-greedy add -w argument:

$ ./hhl -w email.graph
Average label size 36.797
Maximum label size 82

real    0m1.170s

If shortest paths are unique, greedy algorithms can be implemented more efficiently in terms of the running time. The hhl program has argument -u which turns on the unique shortest paths mode. You can use -u with any graph, hhl will just emulate the unique shortest path structure and run fast algorithm:

$ ./hhl -u email.graph
Average label size 50.6531
Maximum label size 112

real    0m0.336s

Options -o order_file and -l label_file can be specified to write the ordering and labels to files:

$ ./hhl -o email.order -l email.labels email.graph
Average label size 39.6134
Maximum label size 78

If your CPU has Hyper Threading enabled, use -t argument to specify the number of threads to be equal to the real number of cores. Performance may reduce dramatically if there are more threads than cores.

Keep in mind that hhl requires O(n^2) memory and the time complexity is O(n^3) (O(nm log n) if -u is set).

Akiba

Akiba et al. found an efficient algorithm to construct minimal HHL from a vertex order. You can use the akiba program to construct the labels if you have an order file prepared (for example, by the hhl program).

$ ./akiba -o email.order email.graph
Average label size 39.6134
Maximum label size 78

real    0m0.081s

As you can see, akiba program reported exactly the same labeling size. What's interesting is that the order doesn't have to be from a rigorous algorithm. For example we can find the order in the unique shortest paths mode and use akiba program to find the best labels corresponding to this order:

$ ./hhl -u -o email.order email.graph
Average label size 50.6531
Maximum label size 112
$ ./akiba -o email.order email.graph
Average label size 39.707
Maximum label size 82

Another simple way of ordering vertices is the degree order that works well for some social and computer networks: to order vertices by their degrees, breaking ties at random. Use degree program to find the degree order:

$ ./degree -o email.order email.graph
$ ./akiba -o email.order email.graph
Average label size 39.2462
Maximum label size 89

Akiba et al algorithm is much faster than greedy HHL algorithms. Furthermore, it requires a linear amount of memory, so degree+akiba can be used for large graphs:

$ ./degree -o coAuthorsCiteseer.order coAuthorsCiteseer.graph
$ ./akiba -o coAuthorsCiteseer.order coAuthorsCiteseer.graph
Average label size 405.171
Maximum label size 1170

This may take about 20 minutes. The coAuthorsCiteseer graph has about 200000 vertices which is too much for hhl.

The akiba program also has argument -l label_file to write the labels.

lcheck

Once you have the labels from a complex algorithm you may want to verify that this labels are correct. You can use lcheck with an option -c for that. Without this option lcheck only reports the average and maximum label sizes.

$ ./lcheck -c -l email.labels email.graph
Labels OK
Average label size 39.6134
Maximum label size 78

real    0m0.233s

GHL

The ghl program uses O(log n)-approximation algorithm to find optimal Hub Labels.

$ ./ghl email.graph
Average label size 30.3954
Maximum label size 61

real    1m1.278s

When we talk about the optimum we need to specify what we are talking about. Labels can be viewed as vectors whith i'th coordinate equal to the size of label of vertex i. Natural way to calculate a scalar measure from a vector is to get it's norm. The 1-norm is a sum of vector's components, which is the total labeling size. The 2-norm is a usual Euclidian norm. The infinity norm is the maximum vector's component size, which is the maximum label size. By default the ghl program optimizes the total label size. If you want to optimize another norm use -p norm option where norm is a float value. For the infinity norm, type '-p max':

$ ./ghl -p max email.graph
Average label size 32.8438
Maximum label size 43

real    4m36.269s

Another tricky option of the ghl program is -a alpha. Alpha is a tradeoff parameter between the running time and the labeling quality. By default alpha is 1.1. Use alpha equal to 1.0 to obtain best possible labels:

$ ./ghl -a 1.0 email.graph
Average label size 30.0159
Maximum label size 59

real    2m18.289s

The ghl program also has argument -l labeling_file to write the labels.

Graphs

You can use any graph in DIMACS shortest path or METIS format. We suggest visiting Dimacs 10 Challenge page for general graphs and Dimacs 9 Challenge for road networks.

Basic Reference

Material covered in Sections 2, 3.1, 3.2, and A1 of [1] should be enough to understand akiba and hhl programs. Those who seek deeper knowledge of the Hub Labeling topic are encouraged to read the other papers.

  1. D. Delling, A.V. Goldberg, T. Pajor, and R. F. Werneck. Robust Exact Distance Queries on Massive Networks. Technical Report MSR-TR-2014-12, Microsoft Research, 2014. link

Advanced Reference

The algorithm used in the ghl program is presented and studied in [5]. Theoretical background can be found in [3] and [4]. Simple version of the algorithm is presented in [4].

  1. I. Abraham, D. Delling, A.V. Goldberg, and R.F. Werneck. Hierarchical Hub Labelings for Shortest Paths. In Proc. 20th European Symposium on Algorithms (ESA 2012), 2012. link

  2. T. Akiba, Y. Iwata, and Y. Yoshida. Fast Exact Shortest-path Distance Queries on Large Networks by Pruned Landmark Labeling. In Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data, SIGMOD'13, pages 349--360. ACM, 2013. link

  3. M. Babenko, A.V. Goldberg, A. Gupta, and V. Nagarajan. Algorithms for Hub Label Optimization. In Proc. 30th ICALP, pages 69--80. LNCS vol. 7965, Springer, 2013. link

  4. E. Cohen, E. Halperin, H. Kaplan, and U. Zwick. Reachability and Distance Queries via 2-hop Labels. SIAM Journal on Computing, 32, 2003. link

  5. D. Delling, A.V. Goldberg, R. Savchenko, and R.F. Werneck. Hub Labels: Theory and Practice. In Proceedings of the 13th International Symposium on Experimental Algorithms (SEA'14), LNCS. Springer, 2014. link