Hub Labeling is a data structure used to find a distance between two vertices in a graph. It provides excellent query time for real-world graphs such as road networks and social networks. Hub Labeling has two stages: preprocessing and querying. While the query algorithm is simple, the preprocessing is much more involved and there are several preprocessing algorithms in the literature.
There is an algorithm to find O(log n)-approximately optimal Hub Labels. However, it is slow. Practical labeling algorithms find a Hierarchical Hub Labeling. HHL algorithms order vertices and find hub labels which respect the order. In fact given an order one can build the minimum HHL with respect to this order.
There are several programs in the repository:
hhl
— find Hierarchical Hub Labeling (and a vertex order) using greedy algorithmakiba
— construct Hierarchical Hub Labeling from a vertex orderdegree
— order vertices by their degreelcheck
— check labelsghl
— find O(log n) approximate Hub Labeling
Just use make
to build the project:
$ make
g++ -fopenmp -O3 -I./lib -o akiba akiba.cpp
g++ -fopenmp -O3 -I./lib -o degree degree.cpp
g++ -fopenmp -O3 -I./lib -o hhl hhl.cpp
g++ -fopenmp -O3 -I./lib -o lcheck lcheck.cpp
g++ -fopenmp -O3 -I./lib -o ghl ghl.cpp
Once you've built the hhl
program you can compute HHL for a graph, say email.graph
:
$ ./hhl email.graph
Average label size 39.6134
Maximum label size 78
real 0m1.165s
The hhl
program contains implementations of two greedy HHL algorithms: path-greedy and label-greedy (as defined in [1]).
The default choice is path-greedy. To use label-greedy add -w
argument:
$ ./hhl -w email.graph
Average label size 36.797
Maximum label size 82
real 0m1.170s
If shortest paths are unique, greedy algorithms can be implemented more efficiently in terms of the running time.
The hhl
program has argument -u
which turns on the unique shortest paths mode.
You can use -u
with any graph, hhl
will just emulate the unique shortest path structure and run fast algorithm:
$ ./hhl -u email.graph
Average label size 50.6531
Maximum label size 112
real 0m0.336s
Options -o order_file
and -l label_file
can be specified to write the ordering and labels to files:
$ ./hhl -o email.order -l email.labels email.graph
Average label size 39.6134
Maximum label size 78
If your CPU has Hyper Threading enabled, use -t
argument to specify the number of threads to be equal to the real number of cores.
Performance may reduce dramatically if there are more threads than cores.
Keep in mind that hhl
requires O(n^2) memory and the time complexity is O(n^3) (O(nm log n) if -u
is set).
Akiba et al. found an efficient algorithm to construct minimal HHL from a vertex order.
You can use the akiba
program to construct the labels if you have an order file prepared (for example, by the hhl
program).
$ ./akiba -o email.order email.graph
Average label size 39.6134
Maximum label size 78
real 0m0.081s
As you can see, akiba
program reported exactly the same labeling size.
What's interesting is that the order doesn't have to be from a rigorous algorithm.
For example we can find the order in the unique shortest paths mode and use akiba
program to find the best labels corresponding to this order:
$ ./hhl -u -o email.order email.graph
Average label size 50.6531
Maximum label size 112
$ ./akiba -o email.order email.graph
Average label size 39.707
Maximum label size 82
Another simple way of ordering vertices is the degree order that works well for some social and computer networks: to order vertices by their degrees, breaking ties at random.
Use degree
program to find the degree order:
$ ./degree -o email.order email.graph
$ ./akiba -o email.order email.graph
Average label size 39.2462
Maximum label size 89
Akiba et al algorithm is much faster than greedy HHL algorithms. Furthermore, it requires a linear amount of memory, so degree
+akiba
can be used for large graphs:
$ ./degree -o coAuthorsCiteseer.order coAuthorsCiteseer.graph
$ ./akiba -o coAuthorsCiteseer.order coAuthorsCiteseer.graph
Average label size 405.171
Maximum label size 1170
This may take about 20 minutes. The coAuthorsCiteseer graph has about 200000 vertices which is too much for hhl
.
The akiba
program also has argument -l label_file
to write the labels.
Once you have the labels from a complex algorithm you may want to verify that this labels are correct.
You can use lcheck
with an option -c
for that.
Without this option lcheck
only reports the average and maximum label sizes.
$ ./lcheck -c -l email.labels email.graph
Labels OK
Average label size 39.6134
Maximum label size 78
real 0m0.233s
The ghl
program uses O(log n)-approximation algorithm to find optimal Hub Labels.
$ ./ghl email.graph
Average label size 30.3954
Maximum label size 61
real 1m1.278s
When we talk about the optimum we need to specify what we are talking about.
Labels can be viewed as vectors whith i'th coordinate equal to the size of label of vertex i.
Natural way to calculate a scalar measure from a vector is to get it's norm.
The 1-norm is a sum of vector's components, which is the total labeling size.
The 2-norm is a usual Euclidian norm.
The infinity norm is the maximum vector's component size, which is the maximum label size.
By default the ghl
program optimizes the total label size.
If you want to optimize another norm use -p norm
option where norm
is a float value.
For the infinity norm, type '-p max':
$ ./ghl -p max email.graph
Average label size 32.8438
Maximum label size 43
real 4m36.269s
Another tricky option of the ghl
program is -a alpha
. Alpha is a tradeoff parameter between the running time and the labeling quality.
By default alpha is 1.1.
Use alpha equal to 1.0 to obtain best possible labels:
$ ./ghl -a 1.0 email.graph
Average label size 30.0159
Maximum label size 59
real 2m18.289s
The ghl
program also has argument -l labeling_file
to write the labels.
You can use any graph in DIMACS shortest path or METIS format. We suggest visiting Dimacs 10 Challenge page for general graphs and Dimacs 9 Challenge for road networks.
Material covered in Sections 2, 3.1, 3.2, and A1 of [1] should be enough to understand akiba
and hhl
programs. Those who seek deeper knowledge of the Hub Labeling topic are encouraged to read the other papers.
- D. Delling, A.V. Goldberg, T. Pajor, and R. F. Werneck. Robust Exact Distance Queries on Massive Networks. Technical Report MSR-TR-2014-12, Microsoft Research, 2014. link
The algorithm used in the ghl
program is presented and studied in [5]. Theoretical background can be found in [3] and [4]. Simple version of the algorithm is presented in [4].
-
I. Abraham, D. Delling, A.V. Goldberg, and R.F. Werneck. Hierarchical Hub Labelings for Shortest Paths. In Proc. 20th European Symposium on Algorithms (ESA 2012), 2012. link
-
T. Akiba, Y. Iwata, and Y. Yoshida. Fast Exact Shortest-path Distance Queries on Large Networks by Pruned Landmark Labeling. In Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data, SIGMOD'13, pages 349--360. ACM, 2013. link
-
M. Babenko, A.V. Goldberg, A. Gupta, and V. Nagarajan. Algorithms for Hub Label Optimization. In Proc. 30th ICALP, pages 69--80. LNCS vol. 7965, Springer, 2013. link
-
E. Cohen, E. Halperin, H. Kaplan, and U. Zwick. Reachability and Distance Queries via 2-hop Labels. SIAM Journal on Computing, 32, 2003. link
-
D. Delling, A.V. Goldberg, R. Savchenko, and R.F. Werneck. Hub Labels: Theory and Practice. In Proceedings of the 13th International Symposium on Experimental Algorithms (SEA'14), LNCS. Springer, 2014. link