Rewrite for parallel implementation

Question

Rewrite for parallel implementation

Opened this issue 6 years ago · 6 comments

Write or find function for partitioning the Graph
Rewrite the algorithm to partition the Graphs and then runs ReduceToKTruss in parallel on each partition

Answer 1 · 2019-02-18T13:42:11.000Z

This seems to implement, what we are looking for:
https://networkit.github.io/dev-docs/cppdoc.html#

Answer 2 · 2019-02-18T13:43:49.000Z

The paper can be found here: http://langevin.univ-tln.fr/cours/PAA/extra/Tarjan-1972.pdf
If I don't misunderstand, the complexity should be linear in O(E + N)

Answer 3 · 2019-02-18T13:48:06.000Z

Since our algorithm is already working, I will start a new branch for the parallel implementation

Answer 4 · 2019-02-18T13:59:26.000Z

@niklasdeckers I wonder, whether doing the computation in parallel will actually be faster.

Our plan was to call the function ReduceToKTruss(Graph g, count k) for each component. Since we computed the data structures from scratch on for every call, we only would have to split the graph into components.
Now my latest change, uses the same data structures across all calls to ReduceToKTruss. That saves a large amount of time, however if we then do the calls in parallel, we will also need to split the queue structure. I think this should be doable in O(|E|), which doesn't theoretically increase the complexity, but in practice we have O(|E| * 4) more operations, which generates a lot of overhead.
I think the parallelism might only speed things up on a machine with a lot of CPU cores and a Graph with many small partitions.

What is your opinion on this? We now have an algorithm that is working, tested and takes around half a second for 150 nodes and an edge probability of 0.4 (about 2000 Edges) on my laptop. Are we fine with this, or do we want to implement parallelism? If not, we can still outline our design of the parallel algorithm in the presentation.

Answer 5 · 2019-02-19T14:21:41.000Z

Will work this out now. It might be enough to make a parallelization approach that is only efficient on certain types of graphs (especially the typical ones).

Answer 6 · 2019-02-21T12:04:50.000Z

So it seems to me that extracting biconnected components is not very effective on most "natural" graphs as there might be large areas (that would be very interesting for parallelism) that are connected with very few (but more than one) edges. This means that something like http://weiemmazhang.me/sublinks/papers/4.2.2013SIGMOD.Efficiently%20computing%20k-edge%20connected%20components%20via%20graph%20decomposition.pdf would be the way to go. Nevertheless, this seems not to be implemented in Networkit.
I will try to point out the difference empirically using the existing graphs for tomorrow.

The data handling problem remains and we might only gain efficiency when the splitting effort is justified by the graph structure.
Will do a few calculations on existing natural graphs - probably we do not really have to implement parallelism.