Parallelize permutation inversion in LLP?
progval opened this issue · 3 comments
@zacchiro points out that the LLP process spends some non-negligeable time single-threaded
A quick look at perf top
shows it is in this function:
webgraph-rs/src/algorithms/llp.rs
Line 194 in b4e4176
Would it make sense to allocate an array for another permutation, and invert the permutation not-in-place into that array so it can be parallelized?
Some numbers:
2023-09-20T08:19:28+02:00 - INFO - Completed.
2023-09-20T08:19:28+02:00 - INFO - Elapsed: 3m 38s [27,397,574,122 nodes, 125233018.78 nodes/s, 7.99 ns/node]
2023-09-20T08:19:28+02:00 - INFO - 7 updates, 4h 32m 10s, 1.54 updates/h, 38.88 m/update
2023-09-20T08:19:28+02:00 - INFO - Gain: 0.0009318041063451512
2023-09-20T08:19:28+02:00 - INFO - Modified: 124751634
2023-09-20T08:19:28+02:00 - INFO - Completed.
2023-09-20T08:19:28+02:00 - INFO - Elapsed: 4h 32m 10s [7 updates, 1.54 updates/h, 38.88 m/update]
2023-09-20T11:29:34+02:00 - INFO - Log-gap cost: 1638690340843
2023-09-20T11:32:19+02:00 - INFO - 2 gammas, 21h 32m 1s, 2.23 gammas/d, 10.77 h/gamma; 16.67% done, 2d 23h 46m 43s to end; used/avail/free/total mem 1.27TB/870.12GB/190.86GB/4.33TB
the 2.2h jump in timestamps is the permutation inversion
WTF seriously? I would have never thought that it would be so slow. Yes, if you have the space we can do an out-of-place inversion. I don't think there's any easy way to do parallel in-place inversion because you cannot predict how cycles will develop. It might be possible that since most of the cost is moving, if you look at the cycle and mark it as "to move", you might have some threads marking and some threads permuting. But probably since the cost in accessing the memory, not reading or writing, that wouldn't be advantageous.