:detective: parallel `MinMaxDownsampler` with x does not scale well to large `n_out`
jvdd opened this issue · 0 comments
We observe a significant (but constant) overhead when using `MinMaxDownsampler`
- with x, and
- with a large `n_out`, and
- with `parallel=True`
I believe the overhead in the parallel implementation with x originates from the independent (thread-safe) binary searches in the index generator: the search is performed twice per bin to find that bin's start and end indices, resulting in a complexity of `2 * n_out * log(n)`.
In comparison, the sequential implementation with x performs a binary search only once per bin, by reusing the moving left value (the previous bin's end index) as the start of the next bin and thus limiting the search range for the right index, resulting in a complexity of roughly `n_out * log(n/2)`.
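To make the two strategies concrete, here is a minimal Python sketch (the actual library is implemented in Rust; function names and the `bin_bounds` representation are hypothetical, not the library's API). The first version mimics the parallel implementation with two independent binary searches per bin; the second mimics the sequential implementation, reusing the previous bin's end index as the next bin's start:

```python
import bisect

def bin_edges_independent(x, bin_bounds):
    """Parallel-style (thread-safe): each bin's start and end index is
    found with an independent binary search over the full array,
    i.e. two searches per bin -> 2 * n_out * log(n)."""
    return [
        (bisect.bisect_left(x, lo), bisect.bisect_left(x, hi))
        for lo, hi in bin_bounds
    ]

def bin_edges_sequential(x, bin_bounds):
    """Sequential-style: reuse the previous bin's end index (the 'moving
    left' value) as the next bin's start, so only one search over a
    narrowed range is needed per bin."""
    edges = []
    start = bisect.bisect_left(x, bin_bounds[0][0])
    for _, hi in bin_bounds:
        # search only to the right of the previous bin's end
        end = bisect.bisect_left(x, hi, lo=start)
        edges.append((start, end))
        start = end  # moving left value for the next bin
    return edges
```

Both return the same bin edges; the sequential variant simply does half the searches, each over a shrinking range, which is where the constant-factor gap comes from.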
Benchmarking results show that this overhead can be significant when downsampling to an `n_out` of 60k points, which is a common scenario when using the parallel `MinMaxDownsampler` with x in the `MinMaxLTTB` downsampler. (see below ⬇️ )
More detail (parallel vs sequential) and (with x vs without x)
I believe that, as a solution, we should manage the number of threads ourselves by creating a thread pool with as many threads as there are available CPUs (as is done now). Each thread would then run a subslice version of the sequential repeated binary search, safely retaining the moving-left advantage on a per-thread basis.
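The proposed per-thread scheme could look like the following sketch (again in Python rather than Rust, with hypothetical names): the bins are split into one contiguous chunk per thread, and each thread pays for a single full-range binary search before switching to the narrowed moving-left searches within its own chunk.

```python
import bisect
import os
from concurrent.futures import ThreadPoolExecutor

def _edges_for_chunk(x, chunk):
    """Sequential moving-left search, restricted to one chunk of bins.
    Only the chunk's first bin needs a full-range binary search."""
    edges = []
    start = bisect.bisect_left(x, chunk[0][0])
    for _, hi in chunk:
        end = bisect.bisect_left(x, hi, lo=start)  # narrowed search
        edges.append((start, end))
        start = end  # moving left value within this chunk
    return edges

def bin_edges_chunked(x, bin_bounds, n_threads=None):
    """Split the bins into contiguous chunks, one per thread, so each
    thread can safely use the moving-left optimization independently."""
    n_threads = n_threads or (os.cpu_count() or 1)
    step = -(-len(bin_bounds) // n_threads)  # ceil division
    chunks = [bin_bounds[i:i + step] for i in range(0, len(bin_bounds), step)]
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        results = pool.map(lambda c: _edges_for_chunk(x, c), chunks)
    return [edge for chunk_edges in results for edge in chunk_edges]
```

Because each chunk is disjoint and each thread only mutates its own `start`, no synchronization is needed, and the total search cost drops from `2 * n_out * log(n)` to roughly `n_out * log(n/2)` plus one full-range search per thread.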