Different results between different implementations
hoxbro opened this issue ยท 3 comments
I have been playing around with your implementation of lttb and trying to compare it to other implementations (numpy and numba) and have been impressed by the speed.
But I have noticed that the algorithm does not give the same results as your other implementation in plotly_resampler
. It seems to fail around 30,000 points. Do you know if this is expected?
Looking at the above log-log plot, some extra overhead is added, around 30,000. So I'm thinking it could be because of some race condition when parallelizing the algorithm.
import numpy as np
from plotly_resampler.aggregation.algorithms.lttb_py import LTTB_core_py
from tsdownsample import MinMaxLTTBDownsampler
np.random.seed(1)
ns = [i for i in np.arange(1, 51) * 1000]
n_out = 1000
for n in ns:
x, y = np.arange(n), np.random.randn(n).cumsum()
py = LTTB_core_py.downsample(x, y, n_out)
ru = MinMaxLTTBDownsampler().downsample(x, y, n_out=n_out)
print(f"Matches with {n:>5}: {(py == ru).all()}")
Hi! I'm glad to hear that you like the speed of tsdownsample ๐ (I even think there might stil bel some more room for improving the runtime speed of tsdownsample)
Thank you for pointing out the discrepancy in the results between the LTTB implementation in tsdownsample and plotly_resampler. I apologize for the lack of documentation on this point.
To clarify, the MinMaxLTTB algorithm is a new heuristic for the LTTB algorithm, which first performs MinMax downsampling on the data before running the more expensive LTTB algorithm on the reduced data. The MinMaxLTTB downsampler takes an optional keyword argument called minmax_ratio
, which defines the ratio of the number of data points that the array will be downsampled to in the first step (i.e., n_out * minmax_ratio). The default value for minmax_ratio
is 30, which has been emperically proven to be a good default value (more on this in our upcoming paper).
Given that you have n_out
set to 1_000, you would expect to see some runtime overhead at 30_000 (since for shorter arrays, the MinMax downsampling is omitted).
As an illustration, here is some modified code where minmax_ratio
is set to 60:
import numpy as np
from plotly_resampler.aggregation.algorithms.lttb_py import LTTB_core_py
from tsdownsample import MinMaxLTTBDownsampler
np.random.seed(1)
ns = [i for i in np.arange(1, 100, 2) * 1000]
n_out = 1000
for n in ns:
x, y = np.arange(n), np.random.randn(n).cumsum()
py = LTTB_core_py.downsample(x, y, n_out)
ru = MinMaxLTTBDownsampler().downsample(x, y, n_out=n_out, minmax_ratio=60)
print(f"Matches with {n:>5}: {(py == ru).sum() / 10}%")
The percentages above show the agreement of MinMaxLTTB with LTTB (which is also a local heuristic).
I hope this helps to clarify the behavior you observed. I'm happy to hear what you think of the MinMaxLTTBDownsampler ๐
P.S.: To use the parallel version of the tsdownsample algorithms (you should pass the parallel
=True to the downsample method) - once again my apologies for the lacking documentation.
@hoxbro Would it be possible to share your benchmark code?
@jvdd I have tested your implementation against a C based implementation as well as my personal numba based implementation of the native LTTB algorithm.
So far I have not noticed a general performance superiority (I am not using the parallel version). In fact your implementation only seems to be faster for a very large number of samples (>10^7) and a rather "small" number of buckets (<10000).
Oddly I have also observed sub-linear scaling of your implementation in case of >10^6 data points: a 10x increase in the number of data points lead to only a 3x increase in time. Whereas my implementation scales linearly.
Hi @muendlein,
I really appreciate other people taking the time to benchmark the code! ๐
I have created an issue concerning the benchmarking: #21
When reporting benchmark results that compare with other implementations, it is extremely important to make sure that we are comparing apples to apples;
- what downsample algorithm are you benchmarking (I'll try asap to improve documentation, #8)
- does you benchmark uses the
x
?- if so, does your implementation perform equidistant binning (i.e., searchsorted to find the start & end idxs of the equidistant bins)? -> this is necessary because we do not want to assume fixed sampling rate + no gaps in
x
(otherwise there is no point in providing thex
to M4 or MinMax downsampler)
- if so, does your implementation perform equidistant binning (i.e., searchsorted to find the start & end idxs of the equidistant bins)? -> this is necessary because we do not want to assume fixed sampling rate + no gaps in
- benchmark configuration?
- what datatypes are you using for
y
(andx
) - what arraylength are you benchmarking?
- what is your
n_out
?
- what datatypes are you using for
Perhaps you (@muendlein) could also share your benchmarks (with your C & numba implementation) as well?
Regarding your observations:
I believe this library shines for large arrays (millions to billions of datapoints) with an n_out
that is not too large. (Imo having a large n_out
does not make a lot of sense as you are downsampling for visualizing the data - which is always limited by the pixelwidth of your screen).
My takeaways from your remarks
- (possibly) we could improve the speed for smalller arrays
- using large
n_out
results in more overhead => (already observed this when usingx
& improved this for the parallel implementation, #15)
P.S. you are always welcome to contribute further to this library, pinpointing where the code is slower than other implementations is already a significant contribution ๐
(P.P.S. I observed the sublinear scaling as well, but need more time confirm this with proper benchmarks)