Different results between different implementations

Question

Different results between different implementations

hoxbro opened this issue 2 years ago · 3 comments

I have been playing around with your implementation of lttb and trying to compare it to other implementations (numpy and numba) and have been impressed by the speed.

But I have noticed that the algorithm does not give the same results as your other implementation in plotly_resampler. It seems to fail around 30,000 points. Do you know if this is expected?

Looking at the above log-log plot, some extra overhead is added, around 30,000. So I'm thinking it could be because of some race condition when parallelizing the algorithm.

import numpy as np

from plotly_resampler.aggregation.algorithms.lttb_py import LTTB_core_py
from tsdownsample import MinMaxLTTBDownsampler

np.random.seed(1)

ns = [i for i in np.arange(1, 51) * 1000]
n_out = 1000
for n in ns:
    x, y = np.arange(n), np.random.randn(n).cumsum()
    py = LTTB_core_py.downsample(x, y, n_out)
    ru = MinMaxLTTBDownsampler().downsample(x, y, n_out=n_out)

    print(f"Matches with {n:>5}: {(py == ru).all()}")

Answer 1 · 2023-01-02T19:07:15.000Z

Hi! I'm glad to hear that you like the speed of tsdownsample 🚀 (I even think there might stil bel some more room for improving the runtime speed of tsdownsample)

Thank you for pointing out the discrepancy in the results between the LTTB implementation in tsdownsample and plotly_resampler. I apologize for the lack of documentation on this point.

To clarify, the MinMaxLTTB algorithm is a new heuristic for the LTTB algorithm, which first performs MinMax downsampling on the data before running the more expensive LTTB algorithm on the reduced data. The MinMaxLTTB downsampler takes an optional keyword argument called minmax_ratio, which defines the ratio of the number of data points that the array will be downsampled to in the first step (i.e., n_out * minmax_ratio). The default value for minmax_ratio is 30, which has been emperically proven to be a good default value (more on this in our upcoming paper).

Given that you have n_out set to 1_000, you would expect to see some runtime overhead at 30_000 (since for shorter arrays, the MinMax downsampling is omitted).

As an illustration, here is some modified code where minmax_ratio is set to 60:

import numpy as np

from plotly_resampler.aggregation.algorithms.lttb_py import LTTB_core_py
from tsdownsample import MinMaxLTTBDownsampler

np.random.seed(1)

ns = [i for i in np.arange(1, 100, 2) * 1000]
n_out = 1000
for n in ns:
    x, y = np.arange(n), np.random.randn(n).cumsum()
    py = LTTB_core_py.downsample(x, y, n_out)
    ru = MinMaxLTTBDownsampler().downsample(x, y, n_out=n_out, minmax_ratio=60)
    print(f"Matches with {n:>5}: {(py == ru).sum() / 10}%")

The percentages above show the agreement of MinMaxLTTB with LTTB (which is also a local heuristic).

I hope this helps to clarify the behavior you observed. I'm happy to hear what you think of the MinMaxLTTBDownsampler 👀

P.S.: To use the parallel version of the tsdownsample algorithms (you should pass the parallel=True to the downsample method) - once again my apologies for the lacking documentation.

Answer 2 · 2023-01-20T17:22:25.000Z

@hoxbro Would it be possible to share your benchmark code?
@jvdd I have tested your implementation against a C based implementation as well as my personal numba based implementation of the native LTTB algorithm.
So far I have not noticed a general performance superiority (I am not using the parallel version). In fact your implementation only seems to be faster for a very large number of samples (>10^7) and a rather "small" number of buckets (<10000).

Oddly I have also observed sub-linear scaling of your implementation in case of >10^6 data points: a 10x increase in the number of data points lead to only a 3x increase in time. Whereas my implementation scales linearly.

Answer 3 · 2023-01-21T09:08:16.000Z

Hi @muendlein,

I really appreciate other people taking the time to benchmark the code! 😊
I have created an issue concerning the benchmarking: #21

When reporting benchmark results that compare with other implementations, it is extremely important to make sure that we are comparing apples to apples;

what downsample algorithm are you benchmarking (I'll try asap to improve documentation, #8)
does you benchmark uses the x?
- if so, does your implementation perform equidistant binning (i.e., searchsorted to find the start & end idxs of the equidistant bins)? -> this is necessary because we do not want to assume fixed sampling rate + no gaps in x (otherwise there is no point in providing the x to M4 or MinMax downsampler)
benchmark configuration?
- what datatypes are you using for y (and x)
- what arraylength are you benchmarking?
- what is your n_out?

Perhaps you (@muendlein) could also share your benchmarks (with your C & numba implementation) as well?

Regarding your observations:
I believe this library shines for large arrays (millions to billions of datapoints) with an n_out that is not too large. (Imo having a large n_out does not make a lot of sense as you are downsampling for visualizing the data - which is always limited by the pixelwidth of your screen).

My takeaways from your remarks

(possibly) we could improve the speed for smalller arrays
using large n_out results in more overhead => (already observed this when using x & improved this for the parallel implementation, #15)

P.S. you are always welcome to contribute further to this library, pinpointing where the code is slower than other implementations is already a significant contribution 🚀

(P.P.S. I observed the sublinear scaling as well, but need more time confirm this with proper benchmarks)