nschloe/perfplot

[UX/UI] One of the lines in the plot is zero or very close to zero (incorrect positioning)

neomatrix369 opened this issue · 19 comments

In the reference kernel https://www.kaggle.com/neomatrix369/many-things-performance-in-python/ (see the bottom of the kernel), the last two lines, using_vectorization_method and using_vectorization_method (same), sit on the 0-axis in the plots:

image

When run separately their results are not far from using_swifter():

%timeit using_swifter(setup_smaller_dataframe(0))
8.85 ms ± 170 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

and

%timeit using_vectorization_method(setup_smaller_dataframe(0))
5.72 ms ± 87 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

As per the above, the two should be within a few ms of each other, whatever that translates to in the world of plots.

Same goes with the other plot:

image

MWE please.

The code to generate the data and to run the benchmarks is in this notebook: https://www.kaggle.com/neomatrix369/many-things-performance-in-python/ (towards the bottom of the notebook: Create another dataframe with a bit smaller number of data points)

Same as in the other bug.

The code is the same: #80 (comment)

Look for the trailing plot lines close to zero in the plots generated.

As you can see from this image:

You will need the whole code snippet and not the trimmed version. I hope the trimmed version shows the same output.

I think the issue may be that when one or more very small values have to be plotted alongside much larger ones, the larger values overshadow the smaller one(s). logy=True does not help in this case either.
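To put a number on the overshadowing described above, here is a minimal sketch that computes where a timing lands on the y-axis as a fraction of the axis height, once for a linear axis and once for a log axis. The timings and the two helper functions are hypothetical and only illustrate the arithmetic; they are not taken from the kernel or from perfplot.

```python
import math


def linear_position(value, top):
    """Fraction of the axis height at which `value` is drawn on a linear axis from 0 to `top`."""
    return value / top


def log_position(value, bottom, top):
    """Same fraction on a log-scaled axis running from `bottom` to `top` (both must be > 0)."""
    span = math.log10(top) - math.log10(bottom)
    return (math.log10(value) - math.log10(bottom)) / span


# Hypothetical timings in seconds (assumed values, not measurements).
timings = {"fast": 1e-7, "medium": 5e-6, "slow": 3e-2}
top = max(timings.values())
bottom = min(timings.values())

for name, t in timings.items():
    print(
        f"{name:>6}: linear={linear_position(t, top):.6f}  "
        f"log={log_position(t, bottom, top):.3f}"
    )
```

On the linear axis the "medium" timing sits at well under a thousandth of the axis height, so it is visually indistinguishable from zero. On the log axis the same timing lands roughly a third of the way up, but only if the lower axis limit is close to the smallest value.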

This code snippet should give you the same issue as reported above:

import numpy as np
import perfplot  

def setup_smaller_dataframe(n):
    num_of_datapoints = 15_000 # should be enough to show the performance numbers
    return np.random.normal(size = num_of_datapoints)
    
def using_vectorization_method1(data):
    return 0

def using_vectorization_method2(data):
    return data * data

def using_vectorization_method3(data):
    for i in range(1_000_000):
        pass
    
    return data * data


output = perfplot.bench(
    setup=setup_smaller_dataframe,
    kernels=[
        lambda data: using_vectorization_method1(data),
        lambda data: using_vectorization_method2(data),
        lambda data: using_vectorization_method3(data),        
    ],
    labels=['using_vectorization_method1', 
            'using_vectorization_method2',
            'using_vectorization_method3'],
    n_range=[k for k in range(0, 2)],
    xlabel='iterations',
    show_progress=True,
    equality_check=None
)

# plt.rcParams['figure.figsize'] = [20.0, 10.0]
output.show(logy="auto")

As you can see from the plot:

image

Very good, now it's quite clear and easy to understand. The only thing I don't understand now: What's the problem exactly with the plot? You have two small and one larger timing. So?

The very small values appear very close to zero. My understanding was that with logy="auto" or logy=True there would be some spacing between 0 and these values.

It wasn't an issue for me until someone from the community read my kernel reports and asked: are the performance figures for using_vectorization_method2 and using_vectorization_method3 actually zero?

From a UX/UI point of view, it helps to see the differences more clearly, especially since the fonts are small and everything else in the plot looks minimised.

On another note, how can we increase the font of all those texts and numbers in the plot? Would changing matplotlib's config help or can you provide some flags with perfplot to help?

The very small values appear very close to zero

I guess that's because they are very close to zero. 😸 Using logy should make clear in what minuscule area the values are.

On another note, how can we increase the font of all those texts and numbers in the plot? Would changing matplotlib's config help or can you provide some flags with perfplot to help?

Not sure now. I think you can plot instead of show and then edit the figure before plt.show().
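A minimal sketch of the "plot, then edit the figure" idea, assuming perfplot draws onto the current matplotlib axes. To keep it runnable without perfplot, a plain semilogy call with made-up data stands in for the perfplot plot call; enlarge_text is a hypothetical helper, not part of perfplot or matplotlib.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt

# Stand-in for the perfplot plot call; the data here is made up.
fig, ax = plt.subplots(figsize=(20, 15))
ax.semilogy([1, 2, 3], [1e-7, 2e-7, 4e-7], label="hypothetical_kernel")
ax.set_xlabel("n")
ax.set_ylabel("Runtime [s]")
ax.legend()


def enlarge_text(ax, size):
    """Apply one font size to the axis labels, tick labels, and legend of `ax`."""
    ax.xaxis.label.set_size(size)
    ax.yaxis.label.set_size(size)
    ax.tick_params(axis="both", which="both", labelsize=size)
    legend = ax.get_legend()
    if legend is not None:
        for text in legend.get_texts():
            text.set_fontsize(size)


# Edit the current axes after plotting and before plt.show() or fig.savefig(...).
enlarge_text(plt.gca(), 24)
```

Editing the axes this way only touches the one figure, whereas changing rcParams affects every plot created afterwards.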

I have used logy="auto" and then logy=True, also with the time_unit param, and neither helps show the values the way you describe. I have seen this work in the past.

Okay, thanks. I will play with it and try to increase the font size.

image

So the above is the best it does when I use the logy=True option.

I see that there are few y grid lines, but that's a matplotlib decision. If you want more, you can add more using mpl. Other than that, I don't see an issue.
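One way to add more y grid lines with matplotlib is to set tick locators on the log-scaled axis and enable the grid for both major and minor ticks. This is a sketch on a plain matplotlib axis with made-up data; the choice of minor ticks at 2 and 5 times each power of ten is an assumption for illustration.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
from matplotlib.ticker import LogLocator

fig, ax = plt.subplots()
# Made-up timings spanning several orders of magnitude.
ax.semilogy([1, 2, 3], [2e-7, 5e-6, 3e-2], label="hypothetical_kernel")

# Major ticks at every power of ten, minor ticks at 2x and 5x each power.
ax.yaxis.set_major_locator(LogLocator(base=10.0, numticks=20))
ax.yaxis.set_minor_locator(LogLocator(base=10.0, subs=(2.0, 5.0), numticks=20))

# Draw horizontal grid lines at both major and minor ticks.
ax.grid(True, which="both", axis="y")
```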

I have worked around it by adding a noop-like lambda kernel. Different people may have different opinions about this, but it seems to have helped a bit for now:

faster_ones = perfplot.bench(
    setup=setup_smaller_dataframe,
    kernels=[
        lambda data: using_swifter(data),
        lambda data: using_vectorization_method(data),
        lambda data: data # as good as noop
    ],
    labels=[
        'using_swifter',
        'using_vectorization_method',
        'noop'
    ],
    n_range=[k for k in range(0, 10)],
    xlabel='iterations',
    equality_check=None
)
# plt, matplotlib, and font are defined in the font-size snippet further down
plt.rcParams['figure.figsize'] = [20.0, 15.0]
matplotlib.rc('font', **font)
faster_ones.show(logy=True, time_unit="ns")

As you can see the noop is the fastest and then comes the vectorized function:
image

Increasing the font size by doing this also helped with the readability:

import matplotlib
import matplotlib.pyplot as plt

font = {'family' : 'normal',
        'weight' : 'normal',
        'size'   : 24}

matplotlib.rc('font', **font)
plt.rcParams['figure.figsize'] = [20.0, 15.0]

Ah, now I get it. You simply want the ylim to be smaller. You can also do that in mpl, no need for noop.

Oh, how do we do that in mpl? I'm not aware of that; is it a config option? Let me know and I can add it and share.

Presumably you're doing

faster_ones = perfplot.bench(...)
faster_ones.plot(...)

In which case, you can then do

ax = plt.gca()
ax.set_ylim(bottom=0)

and the y axis will start at zero, with no need for your noop kernel?
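One caveat worth sketching: a log-scaled axis cannot start at zero, so matplotlib ignores a non-positive limit there and emits a warning. The hypothetical helper below (widen_lower_limit is not part of matplotlib or perfplot) sets the bottom to 0 on a linear axis and, on a log axis, lowers the bottom limit by an assumed factor instead. The data is made up.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt


def widen_lower_limit(ax, factor=10.0):
    """Set the bottom y-limit to 0 on a linear axis, or divide it by `factor` on a log axis."""
    bottom, _top = ax.get_ylim()
    if ax.get_yscale() == "log":
        # Zero is not representable on a log axis, so extend downwards instead.
        ax.set_ylim(bottom=bottom / factor)
    else:
        ax.set_ylim(bottom=0)
    return ax.get_ylim()


# Made-up timings in seconds, drawn once on a linear and once on a log axis.
fig, (lin_ax, log_ax) = plt.subplots(1, 2)
lin_ax.plot([1, 2, 3], [5.7e-3, 5.8e-3, 5.6e-3])
log_ax.semilogy([1, 2, 3], [5.7e-3, 8.9e-3, 5.6e-3])

lin_limits = widen_lower_limit(lin_ax)
log_limits = widen_lower_limit(log_ax)
print(lin_limits, log_limits)
```

With a perfplot figure, the same helper could be applied to plt.gca() after the plot call, assuming perfplot draws onto the current axes as the snippet above implies.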

@asongtoruin's suggestion is the correct fix.