nschloe/perfplot

Progress bar shows no activity since recent version (v0.8.4)

Closed this issue · 16 comments

When I run perfplot with this kernel https://www.kaggle.com/neomatrix369/many-things-performance-in-python/ (see the bottom section starting "Create another dataframe with a bit smaller number of data points"), I get the below while the benchmarking is taking place:

Overall ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0% -:--:--
Kernels ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0% -:--:--

and then when it finishes I get the below:

Overall ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00
Kernels ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00

Previously it would show the progress of the activities.

As usual, I need an MWE to reproduce.

Code and data are present here: https://www.kaggle.com/neomatrix369/many-things-performance-in-python/ (see the bottom section starting "Create another dataframe with a bit smaller number of data points").

Is this fine, or would you like me to copy-paste the code into this issue? @nschloe

I don't have the time to guess which one you mean. Please just post it here.

Sorry about that, here's the MWE for your reference:

# pip install swifter pandas numpy joblib
import pandas as pd
import numpy as np
import swifter

def myfunc(x,y): return y*(x**2+1)

def setup_smaller_dataframe(n):
    # note: n (passed in from n_range) is not used here; the frame size is fixed
    num_of_datapoints = 15_000  # should be enough to show the performance numbers
    
    smaller_data = pd.DataFrame()
    smaller_data['col1'] = np.random.normal(size = num_of_datapoints)
    smaller_data['col2'] = np.random.normal(size = num_of_datapoints)
    return smaller_data


def using_pandas_apply(data):
    return data.apply(lambda row: myfunc(row.col1,row.col2), axis=1)

def using_swifter(data):
    return data.swifter.apply(lambda row: myfunc(row.col1,row.col2), axis=1)

def using_vectorization_method(data):
    return myfunc(data['col1'], data['col2'])

import perfplot
import matplotlib.pyplot as plt  # needed for plt.rcParams below

output = perfplot.bench(
    setup=setup_smaller_dataframe,
    kernels=[
        lambda data: using_pandas_apply(data),
        lambda data: using_swifter(data),
        lambda data: using_vectorization_method(data)
    ],
    labels=['using_pandas_apply', 
            'using_swifter',
            'using_vectorization_method'
    ],
    n_range=[k for k in range(0, 2)],
    xlabel='iterations',
    show_progress=True
)

plt.rcParams['figure.figsize'] = [20.0, 10.0]
output.show(logy="auto")

I hope this helps.

That's not minimal.

The issue is reproducible using this example. I don't follow what you mean by minimal; I could remove a few functions, but I'm not sure whether that would help.

I have now trimmed off a number of lines, although the problem I'm reporting is with the original version; I hope this helps convey the issue. Ideally the original code would be used to reproduce this and the other issue #81.

Do I really need to install swifter and pandas to reproduce the issue? Can you not produce something smaller?

If you want me to fix something, it's best to make the reproducing example as easy, clear, and short as possible. I don't think that is the case here.

The issue is occurring under these circumstances; it may not happen if I try to reduce the example to something else. Pandas and Swifter are only a pip command away.

Can you please suggest what else I can do to give you an MWE? The issue may not occur under other circumstances - I hope you understand my case.

What's an example of an MWE? Can you please point me to something so I can try to replicate it?

I have just re-tested this with v0.8.2 and v0.8.3; the tqdm-style progress bar does appear in those versions, but not in v0.8.4, when running in Kaggle notebooks.
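(Aside: a quick way to confirm which perfplot release a notebook has actually picked up - assuming Python 3.8+, where importlib.metadata is available - is a snippet like this:)

import importlib.metadata

# prints the installed perfplot version, e.g. "0.8.4"
print(importlib.metadata.version("perfplot"))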

it may

Because it may not occur, you won't invest even the slightest work in trying to minimize it. Instead, you expect me to do that. I find this quite rude.

What do you suggest I do? I'm sorry, I don't intend to be rude at all, if that's how it is coming across. I'm at a slight loss to understand what the underlying issue could be here, and as you can see I'm already doing testing to help you. If you can give me suggestions, I can certainly go and do that.

If you find an issue, you'll always have to put in some work producing code to reproduce it. You did that. For anyone to take interest in that code, and out of politeness, you should always make sure that the code is as short as possible. This can be a lot of work, but it has to be done by someone. In many cases, the short code already shows quite clearly what the issue is and makes it easier to get started fixing it. This is true for all software projects out there.

Your code is not small. It includes some big libraries. Your task: produce an example that reproduces the error without importing those libraries and with as few lines as possible. If you cannot do that, then the issue is perhaps in pandas/swifter, and it's also worth reporting it there.
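For illustration, a stripped-down example along those lines - no pandas or swifter, just numpy and perfplot, with two placeholder kernels that compute the same result - might look something like this:

import numpy as np
import perfplot

perfplot.bench(
    setup=lambda n: np.random.rand(n),           # plain numpy data, no pandas/swifter
    kernels=[lambda a: a * 2, lambda a: a + a],  # trivial kernels that return identical results
    labels=["a * 2", "a + a"],
    n_range=[2**k for k in range(10)],
    show_progress=True,                          # the progress bars in question
)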

I agree in principle with the idea behind an MWE, although with regards to pandas there will be other users who also face this and may then report it. I'm saying this because I have tested my own code with v0.8.2 and v0.8.3 of perfplot and the progress bar works fine there, so I will switch to those for now:

  0%|          | 0/5 [00:00<?, ?it/s]
  0%|          | 0/7 [00:00<?, ?it/s]
 14%|█▍        | 1/7 [00:01<00:11,  1.86s/it]
 29%|██▊       | 2/7 [00:07<00:14,  2.87s/it]
 43%|████▎     | 3/7 [00:08<00:10,  2.58s/it]
 57%|█████▋    | 4/7 [00:10<00:07,  2.37s/it]
 71%|███████▏  | 5/7 [00:15<00:05,  2.98s/it]
 86%|████████▌ | 6/7 [00:20<00:03,  3.76s/it]
100%|██████████| 7/7 [00:21<00:00,  2.92s/it]

I fixed some other things around the progress bars, so hopefully this is fixed now. If not, feel free to reopen with an MWE.