DylanMuir/ParforProgMon

Step size is multiplicative

okomarov opened this issue · 7 comments

There is a problem with the step size:

N   = 13000;
ppm = ParforProgMon('some title   ', N, floor(N/100),500, 55);
parfor f = 1:N
    pause(0.01)
    ppm.increment();
end

The progress bar keeps going until it reaches 13000%.
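
The 13000% figure can be reproduced with simple arithmetic. The sketch below assumes that the third constructor argument is the step size and that every call to ppm.increment() advances the bar by that amount; the report itself does not state either, so treat both as assumptions.

```matlab
% Assumption: each increment() call adds stepSize to the bar.
N        = 13000;
stepSize = floor(N/100);       % 130, the third argument in the report
nCalls   = N;                  % increment() is called on every iteration

finalPct = 100 * nCalls * stepSize / N;
fprintf('Final progress: %d%%\n', finalPct);   % prints "Final progress: 13000%"
```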

Thanks for the bug report, @okomarov. Now fixed in the master branch.

It doesn't seem to be fixed. Moreover:

  • Either the Java or the Matlab implementation should take care of that, not the user in the parfor loop.
  • I don't get the step size < 0, when would you want to use that?
  • Also, I find that pctRunOnAll() works better than attaching the file.

If interested, you can check my master repo for the fixes to the above.

Hi @okomarov,

  • It now works for me; the progress window disappears after 100%. What issue do you still have?
  • The client should take care of the step size. The reason is that every call to ppm.increment() makes a network request. Your example makes 13000 network requests in very rapid succession, which may not be what a user wants if the workers are not local.
  • The description of the stepSize parameter wasn't great in the documentation. I've now made it clearer that it is the progress added to the bar on each call to ppm.increment().
  • I don't understand your comment about step size < 0. Where is it suggested to use a step size < 0?
  • pctRunOnAll() doesn't work at all. What if you don't have login access to the worker, so you can't install parforProgMon? What if it's on the machine, but in a different directory than on the server machine?
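
One way to read the points above is that the caller throttles the calls, so that increment() fires once per stepSize iterations. A minimal sketch, assuming ParforProgMon is on the path, that the constructor arguments mean the same as in the original report, and that each call adds stepSize to the bar:

```matlab
N        = 13000;
stepSize = floor(N/100);   % 130 iterations per progress update

% Constructor arguments copied from the report above; their exact meaning
% is an assumption here.
ppm = ParforProgMon('some title   ', N, stepSize, 500, 55);

parfor f = 1:N
    pause(0.01)
    % Report only on every stepSize-th index: 100 requests instead of 13000.
    if mod(f, stepSize) == 0
        ppm.increment();
    end
end
```

mod(f, stepSize) == 0 holds for exactly 100 values of f, so the total reported progress is 100 * 130 = 13000 whatever order the workers process the indices in. The last of those updates can still arrive before other iterations have finished.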

To follow up:

  • It closes the window but the pool is still working. Try adding a pause(0.1) to see it more clearly. In principle, this newVal is still multiplied by the step size, which does not make sense.
  • The pool does not process indices in order, so you can get processing at 300 and immediately afterward at 400. It is plain wrong. You can see that by putting a disp(f) inside the parfor.
  • pctRunOnAll() works very well on my local pool and it is the solution adopted by versions 1 and 2 of the same project. I do not have the means to test a pool on a network, although I see no reason why communication between workers should be barred... how would the parfor run in the first place then?

The docs are still quite unclear.

I might see what you mean now, but honestly, you could explain that better. Still, since there is no guarantee of order, you can get a handshake on the final value and close the bar while the pool is still working.

An example:

4 workers for 10,000 iterations. Usually Matlab splits the indices into 4 sequential groups, i.e. worker 1 will have indices 1:2,500, worker 2 indices 2,501:5,000 and so forth. Moreover, they usually start working in reverse!

A disp(f) inside the parfor:

parfor f = 1:10000
    pause(0.1)
    disp(f)
end

should show something like

2500
7500
2499
...

Now, if your bar is terminated when value == bar.maximum(), and it's counting in reverse, you can just get a random close when in fact there are still thousands of iterations to go. This happens in my case.
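
To make the scenario concrete, the following sketch works through the idealised split described above. It assumes an even split across 4 workers, each starting from the top of its range, and a bar whose value is driven by the loop index and which closes as soon as it sees N. Real scheduling is more dynamic, so this is only an illustration.

```matlab
N        = 10000;
nWorkers = 4;
chunk    = N / nWorkers;                 % 2500 indices per worker

for w = 1:nWorkers
    lo = (w - 1) * chunk + 1;
    hi = w * chunk;
    fprintf('worker %d: indices %d:%d, first processed: %d\n', w, lo, hi, hi);
end

% Under these assumptions worker 4 begins with index N, so a bar that closes
% when it sees N would close after roughly one iteration per worker.
remaining = N - nWorkers;
fprintf('iterations still to run at that point: about %d\n', remaining);
```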

There is no way to get a correct count without network handshakes.
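
For comparison, a count that does not depend on processing order can be kept on the client by sending one message per finished iteration. The sketch below uses MATLAB's parallel.pool.DataQueue (Parallel Computing Toolbox, R2017a or later) rather than ParforProgMon. It is an alternative illustration, not what either repository implements, and the function and variable names are hypothetical. It still costs one message per iteration, which is the trade-off discussed in this thread.

```matlab
function parforCountDemo()
    N     = 200;     % number of iterations (arbitrary for the demo)
    nDone = 0;       % completed iterations, updated on the client only

    q = parallel.pool.DataQueue;
    afterEach(q, @onIterationDone);   % runs on the client for each message

    parfor f = 1:N
        pause(0.01);
        send(q, f);                   % one message per finished iteration
    end
    fprintf('\n');

    function onIterationDone(~)
        % Count messages instead of looking at the index value, so the
        % order in which workers process indices does not matter.
        nDone = nDone + 1;
        fprintf('\r%3.0f%% (%d of %d)', 100 * nDone / N, nDone, N);
    end
end
```

Saved as parforCountDemo.m and run with an open pool, the percentage reaches 100 only once all N messages have arrived.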