wesm/vbench

Reworking vbench with lessons learned

Opened this issue · 10 comments

@TomAugspurger is (allegedly :) ) putting the vbench moves on statsmodels
pandas-dev/pandas#4777 (comment)

and @yarikoptic has already done so for numpy
http://www.onerussian.com/tmp/numpy-vbench/

now with 3 major projects taking advantage of vbench, it's clear
that moving functionality from pandas' test_perf.py into vbench upstream
would be a win for everyone concerned. @wesm has already suggested this,
and I completely agree.

The reason I originally opted for a standalone script (with too much monkeypatching) was that
vbench seemed (to me) in need of a largish rewrite and I didn't want to get sidetracked.

as far as a roadmap goes, there's Yaroslav's #33 with patches, the "todo" I put together a while back in #27, and the existing test_perf, which can be gutted and merged here, hopefully much cleaned up after vbench sees some internal work.

The PyPy speed project, https://github.com/tobami/codespeed/, would be nice to leverage, or we could
implement similar functionality ourselves. There are similar projects out there (in Mozilla
land, for example).

Hi @y-p et al.

FWIW:

vbench seemed (to me) in need of a largish rewrite and I didn't want to get sidetracked

Same here, although so far I have indeed ended up getting sidetracked from the original idea of just a minimal set of patches to get me going (in #33). E.g. I switched from a straight loop that incrementally deduces the number of iterations to a simple estimation, so not much CPU is wasted there and there is no 0.1-1 sec spread among possible target benchmark run times.

ATM I am experimenting with numpy and with what the number / repeats combination should be to get estimates that are more "robust" through time -- I think that reducing the required number of runs / target test run time while increasing the repeats (among which the min is taken) should generally be beneficial.

I am re-running numpy's benchmarks with these settings to decide whether it is worth pushing such a change to my elderly PR.
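For concreteness, the same idea can be sketched with nothing but the standard-library timeit module (Python 3.6+ for autorange). This is not vbench's or test_perf's actual code, just an illustration of estimating the iteration count once and then taking the min over several repeats:

```python
import timeit

def estimate(stmt, setup="pass", repeats=5):
    """Pick the per-run iteration count once, then take the min over
    several repeats -- the min is the least noisy estimate, since
    interference only ever makes a run slower."""
    timer = timeit.Timer(stmt, setup=setup)
    # autorange() grows `number` until one run takes >= 0.2 s, so we
    # don't waste CPU rediscovering it with an incremental loop.
    number, _ = timer.autorange()
    runs = timer.repeat(repeat=repeats, number=number)
    return min(runs) / number  # seconds per call

print(estimate("sorted(range(1000))"))
```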

test_perf is indeed the first thing that should get absorbed into vbench, with or without a big refactoring/rewrite of it -- I have had no chance to look into that either, and if someone would do it, that would be great indeed.

BTW -- anyone interested in giving at least a lightning talk on vbench at the upcoming PyCon 2014? I am considering going since it is close by... it may be worth giving at least a lightning talk on numpy-vbench...
I thought the deadline was the 9th (today), but now I see "September 15th, 2013: Talk and tutorial proposals due", so there is time to act.

test_perf has a whole bunch of cmdline switches to set runs, repeats, repeats per run, burn-in,
random seed, and cpu affinity, and it can even generate summary stats on a repeated test. I desperately
tried anything I could think of to track down the reason for the variability in vbench results,
and ultimately failed (or at least success was limited).
You could do worse than playing with it as a cheap way to experiment.
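For anyone who wants to experiment along the same lines outside of test_perf, here is a rough sketch of the same kind of knobs -- CPU affinity, burn-in, repeats, summary stats. The helper and its parameter names are made up for illustration; this is not test_perf's code:

```python
import os
import statistics
import time

def profile(func, repeats=10, burn_in=2, cpu=None):
    """Hypothetical helper: pin to one CPU (Linux only), do a few
    burn-in calls, then report summary stats over timed repeats."""
    if cpu is not None and hasattr(os, "sched_setaffinity"):
        os.sched_setaffinity(0, {cpu})   # reduce scheduler-induced noise
    for _ in range(burn_in):             # warm up caches, imports, etc.
        func()
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        times.append(time.perf_counter() - start)
    return {"min": min(times),
            "mean": statistics.mean(times),
            "std": statistics.stdev(times) if repeats > 1 else 0.0}

print(profile(lambda: sorted(range(10000)), cpu=0))
```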

That happened over a few threads, mostly with @stephenwlin; you can find them
quite easily on the pandas issue tracker.

Wes did a lightning talk on vbench way back, I think; it's on YouTube somewhere.

I don't know the context of this post (just getting cc'ed by e-mail since I
was mentioned), but now that I'm older and wiser I realize a big source of
variability is processor dynamic frequency scaling (i.e. TurboBoost or
similar). At Apple we have some OS X specific ways of turning it off, but I
don't know if there's a more general solution that you can use on other
platforms or not.
-Stephen
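On Linux you can at least check cheaply whether scaling/turbo might be in play by reading the cpufreq entries in sysfs. The exact paths below are assumptions that depend on the driver in use (intel_pstate vs. acpi-cpufreq), so treat this as a sketch rather than a portable solution:

```python
from pathlib import Path

def read(path):
    p = Path(path)
    return p.read_text().strip() if p.exists() else "n/a"

# Scaling governor for cpu0: "performance" is what you want while
# benchmarking; "ondemand"/"powersave" will vary the clock under load.
print("governor:", read("/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor"))

# Turbo/boost state -- which of these files exists depends on the driver.
print("no_turbo:", read("/sys/devices/system/cpu/intel_pstate/no_turbo"))
print("boost:   ", read("/sys/devices/system/cpu/cpufreq/boost"))
```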


Hi Stephen,

(That mention was totally intentional... :) )

Good point. But in my case I'm on a Core 2, which predates TurboBoost.
The mobo has various overclocking gizmos and so exposes a lot of those kinds of knobs.
I'm fairly certain I've disabled all dynamic changes, boosts, etc., and I also use a Linux cpu
governor that does not do frequency switching... I still saw all those phenomena.

Other possible causes are SMIs or cache pressure from other processes on the system;
I've seen this at work with Linux perf counters when benchmarking C code.

My money is still on CPython implementation details, though.

Yoval

Ok cool just thought I would mention it in case.


BTW -- for numpy-vbench, as I have just described in #33, I am now rerunning all benchmarks for all revisions upon every vbench run, and then taking the min of the previously known performance estimate and the current one. It does take a long time, I know, but I hope that with time it will clean up the plots quite nicely, and that box is pretty much dedicated to just running numpy-vbench 24x7 ;-)
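A minimal sketch of that merging step (not the actual numpy-vbench code; `stored` and `fresh` stand for {revision: seconds} mappings of the stored and freshly measured timings):

```python
def merge_timings(stored, fresh):
    """Keep, per revision, the smallest time seen across all vbench runs.
    Noise only ever inflates a timing, so the running min converges on
    the 'true' cost and the plots get cleaner over time."""
    merged = dict(stored)
    for rev, seconds in fresh.items():
        merged[rev] = min(seconds, merged.get(rev, seconds))
    return merged
```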

@yarikoptic, I just changed to a new distro and realized in the process that I had ccache
misconfigured on the old setup. If you're not using ccache, or you're not sure it's working properly
(does ccache -s show hits/misses?), you might want to check it out; the compilation time
diff for me was 8x.

Thanks for the warning. I will check the hits, but when I compared timings it was indeed about an 8x difference.




a data point... will compare tomorrow to the next one ;)

cache directory /home/neurodebian/.ccache
cache hit (direct) 3076538
cache hit (preprocessed) 184869
cache miss 156096
called for link 812432
compile failed 896566
ccache internal error 2
preprocessor error 25
cache file missing 2
no input file 11653
files in cache 4184
cache size 912.4 Mbytes
max cache size 1.0 Gbytes


Glad to hear it. It's quite an inconvenience when developing, but when running vbench across a large number of
commits it makes a huge difference.