wesm/vbench

Ideas for improvement

  • Benchmarks should register themselves with a name, so
    duplicate names can be detected (see pandas-dev/pandas@977d581)
  • The git log parsing is expensive, should take a commit range to speed things up
    (pandas' test_perf currently monkey-patches vbench to do this)
  • use sh to wrap the subprocess calls, as cleanup.
  • the repo parsing only includes commits on a given branch, should
    be less restrictive
  • using os.system and friends to invoke commands doesn't allow for
    redirecting output, so the runs are very noisy and callers can't do much about it.
    rework to use logging, or take in a stream as arg.
  • support arbitrary-length prefix style hash spec, just like git. (iow, resolve hashes via git)
  • expose the gc disable option as a documented interface, it's important.
  • use python-git or similar rather than system(), to speed up repo parsing at the very least.
  • allow for a custom build script (pandas has a build cache system that can virtually eliminate build times)
  • vbench is very "hygienic" when recreating build environments, but the overhead per build (clone + build per commit) is excessive.
  • support the rest of test_perf's functionality (compare two commits, text reporting on the command line, repeated measurements and summary stats, exporting results as dataframes, etc.)
  • allow enforcement of an upper limit on vbench duration, to bound the suite runtime to something manageable.
  • pandas 5550, behavior change resulted in 600x change to perf. How to explicitly handle this sort of historical context?
  • often you want to identify changes to algorithmic complexity rather than just the overall absolute difference (O(1) -> O(N)).
    If vbenches were parameterized by dataset size, that could be detected automatically. (For example pandas-dev/pandas#5660 (comment), is it O(1) construction overhead?)
  • usage experience shows that occasionally, vbenches need to be modified after the fact (adjusting dataset size
    is common). This is not a problem for the "perf diff" use case, but for historical tracking it invalidates the
    timeseries. need to control for that and warn the user (immutable vbenches or perhaps hash the vbench ast
    to be less rigid re program text).
  • use sqlalchemy commit/transaction control to speed up the initial db creation (it shouldn't be that slow).
  • canned export of results into easy format (json) for consumption by other tools (jenkins plug-in for example, similar to coverage.xml and xunit xml)
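
To make two of the ideas above concrete (benchmarks registering by name so duplicates are caught, and hashing the vbench AST rather than the program text), here is a rough sketch. Nothing here is existing vbench API: `BenchmarkRegistry`, `DuplicateBenchmarkError` and `ast_checksum` are hypothetical names made up for illustration, and the sketch assumes the benchmark code and setup are available as source strings.

```python
import ast
import hashlib


class DuplicateBenchmarkError(ValueError):
    """Raised when two benchmarks register under the same name."""


def ast_checksum(*sources):
    """Hash the parsed AST of each source string.

    Comments and whitespace never reach the AST, so purely cosmetic
    edits leave the checksum unchanged, while any change to the
    statements themselves produces a different one.
    """
    h = hashlib.sha1()
    for src in sources:
        tree = ast.parse(src)
        # ast.dump omits line/column attributes by default.
        h.update(ast.dump(tree).encode("utf-8"))
        h.update(b"\0")  # separator between sources
    return h.hexdigest()


class BenchmarkRegistry:
    """Hypothetical registry, not part of vbench today."""

    def __init__(self):
        self._checksums = {}

    def register(self, name, code, setup=""):
        # Refuse a second benchmark under an already-used name.
        if name in self._checksums:
            raise DuplicateBenchmarkError(
                "benchmark name already registered: %r" % name
            )
        self._checksums[name] = ast_checksum(setup, code)
        return self._checksums[name]

    def checksum(self, name):
        return self._checksums[name]
```

A stored result whose checksum no longer matches the current one could then trigger a warning that the historical timeseries for that benchmark is no longer comparable. One caveat: the text produced by `ast.dump` can differ between Python versions, so checksums would only be comparable when computed under the same interpreter version.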