hoaproject/Test

PHPBench integration

dantleech opened this issue · 24 comments

I would like to propose introducing PHPBench benchmarks into Hoa.

PHPBench is a benchmarking framework (micro and macro). It is structurally similar to PHPUnit and can be used anywhere you would otherwise write an ad-hoc microtime() script to test something. It is somewhat similar to Java's JMH and was inspired by Athletic. It is still under development.

Some of its advantages:

  • Benchmarks are located in the source repository (à la unit test files).
  • It can generate themeable reports (to the console, in Markdown, or in HTML).
  • It can store results and allow you to compare different runs (eventually allowing you to store them in a Git branch).
  • Iterations are executed in isolated processes.

I propose it here because I have seen similar micro-benchmarks showcased by @Hywan on IRC and, honestly, I want to know if PHPBench is useful and fit-for-purpose.

Benchmarks could be located in the following path:

LibraryName/Test/Benchmark/SomeBench.php

And PHPBench itself could either be installed as a require-dev dependency, or globally on the developer's machine (there will be a PHAR at some point). The benchmarks themselves have no runtime dependency on PHPBench.
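For illustration, a benchmark at that path could be a plain PHP class along these lines (the names are placeholders; @Revs and @Iterations are standard PHPBench annotations):

<?php

namespace LibraryName\Test\Benchmark;

/**
 * Note: no PHPBench import and no base class, which is why the
 * benchmarks carry no runtime dependency on PHPBench itself.
 */
class SomeBench
{
    /**
     * @Revs(1000)
     * @Iterations(10)
     */
    public function benchSomething()
    {
        md5('hello world');
    }
}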

@Hywan expressed a wish for there to be assertions for memory and time with the implication that it could be used in the CI process. This is currently not a feature of PHPBench due to time being affected by the platform the benchmarks are running on, but CI assertions (in whatever form) are clearly something to work towards.

So, there it is. Just an idea; if you are interested, maybe @Hywan could show me the code he used to generate his benchmarks and I could make a PR.

I can easily imagine this as a generic atoum extension. See here for examples.

This would let us define new assertions working with PHPBench.

That would be interesting, but it would be quite a big departure from the current way PHPBench is used. For example, I would imagine something like this:

public function testThis()
{
    $this->phpbench
        ->assert('main.mean < 2s')
        ->benchmark(function () {
            md5('foobar');
        })
        ->iterations(4)
        ->retryThreshold(2)
        ->revolutions(1000)
        ->etc();
}

But that means solving some problems; for example, PHPBench launches the benchmarks in separate processes.

I was thinking more immediately about adding assertions to the annotations in the PHPBench cases:

/**
 * @Assert('mean < 2s and memory <= 200b')
 */
public function benchSomething()
{
    md5('hello world');
}

But it certainly would be a shame to not use atoum.

Hywan commented

@dantleech Is it possible to split PHPBench into the library and the CLI? That way, we would reduce the number of dependencies.

Hywan commented

(will be more “embeddable”)

I like what @dantleech proposes: we could wire annotations to atoum assertions directly in PHPBench. This would be really easy to do!

@dantleech, do you want me to do a POC?

Hywan commented

I would just like to throw out an idea: memory <= 200b can be true or false depending on the PHP VM we are using. It is actually very hard to predict the verdict of this assertion. Same with mean < 2s. How do we deal with this?

@Hywan this is exactly why I suggested, on IRC, an assertion like the one we have for floats: isNearlyEqualTo.

We could implement an assertion that checks a value with a delta: memory <= 200b would become memory <= 220b (if we apply a 10% delta).
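As a sketch of the idea (assertWithinDelta is a hypothetical helper, not an existing atoum or PHPBench API):

<?php

// Hypothetical helper: accept a measured value as long as it stays
// within the declared limit plus a relative tolerance.
function assertWithinDelta($measured, $limit, $delta = 0.10)
{
    // memory <= 200b with a 10% delta effectively means memory <= 220b.
    return $measured <= $limit * (1.0 + $delta);
}

var_dump(assertWithinDelta(210, 200)); // bool(true):  within the 10% margin.
var_dump(assertWithinDelta(230, 200)); // bool(false): exceeds 220b.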

@Hywan good point, perhaps by defining an assertion per env:

Assert('php_version = 5.4', 'mem < 200b')
Assert('php_version = 7', 'mem < 100b')

So you would target assertions at specific PHP versions / OSes / CPUs / whatever. That is lots of maintenance, and of course you are assuming that these tests will only be run in specific environments (imagine every developer having their own assertions for their personal laptops!).
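One way to picture that selection logic (entirely hypothetical, not an existing PHPBench feature):

<?php

// Hypothetical: pick the assertion whose environment constraint matches
// the current PHP version, as in the Assert(...) examples above.
$assertions = [
    '5.4' => 'mem < 200b',
    '7'   => 'mem < 100b',
];

foreach ($assertions as $version => $assertion) {
    if (0 === strpos(PHP_VERSION, $version)) {
        echo 'Applying assertion: ', $assertion, PHP_EOL;

        break;
    }
}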

Hywan commented

@dantleech You have to think in terms of the VM and then the VM's version, not only the “PHP version”.

I would like more input from the @hoaproject/hoackers community please :-)!

Hywan commented

So, what do we do with this? PHPBench as a measuring tool is awesome; that is not the question here. However, using it as a testing tool is apparently difficult because the results depend heavily on the “execution engine”, i.e. the VM.

My suggestions:

  • Close this issue and drop the idea of using PHPBench as a testing tool,
  • Being ready to use PHPBench for some benchmarks, e.g. in Hoa\Compiler or Hoa\Ruler or Hoa\Router or… almost all libraries could have some, but toward what goals? ……

The main goal of such a tool for Hoa is to check that modifications to the code do not introduce performance regressions either. So you make one or many reference runs before starting your patches, and then compare them with runs made after the patches are applied. This is “local testing” (the term is not scientific at all).

New suggestions:

  1. Maybe we should write “generic/prefilled benchmarks” in a library (let's say Hoa\Compiler),
  2. Based on these generic benchmarks, we generate tests from the first reference run,
  3. Then we can modify our code and re-run the first generated tests to ensure no regression.

In pseudo-code, it would look like:

$ hoa test:generate-benchmarks
… benchmarks are run several times
… we compute all the results
… we round them up (+5% for instance)
… we generate performance **reference** tests
$ vi foo.php
… do your stuff
$ hoa test:run
… check the performance of the new code against the **reference** tests
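To make the generation step concrete, a generated reference test might look roughly like this (purely illustrative; the numbers and the md5 workload are made up):

<?php

// Generated from a reference run: the measured mean was ~100 µs per
// revolution, rounded up by 5% to absorb run-to-run noise.
$referenceMeanBound = 0.000105; // seconds

$revolutions = 1000;
$start       = microtime(true);

for ($i = 0; $i < $revolutions; ++$i) {
    md5('hello world');
}

$mean = (microtime(true) - $start) / $revolutions;

if ($mean > $referenceMeanBound) {
    throw new RuntimeException('Performance regression: mean exceeds the reference bound.');
}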

What do you think?

Not sure I understand the workflow above --

  • what is generated?
  • why add 5%?

I think the value is currently in the local testing - my workflow (when using PHPBench) is similar to:

$ git checkout master
$ phpbench run benchmarks --store --iterations=30 --revs=10000 # normally this is in config
$ git checkout working_branch
$ phpbench run benchmarks --store --iterations=30 --revs=10000
$ phpbench report --uuid=latest --uuid=latest-1 --report=aggregate # compare the difference
Hywan commented

Yes, this is “local testing”. You are testing whether your new patches introduce a performance regression, so you need a “reference version”. That is what we generate: we run PHPBench several times to extract numbers, and then we generate tests saying “the expected numbers are the following…”. Then, when you are developing, you can check the performance by running the generated tests.

Do you see?

The generated tests will never be committed.

However, as far as I can see, you have storage and can already compare the results of several runs. This is interesting. So, we could drop the test generation, I guess, no? Do you see any added gain?

I think there could be a "workflow" gain - it sounds as if you are suggesting that the "master" branch is automatically checked out (either in the CWD or elsewhere) and the benchmarking suite is executed and then used as a reference - that could be a nice extension for PHPBench.

But I don't think the workflow is too bad now: you generate the reference after checking out the master branch and receive a UUID:

$ phpbench run --store benchmarks/Micro/Math/KdeBench.php --progress=dots
...
Run: 1339ffe9ce9066787b4fa8217f957ebbf8bb4656

and then you can reference that UUID in subsequent reports at any time and compare it with the "meta UUID" latest:

$ phpbench report --uuid=latest --uuid=1339ffe9ce9066787b4fa8217f957ebbf8bb4656 --report=blah
Hywan commented

OK. So what we can do is “wrap” PHPBench into “short” commands, like we did with hoa test:run, which is basically a wrapper around atoum (it pre-fills all the options, finds the configuration files, etc.), or with hoa devtools:cs, which is a wrapper of the same nature around php-cs-fixer (it finds the configuration files, adds our own CS rules, etc.).

Maybe a:

  1. hoa test:performance --init that will do the first phpbench run --store …,
  2. hoa test:performance that will run phpbench run and phpbench report against the first run,
  3. hoa test:performance --reset to go back to the initial state without running anything (--init implies --reset).

Bonus:

  • hoa test:performance --loop that waits for the user to press Enter before running and comparing with the first run.

This workflow is good when you would like to compare often against an initial run, but it does not work well if you would like “incremental” comparisons/reports. Maybe something like:

  • hoa test:performance --delta implies a --init + run.

Finally, a hoa test:performance --clean is necessary to clean the storage.
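A rough sketch of how such a wrapper could shell out to PHPBench (the subcommand names come from this thread; the storage path and everything else are assumptions):

<?php

// Hypothetical wrapper: map `hoa test:performance` flags to phpbench calls.
$option = isset($argv[1]) ? $argv[1] : '';

switch ($option) {
    case '--init':
        // First reference run, stored for later comparisons.
        passthru('phpbench run Test/Benchmark --store');

        break;

    case '--clean':
        // Clean the storage (the storage location is an assumption).
        passthru('rm -rf .phpbench/storage');

        break;

    default:
        // Run again and compare against the previous stored run.
        passthru('phpbench run Test/Benchmark --store');
        passthru('phpbench report --uuid=latest --uuid=latest-1 --report=aggregate');
}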

What do you think?

Hywan commented

(Note: This proposal removes atoum from the party).

Looks reasonable - the only negative is that by calling PHPBench by proxy you limit its power, e.g.:

$ phpbench run benchmarks --iterations=50 --revs=1 --revs=10 --revs=100 --revs=1000 --store

In this example we can use the results to compare the memory usage for 1, 10, 100 and 1000 revolutions of the benchmark in a single process.

So I (personally) would tend to use PHPBench directly rather than through a wrapping script, but I think what would be better to discuss is the benchmarks themselves; the manner in which PHPBench is executed is secondary, imo.

Can I propose that I rewrite the Hoa (Compiler?) benchmarks in PHPBench as a POC? I could also generate a Hoa-themed HTML report. Then we would have something more concrete to discuss and would be able to see if it is worth the effort :)

Hywan commented

The hoa test:performance command will receive more options, obviously 😃.

And 👍 for your proposal. Go for it!

Hywan commented

ping?

oops, missed your response. will try and knock something together!

Hywan commented

ping :-)? I would be really enthusiastic to integrate PHPBench!

I love the idea!

About the reference: it is not trustworthy for performance non-regression testing; we need to run the old code and the new code at the same time to have a fair comparison.

This means the reference is always master, and for each PR we should compare PHPBench on master vs. PHPBench on the branch.

That would be ideal, but it is tricky on Travis as it would mean doing two composer installs; in addition to the time penalty, the system load might change - although PHPBench does at least provide baselines which can indicate deviations there.

As mentioned on IRC, a good start might be with the Ruler library.

Hywan commented

@Pierozi @dantleech So far, I don't see this running on a CI server because it is hard to do a fair comparison. It could be interesting, but it's hard. The first step is to get PHPBench in our Hoa\Test or Hoa\Devtools box. Then, we will see how we use it. I need PHPBench because I sometimes would like to compare my PR with the master, but maybe we can have other usages.

Made a small PR on Ruler: hoaproject/Ruler#96