/travis_build_times

Look at the build success/timeout rate to figure out how we should adjust our timeouts

Travis Build Times

We noticed some slowness on Travis, so I figured I'd take a look.

The goal here was to calculate the likelihood that a build will time out, given how long the build has already been running.
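One way to read that goal is as a conditional probability: of the builds that ran for at least a given number of minutes, what fraction ended in a timeout? The sketch below is an illustration under that assumption, not necessarily how this repo computes it. The `Build` struct, its field names, and the sample values are all hypothetical.

```ruby
# Hypothetical record shape: duration in minutes plus a timed-out flag.
Build = Struct.new(:duration_min, :timed_out)

# Estimate P(timeout | build has already run for at least `elapsed_min`).
# Returns nil when no build in the sample lasted that long.
def timeout_likelihood(builds, elapsed_min)
  still_running = builds.select { |b| b.duration_min >= elapsed_min }
  return nil if still_running.empty?

  still_running.count(&:timed_out).to_f / still_running.size
end

# Made-up sample, for illustration only (not data from this repo).
sample = [
  Build.new(12, false),
  Build.new(25, false),
  Build.new(31, false),
  Build.new(50, true),
  Build.new(50, true)
]

[0, 20, 30, 40].each do |minute|
  puts format('minute %2d: %.2f', minute, timeout_likelihood(sample, minute))
end
```

Evaluating this at every minute gives the kind of curve discussed under Result: the likelihood rises as shorter, successful builds drop out of the "still running" group.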

The data was split into two sets:

  • 'control': all of our Travis builds between 2016/03 and 2016/11/18
  • 'upgradedVM': builds after Travis upgraded our machine, 11/18 to 12/20

To run:

bundle install
make local

You'll need rvm / ruby / gnuplot.

Result

When we went from the 'open source' VM pool to the 'premium' VM, we did see improvement:

refactorVSUpgrade.png

A few things to note:

  • the curve went 'down' - suggesting that total frequency of timeouts went down (the area under the curve is lower)
  • the curve went 'right'
    • previously, a build reached a 90% likelihood of failing at minute 34
    • now it reaches that point at minute 48
    • this suggests the VM still has resources, is doing work, and can complete before the timeout
  • the curve got less 'steep' - suggesting that the VM makes real progress for all of the time the build is going
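The "minute at which the likelihood reaches 90%" comparison above can be read off a curve with a small helper. This is a minimal sketch, assuming the curve is available as a hash of minute to likelihood. The `crossing_minute` helper and both curves are hypothetical, and the numbers are invented for illustration rather than taken from the measured results.

```ruby
# Return the first minute at which the likelihood curve reaches `threshold`,
# or nil if it never does. `curve` is a hash of minute => likelihood.
def crossing_minute(curve, threshold = 0.9)
  pair = curve.sort.find { |_minute, likelihood| likelihood >= threshold }
  pair && pair.first
end

# Made-up curves for illustration only; they are not the measured results.
before = { 10 => 0.20, 20 => 0.55, 30 => 0.92, 40 => 0.99 }
after  = { 10 => 0.05, 20 => 0.30, 30 => 0.60, 40 => 0.91 }

puts "before: minute #{crossing_minute(before)}"
puts "after:  minute #{crossing_minute(after)}"
```

A later crossing minute corresponds to the curve moving 'right'.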

Questions

  • can we measure the frequency of restarted builds?
  • is there more resource contention on Fridays (anecdotally, we believe so)?
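The Friday question could be approached by grouping builds by weekday and comparing timeout rates. The following is only a sketch of that idea. The `DatedBuild` struct, its fields, and the sample rows are hypothetical, and a timeout rate is at best a rough proxy for resource contention.

```ruby
require 'date'

# Hypothetical record shape: the day the build started plus a timed-out flag.
DatedBuild = Struct.new(:started_on, :timed_out)

# Fraction of builds that timed out, keyed by weekday name.
def timeout_rate_by_weekday(builds)
  builds.group_by { |b| b.started_on.strftime('%A') }
        .transform_values { |day| day.count(&:timed_out).to_f / day.size }
end

# Made-up sample for illustration only.
sample = [
  DatedBuild.new(Date.new(2016, 11, 17), false), # Thursday
  DatedBuild.new(Date.new(2016, 11, 18), true),  # Friday
  DatedBuild.new(Date.new(2016, 11, 18), false), # Friday
  DatedBuild.new(Date.new(2016, 11, 21), false)  # Monday
]

timeout_rate_by_weekday(sample).each do |weekday, rate|
  puts format('%-9s %.2f', weekday, rate)
end
```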

Todo

I didn't quite finish setting this up as a Pachyderm pipeline. With the services feature about to land, it would be cool to have the final 'pipeline' be a job hosting the PNGs generated by gnuplot.