We noticed some slowness on Travis, so I figured I'd take a look.
The goal here was to calculate the likelihood that a build will time out, given how long the build has already been running.
The data was split into two sets:
- 'control', which is all of our Travis builds between 2016/03 and 2016/11/18
- 'upgradedVM', which is our builds once Travis upgraded our machine, from 11/18 to 12/20
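As a rough sketch of the calculation (not the code in this repo), the likelihood at a given minute can be estimated as the share of builds that ran at least that long and ended in a timeout. The `Build` struct, its field names, `TIMEOUT_MIN`, and the sample values are all made up for illustration; the real data layout may differ.

```ruby
# Hypothetical record shape: the real build data in this repo may look different.
Build = Struct.new(:duration_min, :timed_out)

# Estimate P(timeout | build has already run for at least `elapsed` minutes).
# Returns nil when no build in the sample ran that long.
def timeout_likelihood(builds, elapsed)
  still_running = builds.select { |b| b.duration_min >= elapsed }
  return nil if still_running.empty?

  still_running.count(&:timed_out).to_f / still_running.size
end

# Assumed timeout limit and made-up sample, for illustration only.
TIMEOUT_MIN = 50
SAMPLE = [
  Build.new(12, false),
  Build.new(25, false),
  Build.new(38, false),
  Build.new(TIMEOUT_MIN, true),
  Build.new(TIMEOUT_MIN, true)
].freeze

[0, 20, 30, 40].each do |minute|
  puts format('minute %2d: %.2f', minute, timeout_likelihood(SAMPLE, minute))
end
```

Running the same function over the 'control' and 'upgradedVM' sets separately would give the two curves to compare.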
To run:
bundle install
make local
You'll need rvm / ruby / gnuplot.
When we went from the 'open source' VM pool to the 'premium' VM, we did see improvement:
A few things to note:
- the curve went 'down', suggesting that the overall frequency of timeouts went down (the area under the curve is smaller)
- the curve went 'right'
  - the old 90% likelihood that a build would time out was reached at minute 34
  - now it is reached at minute 48
  - suggesting that the VM still has resources and is doing work, and the build can complete before the timeout
- the curve got less 'steep', suggesting that the VM makes real progress for the whole time the build is running
Open questions:
- can we measure the frequency of restarted builds?
- is there more resource contention on Fridays (anecdotally we believe this is true)?
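One way the Friday question might be checked is to group builds by the weekday they started on and compare timeout rates. This is only a sketch: `DatedBuild`, its fields, and the sample rows are hypothetical, and a higher timeout rate would only hint at contention rather than prove it.

```ruby
require 'date'

# Hypothetical record shape; the field names are assumptions.
DatedBuild = Struct.new(:started_on, :timed_out)

# Fraction of builds that timed out, keyed by weekday name.
def timeout_rate_by_weekday(builds)
  builds.group_by { |b| b.started_on.strftime('%A') }
        .map { |day, group| [day, group.count(&:timed_out).to_f / group.size] }
        .to_h
end

# Made-up sample, for illustration only.
DATED_SAMPLE = [
  DatedBuild.new(Date.new(2016, 11, 21), false), # a Monday
  DatedBuild.new(Date.new(2016, 11, 21), false),
  DatedBuild.new(Date.new(2016, 11, 25), true),  # a Friday
  DatedBuild.new(Date.new(2016, 11, 25), false)
].freeze

timeout_rate_by_weekday(DATED_SAMPLE).each do |day, rate|
  puts format('%-9s %.2f', day, rate)
end
```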
I didn't quite finish setting this up as a Pachyderm pipeline. With the services feature about to land, it would be cool to have the final 'pipeline' be a job hosting the PNGs generated by gnuplot.