Parallelize aegir scripts
victorb opened this issue · 5 comments
I made a quick test to see if it's viable to parallelize the different aegir scripts (branch here: https://github.com/ipfs/jenkins-libs/blob/parallize/vars/javascript.groovy). The reason is that ipfs/js-ipfs jobs take ~20 minutes from start to finish. The tests in ipfs/js-ipfs first run the nodejs tests, then browser, and after that webworkers. The experiment was to run the build once for each platform + nodejs version, then reuse that build for each of the tests + os + nodejs version.
Conclusion: doesn't actually speed up builds significantly
- because we need one job per os + per nodejs version + aegir command, we end up with 12 jobs for running tests
- this seems to slow down the jenkins pipeline because `stash/unstash` (which we use to pass around files) does not work well with large directories, so each job spends ~2 minutes just transferring files first
- since tests are not isolated, we can only run one job per worker, so this parallelization saturates the queue. We have 5 workers of each OS, while a parallel build of aegir scripts would require at least 6 of each OS (if we run one job). The queue gets full from just one test run
- when running 12 jobs at the same time in one stage, the reporting back to the master node is delayed, so jobs that finish in 2 minutes are not reported as finished until 5 minutes later
- specific to ipfs/js-ipfs, `npm run test:node` is still the slowest one and delays the complete report after the pipeline has finished.
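To make the job-count problem concrete, here is a minimal sketch of the matrix arithmetic. The OS names and nodejs versions are assumptions (the issue only gives the totals: 12 jobs, 6 needed per OS, 5 workers per OS); the three commands are the test targets named in this thread.

```javascript
'use strict'

// The three test targets named in this thread.
const commands = ['test:node', 'test:browser', 'test:webworker']

// One job per os + nodejs version + command.
function buildJobs (oses, nodeVersions, commands) {
  const jobs = []
  for (const os of oses) {
    for (const node of nodeVersions) {
      for (const command of commands) {
        jobs.push({ os, node, command })
      }
    }
  }
  return jobs
}

// How many workers each OS is missing if every job needs its own worker.
function workerShortfall (jobs, workersPerOs) {
  const perOs = {}
  for (const job of jobs) {
    perOs[job.os] = (perOs[job.os] || 0) + 1
  }
  const shortfall = {}
  for (const os of Object.keys(perOs)) {
    shortfall[os] = Math.max(0, perOs[os] - workersPerOs)
  }
  return shortfall
}

// ASSUMPTION: two OSes and two nodejs versions; names are placeholders.
const jobs = buildJobs(['linux', 'macos'], ['8', '10'], commands)
console.log(jobs.length, workerShortfall(jobs, 5))
```

Under these assumptions a single pipeline run already needs one more worker per OS than is available, which matches the saturated queue described above.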
I'm going to walk through some details of how tests are currently executed in AEgir, just to make sure no details are missed. A lot of this is probably common knowledge, so sorry if this is riddled with details.
When it comes to parallelization, AEgir does not have a concept of test suites. The only concept it has parallelization around is targets, but currently this parallelization is turned off by a hard-coded concurrent execution limit of 1. Increasing this value, though, doesn't do anything different from this groovy script. It simply allows multiple targets to run concurrently.
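As an illustration of what a concurrent execution limit does, here is a minimal sketch of a limited task runner. This is not AEgir's actual implementation; `runWithLimit` and the task shape (functions returning promises) are hypothetical. With a limit of 1 the targets run strictly one after another; a higher limit only lets more targets overlap.

```javascript
'use strict'

// Run async tasks with at most `limit` of them in flight at once.
// `tasks` is an array of functions that each return a promise.
async function runWithLimit (tasks, limit) {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new Error('limit must be a positive integer')
  }
  const results = new Array(tasks.length)
  let next = 0

  // Each worker pulls the next unclaimed task until none are left.
  async function worker () {
    while (next < tasks.length) {
      const index = next++
      results[index] = await tasks[index]()
    }
  }

  const workers = []
  for (let i = 0; i < Math.min(limit, tasks.length); i++) {
    workers.push(worker())
  }
  await Promise.all(workers)
  return results
}
```

Raising the limit from 1 to 3 would let `test:node`, `test:browser` and `test:webworker` overlap, but the overall time is still bounded by the slowest target.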
In this js-ipfs project there are three named test targets: `test:node`, `test:browser`, and `test:webworker`.
When running `test:node`, each of the separate suites of tests defined in the `package.json` (`test:node:core`, `test:node:http`, `test:node:gateway`, `test:node:cli`) is run serially. AEgir does not make a distinction between these, as it's not aware of them.
I ran each suite in parallel with the others (using a simple shell script); each row is a single run.
Test Run | core | http | gateway | cli | Total |
---|---|---|---|---|---|
1 | 24.40s | 113.33s | 3.88s | 491.06s | 632.67s |
2 | 23.52s | 114.36s | 5.13s | 501.95s | 644.96s |
3 | 24.39s | 114.07s | 6.98s | 491.91s | 637.35s |
4 | 24.98s | 112.91s | 5.07s | 491.04s | 634.00s |
5 | 23.86s | 113.06s | 5.59s | 490.48s | 632.99s |
6 | 23.86s | 114.57s | 5.29s | 492.69s | 636.41s |
7 | 22.40s | 113.37s | 8.79s | 493.52s | 638.08s |
8 | 23.83s | 113.34s | 9.97s | 498.46s | 645.60s |
9 | 21.07s | 114.43s | 5.66s | 482.90s | 624.06s |
10 | 22.11s | 114.49s | 4.60s | 506.10s | 647.30s |
Avg | 23.44s | 113.80s | 6.10s | 494.01s | 637.34s |
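The original measurement used a shell script that is not shown here. As a rough equivalent, a sketch in Node could start every suite at once and time each one. `timeCommand`, `timeAll` and `summarize` are hypothetical helper names, and the npm script names in the usage note are the ones listed above.

```javascript
'use strict'

const { spawn } = require('child_process')

// Spawn one command and resolve with its exit code and elapsed seconds.
function timeCommand (cmd, args) {
  return new Promise((resolve, reject) => {
    const start = process.hrtime.bigint()
    const child = spawn(cmd, args, { stdio: 'ignore' })
    child.on('error', reject)
    child.on('close', (code) => {
      const seconds = Number(process.hrtime.bigint() - start) / 1e9
      resolve({ command: [cmd, ...args].join(' '), code, seconds })
    })
  })
}

// Start every command immediately so they all run in parallel.
function timeAll (commands) {
  return Promise.all(commands.map(([cmd, ...args]) => timeCommand(cmd, args)))
}

// Sum of the individual times versus the slowest single command.
function summarize (results) {
  const sum = results.reduce((total, r) => total + r.seconds, 0)
  const slowest = results.reduce((max, r) => Math.max(max, r.seconds), 0)
  return { sum, slowest }
}
```

It could be called as `timeAll([['npm', 'run', 'test:node:core'], ['npm', 'run', 'test:node:cli']])`; on Windows, spawning `npm` this way may need the `shell: true` option. Note that the Total column in the table is the sum of the four suites, while the wall-clock time of a parallel run is closer to the slowest suite.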
The `test:node:cli` suite takes the longest. This is probably in part because many tests are run in both online and offline modes. This means that on average the test suite runs ~318s in either mode.
So overall, there isn't a huge advantage to breaking these up and running them concurrently on the same worker. The `cli` tests currently dominate the time.
The rest of this post goes into some depth as to why the `cli` tests are so slow.
The longest tests of the cli, almost 30% of the time, are the following three:
- do not crash if Addresses.Swarm is empty (66827ms)
- should handle SIGINT gracefully (65188ms)
- should handle SIGTERM gracefully (63033ms)
If we remove these tests, the cli tests then run in around ~442s, or ~221s in a single mode on average.
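The arithmetic behind these figures can be checked with a short script. The durations and averages are copied from this comment; `share` is a hypothetical helper. One thing the script makes visible: the ~30%, ~442s and ~318s figures appear to line up with the 637.34s average Total from the table, and measured against the 494.01s `cli` average the same three tests would be a larger share.

```javascript
'use strict'

// Durations of the three slowest tests, in milliseconds (from the list above).
const slowTestsMs = [66827, 65188, 63033]
const slowSeconds = slowTestsMs.reduce((total, ms) => total + ms, 0) / 1000

// Averages taken from the table.
const avgTotalSeconds = 637.34
const avgCliSeconds = 494.01

// Percentage of `whole` taken up by `part`, rounded to one decimal place.
function share (part, whole) {
  return Math.round((part / whole) * 1000) / 10
}

console.log('slow tests (s):', slowSeconds)
console.log('share of total (%):', share(slowSeconds, avgTotalSeconds))
console.log('share of cli (%):', share(slowSeconds, avgCliSeconds))
console.log('total without slow tests (s):', (avgTotalSeconds - slowSeconds).toFixed(2))
console.log('cli without slow tests (s):', (avgCliSeconds - slowSeconds).toFixed(2))
```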
A lot of the cli tests (even after the daemon is running) appear to take upwards of 800ms on average. This appears to be mostly due to the startup time of the cli.
I ran a quick test, and a full run of a command takes ~850ms (matching the cli test speed). The time from the start of code execution to process exit averaged ~250ms, which means that ~600ms is spent just parsing and loading modules.
I was able to measure this by simply wrapping the main `require` statements of the cli entry point (`src/cli/bin.js`).
```diff
diff --git a/src/cli/bin.js b/src/cli/bin.js
index 1d53444..72a6878 100755
--- a/src/cli/bin.js
+++ b/src/cli/bin.js
@@ -2,11 +2,13 @@
 'use strict'
+const st = (new Date).getTime()
 const yargs = require('yargs')
 const updateNotifier = require('update-notifier')
 const readPkgUp = require('read-pkg-up')
 const utils = require('./utils')
 const print = utils.print
+console.log(((new Date).getTime() - st) / 1000)
 const pkg = readPkgUp.sync({cwd: __dirname}).pkg
 updateNotifier({
```
The `test:node:cli` tests spawn the cli 201 times. This results in an overhead of ~120s for the full test run.
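The same kind of measurement can be wrapped in a small helper, and the overhead estimate follows from multiplying the per-spawn load time by the number of spawns. This is only a sketch: `timeRequire` and `spawnOverheadSeconds` are hypothetical names, and the 201 and ~600ms inputs are the figures quoted above.

```javascript
'use strict'

// Time how long a single require() takes, in milliseconds.
// Only the first call for a module is meaningful: later calls hit the require cache.
function timeRequire (id) {
  const start = process.hrtime.bigint()
  require(id)
  return Number(process.hrtime.bigint() - start) / 1e6
}

// Total time spent on module loading if every spawn pays the same startup cost.
function spawnOverheadSeconds (spawns, startupMs) {
  return (spawns * startupMs) / 1000
}

console.log('require("fs") took', timeRequire('fs'), 'ms')
console.log('estimated overhead (s):', spawnOverheadSeconds(201, 600))
```

With 201 spawns at ~600ms each, the estimate comes out at roughly 120s, which agrees with the overhead stated above.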
Currently working on this; I will add a new `npm run test:ci` script that will run all tests in parallel.
Todo:
- Make it possible to run `test:browser` and `test:webworker` simultaneously, requires fix in aegir to have dynamic ports in Karma, current issue is port collision
- Make junit test reports have a timestamp or something unique, so we can have many test reports for the same area of tests
- Add `test:ci` script to js-ipfs and make sure it's working properly and faster than current stuff
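Two of these items can be sketched with Node built-ins alone. The helper names `getFreePort` and `uniqueReportName` and the report file naming scheme are assumptions; how the port or file name would be passed to Karma or aegir is not shown here.

```javascript
'use strict'

const net = require('net')
const crypto = require('crypto')

// Ask the OS for an unused TCP port by listening on port 0, then release it.
// There is a small race: another process could take the port before it is reused.
function getFreePort () {
  return new Promise((resolve, reject) => {
    const server = net.createServer()
    server.on('error', reject)
    server.listen(0, () => {
      const { port } = server.address()
      server.close(() => resolve(port))
    })
  })
}

// Build a report file name that is unique per run, so parallel jobs for the
// same area of tests do not overwrite each other's junit output.
function uniqueReportName (area) {
  const suffix = crypto.randomBytes(4).toString('hex')
  return `junit-${area}-${Date.now()}-${suffix}.xml`
}
```

The random suffix is there because two jobs started in the same millisecond would otherwise produce the same timestamp.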
> Make it possible to run test:browser and test:webworker simultaneously, requires fix in aegir to have dynamic ports in Karma, current issue is port collision
I don't believe this is an issue with Karma itself. I believe Karma can handle a port already in use. When I was looking into some of this parallel work I found that the `ipfsd-ctl` server was the issue. Both the browser and webworker tests of aegir use the same `browser` hooks, which causes two `ipfsd-ctl` servers to be started.
Aegir should possibly have two hooks, one for the browser and another for webworker. For js-ipfs itself we can either start two different `ipfsd-ctl` servers, or share a single instance and keep a ref count.
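The "share a single instance and keep a ref count" option could look roughly like the sketch below. It does not use any real `ipfsd-ctl` API: `start` and `stop` are hypothetical callbacks standing in for whatever starts and stops the server in the hooks, and `createRefCounted` is a made-up name.

```javascript
'use strict'

// Wrap a start/stop pair so the resource is started on the first acquire()
// and stopped only when the last holder calls release().
function createRefCounted (start, stop) {
  let refs = 0
  let instance = null // promise for the started resource, or null when stopped

  return {
    async acquire () {
      refs++
      if (!instance) {
        instance = Promise.resolve().then(start)
      }
      return instance
    },
    async release () {
      if (refs === 0) {
        throw new Error('release() called without a matching acquire()')
      }
      refs--
      if (refs === 0 && instance) {
        const started = instance
        instance = null
        await stop(await started)
      }
    }
  }
}
```

The browser and webworker hooks would each call `acquire()` before their tests and `release()` afterwards, so whichever finishes last shuts the server down.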
> Aegir should possibly have two hooks, one for the browser and another for webworker. For js-ipfs itself we can either start two different ipfsd-ctl servers, or share a single instance and keep a ref count.
Agree, we need to start two ipfsd-ctl servers if we want parallel browser and webworker runs.
This issue was moved to ipfs-inactive/dev-team-enablement#102