clearmatics/zeth

CI action fails occasionally

Closed this issue · 5 comments

@AntoineRondelet reported occational failure of macos debug build, due to timeout in mpc tests.

(Could be a timing issue related to waiting for the coordinator server to come up?)

Would be good to get a copy-paste from the action next time it fails to help track this down.

Here's the error taken from the CI on #245 before I restarted the jobs (all tests pass after restarting the jobs). Weirdly I don't see anything related to contributor timeout - which I saw this am.
Let's see if/when that reproduces...

2020-09-01T14:10:59.8729090Z TEST: waiting for server to start ...
2020-09-01T14:10:59.8753520Z curl: (7) Failed to connect to localhost port 8002: Connection refused
2020-09-01T14:10:59.8782050Z + echo 'TEST: waiting for server to start ...'
2020-09-01T14:10:59.8797120Z + sleep 1
2020-09-01T14:11:00.9890800Z + get_state_phase2
2020-09-01T14:11:00.9989420Z + curl -k --cacert /Users/runner/work/zeth/zeth/_test_server_phase2/cert.pem --fail https://localhost:8002/state
2020-09-01T14:11:01.0238790Z   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
2020-09-01T14:11:01.0253150Z                                  Dload  Upload   Total   Spent    Left  Speed
2020-09-01T14:11:01.0254400Z
2020-09-01T14:11:01.1107100Z   0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
2020-09-01T14:11:01.1111530Z 100    97  100    97    0     0   1114      0 --:--:-- --:--:-- --:--:--  1114
2020-09-01T14:11:01.1169380Z ++ cat server.pid
2020-09-01T14:11:01.1241230Z + echo 'TEST: server up (pid: 90074)'
2020-09-01T14:11:01.1242230Z {"next_contributor_index": 0, "num_contributors": 4, "next_contributor_deadline": "1598969464.0"}TEST: server up (pid: 90074)
2020-09-01T14:11:01.1243210Z + popd
2020-09-01T14:11:01.1243880Z ~/work/zeth/zeth
2020-09-01T14:11:01.1258880Z + contribute /Users/runner/work/zeth/zeth/testdata/mpc_key1.bin 1 /Users/runner/work/zeth/zeth/_test_server_phase2
2020-09-01T14:11:01.1262290Z + contributor_dir=_test_contrib_1
2020-09-01T14:11:01.1266300Z + rm -rf _test_contrib_1
2020-09-01T14:11:01.1271010Z + contribute /Users/runner/work/zeth/zeth/testdata/mpc_key3.bin 3 /Users/runner/work/zeth/zeth/_test_server_phase2 '--wait-interval 1'
2020-09-01T14:11:01.1275820Z + contributor_dir=_test_contrib_3
2020-09-01T14:11:01.1278920Z + rm -rf _test_contrib_3
2020-09-01T14:11:01.1341520Z + mkdir -p _test_contrib_1
2020-09-01T14:11:01.1353950Z + mkdir -p _test_contrib_3
2020-09-01T14:11:01.1432210Z + pushd _test_contrib_3
2020-09-01T14:11:01.1438050Z ~/work/zeth/zeth/_test_contrib_3 ~/work/zeth/zeth
2020-09-01T14:11:01.1449490Z + phase2_contribute --skip-user-input --mpc-tool /Users/runner/work/zeth/zeth/build/mpc_tools/mpc_phase2/mpc-test-phase2 --server-certificate /Users/runner/work/zeth/zeth/_test_server_phase2/cert.pem --insecure --wait-interval 1 https://localhost:8002 /Users/runner/work/zeth/zeth/testdata/mpc_key3.bin
2020-09-01T14:11:01.1451080Z ~/work/zeth/zeth/_test_contrib_1 ~/work/zeth/zeth
2020-09-01T14:11:01.1456380Z + pushd _test_contrib_1
2020-09-01T14:11:01.1461910Z + phase2_contribute --skip-user-input --mpc-tool /Users/runner/work/zeth/zeth/build/mpc_tools/mpc_phase2/mpc-test-phase2 --server-certificate /Users/runner/work/zeth/zeth/_test_server_phase2/cert.pem --insecure https://localhost:8002 /Users/runner/work/zeth/zeth/testdata/mpc_key1.bin
2020-09-01T14:11:04.7361500Z
2020-09-01T14:11:04.7362000Z response
2020-09-01T14:11:04.7365750Z Digest of the contribution was:
2020-09-01T14:11:04.7365950Z 175b2d57 6b677dd2 2b1e057f 2950560a
2020-09-01T14:11:04.7366400Z 408e48b0 15f9922f a433a9a9 6fc51fd5
2020-09-01T14:11:04.7367110Z 04ebc9a5 1fa67b8f 744a44bd 332f7e02
2020-09-01T14:11:04.7367470Z eea41e75 34602959 3805f75a 42c9c6a7
2020-09-01T14:11:04.7367630Z Digest written to: response.bin.digest
2020-09-01T14:11:04.9246710Z /Users/runner/work/zeth/zeth/mpc/env/lib/python3.8/site-packages/urllib3/connectionpool.py:981: InsecureRequestWarning: Unverified HTTPS request is being made to host 'localhost'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
2020-09-01T14:11:04.9247020Z Got key
2020-09-01T14:11:04.9247220Z   warnings.warn(
2020-09-01T14:11:04.9248030Z /Users/runner/work/zeth/zeth/mpc/env/lib/python3.8/site-packages/urllib3/connectionpool.py:981: InsecureRequestWarning: Unverified HTTPS request is being made to host 'localhost'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
2020-09-01T14:11:04.9248550Z   warnings.warn(
2020-09-01T14:11:04.9249330Z /Users/runner/work/zeth/zeth/mpc/env/lib/python3.8/site-packages/urllib3/connectionpool.py:981: InsecureRequestWarning: Unverified HTTPS request is being made to host 'localhost'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
2020-09-01T14:11:04.9249780Z   warnings.warn(
2020-09-01T14:11:04.9251050Z /Users/runner/work/zeth/zeth/mpc/env/lib/python3.8/site-packages/urllib3/connectionpool.py:981: InsecureRequestWarning: Unverified HTTPS request is being made to host 'localhost'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
2020-09-01T14:11:04.9251400Z   warnings.warn(
2020-09-01T14:11:04.9251620Z Got challenge
2020-09-01T14:11:04.9252720Z CMD: /Users/runner/work/zeth/zeth/build/mpc_tools/mpc_phase2/mpc-test-phase2 phase2-contribute challenge response --digest response.bin.digest --skip-user-input
2020-09-01T14:11:04.9278040Z Traceback (most recent call last):
2020-09-01T14:11:04.9278750Z   File "/Users/runner/work/zeth/zeth/mpc/env/bin/phase2_contribute", line 7, in <module>
2020-09-01T14:11:04.9280030Z     exec(compile(f.read(), __file__, 'exec'))
2020-09-01T14:11:04.9280310Z   File "/Users/runner/work/zeth/zeth/mpc/commands/phase2_contribute", line 73, in <module>
2020-09-01T14:11:04.9280550Z     phase2_contribute()
2020-09-01T14:11:04.9281580Z   File "/Users/runner/work/zeth/zeth/mpc/env/lib/python3.8/site-packages/click/core.py", line 764, in __call__
2020-09-01T14:11:04.9281880Z     return self.main(*args, **kwargs)
2020-09-01T14:11:04.9282980Z   File "/Users/runner/work/zeth/zeth/mpc/env/lib/python3.8/site-packages/click/core.py", line 717, in main
2020-09-01T14:11:04.9283270Z     rv = self.invoke(ctx)
2020-09-01T14:11:04.9284340Z   File "/Users/runner/work/zeth/zeth/mpc/env/lib/python3.8/site-packages/click/core.py", line 956, in invoke
2020-09-01T14:11:04.9284630Z     return ctx.invoke(self.callback, **ctx.params)
2020-09-01T14:11:04.9285580Z   File "/Users/runner/work/zeth/zeth/mpc/env/lib/python3.8/site-packages/click/core.py", line 555, in invoke
2020-09-01T14:11:04.9285860Z     return callback(*args, **kwargs)
2020-09-01T14:11:04.9286100Z   File "/Users/runner/work/zeth/zeth/mpc/commands/phase2_contribute", line 62, in phase2_contribute
2020-09-01T14:11:04.9286310Z     contribute(
2020-09-01T14:11:04.9286540Z   File "/Users/runner/work/zeth/zeth/mpc/coordinator/contribute.py", line 95, in contribute
2020-09-01T14:11:04.9286780Z     _upload_response(client, response_file, sk)
2020-09-01T14:11:04.9287010Z   File "/Users/runner/work/zeth/zeth/mpc/coordinator/contribute.py", line 27, in _upload_response
2020-09-01T14:11:04.9287250Z     client.push_contribution(response_file, digest, vk, signature)
2020-09-01T14:11:04.9291010Z   File "/Users/runner/work/zeth/zeth/mpc/coordinator/client.py", line 110, in push_contribution
2020-09-01T14:11:04.9291270Z     resp.raise_for_status()
2020-09-01T14:11:04.9292160Z   File "/Users/runner/work/zeth/zeth/mpc/env/lib/python3.8/site-packages/requests/models.py", line 940, in raise_for_status
2020-09-01T14:11:04.9293190Z     raise HTTPError(http_error_msg, response=self)
2020-09-01T14:11:04.9293490Z requests.exceptions.HTTPError: 403 Client Error: FORBIDDEN for url: https://localhost:8002/contribute
2020-09-01T14:11:10.0219610Z ##[error]Process completed with exit code 1.)))))

@dtebbs let me know if you want the full trace, I have all the logs from the CI for this run if needed.

Here we go, here's the error I usually had on my failing PRs, which is now showing up again on #266:

/Users/runner/work/zeth/zeth/mpc/env/lib/python3.8/site-packages/urllib3/connectionpool.py:981: InsecureRequestWarning: Unverified HTTPS request is being made to host 'localhost'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
Got key
  warnings.warn(
/Users/runner/work/zeth/zeth/mpc/env/lib/python3.8/site-packages/urllib3/connectionpool.py:981: InsecureRequestWarning: Unverified HTTPS request is being made to host 'localhost'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
  warnings.warn(
Traceback (most recent call last):
  File "/Users/runner/work/zeth/zeth/mpc/env/bin/phase2_contribute", line 7, in <module>
    exec(compile(f.read(), __file__, 'exec'))
  File "/Users/runner/work/zeth/zeth/mpc/commands/phase2_contribute", line 77, in <module>
    phase2_contribute()
  File "/Users/runner/work/zeth/zeth/mpc/env/lib/python3.8/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/Users/runner/work/zeth/zeth/mpc/env/lib/python3.8/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/Users/runner/work/zeth/zeth/mpc/env/lib/python3.8/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/runner/work/zeth/zeth/mpc/env/lib/python3.8/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/Users/runner/work/zeth/zeth/mpc/commands/phase2_contribute", line 66, in phase2_contribute
    contribute(
  File "/Users/runner/work/zeth/zeth/mpc/coordinator/contribute.py", line 85, in contribute
    wait_for_turn(client, wait_interval, verification_key)
  File "/Users/runner/work/zeth/zeth/mpc/coordinator/contribute.py", line 56, in wait_for_turn
    raise Exception("contributor turn has passed")
Exception: contributor turn has passed
##[error]Process completed with exit code 1.

I suspect we need to widen the contribution window: https://github.com/clearmatics/zeth/blob/develop/testdata/mpc_phase1_server_config.json#L11 and https://github.com/clearmatics/zeth/blob/develop/testdata/mpc_phase2_server_config.json#L8 to something bigger than 10 secs to make sure the contributors have time to contribute during the CI build (especially when the CI server is very busy, which may be the cause of this error)

Exception: contributor turn has passed

Thanks for the log @AntoineRondelet . It appears to be the issue I mentioned of contributors failing to create and submit contributions within the test interval. I'll try increasing contribution_interval, which should solve this.

The corresponding PRs (i.e. #269 and #265 ) have now been merged.
Closing this issue as a consequence, but will re-open this ticket if the CI's job keeps failing