wtsi-hgi/hgi-systems

GitLab runners constantly try to remove non-existent machines

Closed · 3 comments

GitLab runners constantly try to remove non-existent machines, filling up syslog and kern.log in the process.

From syslog:

May  8 13:09:32 gitlab-runner-delta-hgi-ci-01 gitlab-runner[971]: time="2017-05-08T13:09:32Z" level=info msg="About to remove runner-555937ac-gitlab-runner-1493127151-d115a2a6" name=runner-555937ac-gitlab-runner-1493127151-d115a2a6 operation=remove #012<nil>
May  8 13:09:32 gitlab-runner-delta-hgi-ci-01 gitlab-ci-multi-runner[971]: time="2017-05-08T13:09:32Z" level=info msg="About to remove runner-555937ac-gitlab-runner-1493127151-d115a2a6" name=runner-555937ac-gitlab-runner-1493127151-d115a2a6 operation=remove
May  8 13:09:32 gitlab-runner-delta-hgi-ci-01 gitlab-runner[971]: time="2017-05-08T13:09:32Z" level=info msg="WARNING: This action will delete both local reference and remote instance." name=runner-555937ac-gitlab-runner-1493127151-d115a2a6 operation=remove #012<nil>
May  8 13:09:32 gitlab-runner-delta-hgi-ci-01 gitlab-ci-multi-runner[971]: time="2017-05-08T13:09:32Z" level=info msg="WARNING: This action will delete both local reference and remote instance." name=runner-555937ac-gitlab-runner-1493127151-d115a2a6 operation=remove
May  8 13:09:32 gitlab-runner-delta-hgi-ci-01 gitlab-runner[971]: time="2017-05-08T13:09:32Z" level=info msg="About to remove runner-555937ac-gitlab-runner-1493179948-e68275d8" name=runner-555937ac-gitlab-runner-1493179948-e68275d8 operation=remove #012<nil>
May  8 13:09:32 gitlab-runner-delta-hgi-ci-01 gitlab-ci-multi-runner[971]: time="2017-05-08T13:09:32Z" level=info msg="About to remove runner-555937ac-gitlab-runner-1493179948-e68275d8" name=runner-555937ac-gitlab-runner-1493179948-e68275d8 operation=remove
May  8 13:09:32 gitlab-runner-delta-hgi-ci-01 gitlab-runner[971]: time="2017-05-08T13:09:32Z" level=info msg="WARNING: This action will delete both local reference and remote instance." name=runner-555937ac-gitlab-runner-1493179948-e68275d8 operation=remove #012<nil>
May  8 13:09:32 gitlab-runner-delta-hgi-ci-01 gitlab-ci-multi-runner[971]: time="2017-05-08T13:09:32Z" level=info msg="WARNING: This action will delete both local reference and remote instance." name=runner-555937ac-gitlab-runner-1493179948-e68275d8 operation=remove
May  8 13:09:32 gitlab-runner-delta-hgi-ci-01 gitlab-runner[971]: time="2017-05-08T13:09:32Z" level=info msg="(runner-555937ac-gitlab-runner-1493237772-01589fd4) Deleting OpenStack instance..." name=runner-555937ac-gitlab-runner-1493237772-01589fd4 operation=remove #012<nil>
May  8 13:09:32 gitlab-runner-delta-hgi-ci-01 gitlab-ci-multi-runner[971]: time="2017-05-08T13:09:32Z" level=info msg="(runner-555937ac-gitlab-runner-1493237772-01589fd4) Deleting OpenStack instance..." name=runner-555937ac-gitlab-runner-1493237772-01589fd4 operation=remove
May  8 13:09:32 gitlab-runner-delta-hgi-ci-01 gitlab-runner[971]: time="2017-05-08T13:09:32Z" level=warning msg="Retrying removal" created=2h8m25.255066128s name=runner-555937ac-gitlab-runner-1493826041-ae269904 reason="machine is unavailable" used=2h7m54.619259714s #012<nil>
May  8 13:09:32 gitlab-runner-delta-hgi-ci-01 gitlab-ci-multi-runner[971]: time="2017-05-08T13:09:32Z" level=warning msg="Retrying removal" created=2h8m25.255066128s name=runner-555937ac-gitlab-runner-1493826041-ae269904 reason="machine is unavailable" used=2h7m54.619259714s
May  8 13:09:32 gitlab-runner-delta-hgi-ci-01 gitlab-runner[971]: time="2017-05-08T13:09:32Z" level=warning msg="Retrying removal" created=55m22.763263317s name=runner-555937ac-gitlab-runner-1493204445-31c315b5 reason="Too many idle machines" used=40m20.18756703s #012<nil>

From kern.log:

May  8 13:13:17 gitlab-runner-delta-hgi-ci-01 gitlab-runner[971]: time="2017-05-08T13:13:17Z" level=info msg="About to remove runner-555937ac-gitlab-runner-1493175152-d3d5b661" name=runner-555937ac-gitlab-runner-1493175152-d3d5b661 operation=remove #012<nil>
May  8 13:13:17 gitlab-runner-delta-hgi-ci-01 gitlab-runner[971]: time="2017-05-08T13:13:17Z" level=info msg="WARNING: This action will delete both local reference and remote instance." name=runner-555937ac-gitlab-runner-1493175152-d3d5b661 operation=remove #012<nil>
May  8 13:13:17 gitlab-runner-delta-hgi-ci-01 gitlab-runner[971]: time="2017-05-08T13:13:17Z" level=warning msg="Retrying removal" created=35m23.151422396s name=runner-555937ac-gitlab-runner-1493196413-f4fc1dbc reason="machine is unavailable" used=23m9.290612238s #012<nil>
May  8 13:13:17 gitlab-runner-delta-hgi-ci-01 gitlab-runner[971]: time="2017-05-08T13:13:17Z" level=info msg="(runner-555937ac-gitlab-runner-1493808714-563ae51b) Deleting OpenStack instance..." name=runner-555937ac-gitlab-runner-1493808714-563ae51b operation=remove #012<nil>
May  8 13:13:17 gitlab-runner-delta-hgi-ci-01 gitlab-runner[971]: time="2017-05-08T13:13:17Z" level=error msg="Error removing host \"runner-555937ac-gitlab-runner-1493127151-112b2ffd\": Expected HTTP response code [202 204] when accessing [DELETE http://172.27.66.32:8774/v2.1/e95a9c47113a4a1499e6c51c04d4d15e/servers/], but got 404 instead" name=runner-555937ac-gitlab-runner-1493127151-112b2ffd operation=remove #012<nil>
May  8 13:13:17 gitlab-runner-delta-hgi-ci-01 gitlab-runner[971]: time="2017-05-08T13:13:17Z" level=error msg="404 Not Found" name=runner-555937ac-gitlab-runner-1493127151-112b2ffd operation=remove #012<nil>
May  8 13:13:17 gitlab-runner-delta-hgi-ci-01 gitlab-runner[971]: time="2017-05-08T13:13:17Z" level=error msg="The resource could not be found." name=runner-555937ac-gitlab-runner-1493127151-112b2ffd operation=remove #012<nil>
May  8 13:13:17 gitlab-runner-delta-hgi-ci-01 gitlab-runner[971]: time="2017-05-08T13:13:17Z" level=error msg="   " name=runner-555937ac-gitlab-runner-1493127151-112b2ffd operation=remove #012<nil>
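
To see which machines account for the noise, the excerpts above can be tallied per machine name and log level. This is only a rough sketch: it assumes every runner line carries the `key=value` fields seen in the excerpts (`level`, `msg`, `name`), and the helper names `parse_fields` and `summarize` are made up for illustration and are not part of GitLab Runner.

```python
import re
from collections import Counter

# Matches key=value pairs where the value is either a double-quoted string
# (backslash escapes allowed) or a bare token without whitespace.
PAIR_RE = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|\S+)')


def parse_fields(line):
    """Return the key=value fields of one runner log line as a dict.

    Hypothetical helper: assumes the field layout seen in the excerpts.
    """
    fields = {}
    for key, value in PAIR_RE.findall(line):
        if len(value) >= 2 and value.startswith('"') and value.endswith('"'):
            # Strip the surrounding quotes and unescape embedded quotes.
            value = value[1:-1].replace('\\"', '"')
        fields[key] = value
    return fields


def summarize(lines):
    """Count log entries per (machine name, level) pair."""
    counts = Counter()
    for line in lines:
        fields = parse_fields(line)
        # Lines without both fields (e.g. unrelated syslog entries) are skipped.
        if "name" in fields and "level" in fields:
            counts[(fields["name"], fields["level"])] += 1
    return counts
```

Passing an open log file to `summarize` and printing `counts.most_common(10)` would list the machine names that show up most often, which should make it easier to tell stale local references from machines that really exist.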

Only the symptom of the problem was addressed in #18.

I've put in a request to get our GitLab upgraded (#577413). We can then upgrade the GitLab runners and see if this issue has been resolved there.

Hopefully the more frequent and aggressive logrotate will suffice for now.

GitLab and the GitLab multi-runner have been upgraded, which (fingers crossed!) should fix this issue.