Leaks networks on every run
Minimal .gitlab-ci.yml illustrating the issue
---
docker_build:
  stage: package
  image: docker:latest
  services:
    - docker:dind
  script:
    - echo "blablabla"
Expected behavior
After the run completes, the network that was created for the service and the job container to talk to each other should be removed.
Host information
macOS
gitlab-ci-local 4.55.0
Containerd binary
docker
Additional context
https://github.com/firecow/gitlab-ci-local/blob/master/src/job.ts#L543 ← the network created here is not tracked for cleanup
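For illustration, here is a rough sketch (hypothetical names, not the project's actual code) of the kind of bookkeeping that seems to be missing: remembering each created network so teardown can find and remove it later.

```ts
import {execFile} from "node:child_process";
import {promisify} from "node:util";

const exec = promisify(execFile);

// Every network we create gets recorded here so cleanup can find it.
const createdNetworks = new Set<string>();

async function createDockerNetwork (jobId: number): Promise<string> {
    const name = `gitlab-ci-local-${jobId}`;
    await exec("docker", ["network", "create", name]);
    createdNetworks.add(name); // <- the tracking step
    return name;
}

async function cleanupNetworks () {
    for (const name of createdNetworks) {
        // Best effort: ignore failures so one stuck network doesn't block the rest.
        await exec("docker", ["network", "rm", name]).catch(() => null);
        createdNetworks.delete(name);
    }
}
```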
Hi @firecow,
This is the current output of my "docker network ls" (minus some entries redacted for my company):
➜ docker network ls
NETWORK ID NAME DRIVER SCOPE
51fbb22039db bridge bridge local
9fe84a65a347 docker_gwbridge bridge local
6df0aef1c8a5 gitlab-ci-local-130397 bridge local
6e78377fe61e gitlab-ci-local-200711 bridge local
29b6022248e3 gitlab-ci-local-201744 bridge local
176f9fc46cc9 gitlab-ci-local-235698 bridge local
dd8619c29826 gitlab-ci-local-284263 bridge local
d9cc612fdb5a gitlab-ci-local-351190 bridge local
884c9c02eee9 gitlab-ci-local-371592 bridge local
d147c413d3f5 gitlab-ci-local-375682 bridge local
1f7e90481cfc gitlab-ci-local-501394 bridge local
cdbf32f7f9e6 gitlab-ci-local-535650 bridge local
1b4057b7b5f9 gitlab-ci-local-558862 bridge local
b6b57e9795c8 gitlab-ci-local-574073 bridge local
7e34c53c5bff gitlab-ci-local-579972 bridge local
ccd262ce6df9 gitlab-ci-local-654062 bridge local
02cb192c820a gitlab-ci-local-668695 bridge local
e866a4a3540a gitlab-ci-local-714030 bridge local
23964309e2f7 gitlab-ci-local-738116 bridge local
54d988391d24 gitlab-ci-local-768931 bridge local
f8da1545297a host host local
y18vrgxhbg68 ingress overlay swarm
I do see the network being cleaned up afterwards in theory, but in practice something is probably going wrong. My guess is that I should be seeing some message somewhere based on https://github.com/firecow/gitlab-ci-local/blob/master/src/job.ts#L593, or maybe the "assert" on the containers is what caused the network cleanup to be skipped? I believe the latter may be the case, considering those asserts are within the "catch". I'm not familiar with JavaScript "assert" and its best practices; however, I fail to understand how it wouldn't be an "Error" instance...
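To illustrate what I mean, here is a minimal self-contained sketch (not the project's actual code) of how an assert firing inside a catch would abort the cleanup that follows it, while still producing an Error instance, since AssertionError extends Error:

```ts
import assert, {AssertionError} from "node:assert";

async function removeNetwork (name: string) {
    console.log(`docker network rm ${name}`); // stand-in for the real cleanup call
}

async function runJob (containersRemoved: boolean) {
    try {
        throw new Error("job script failed");
    } catch (e) {
        assert(e instanceof Error);                                // passes
        assert(containersRemoved, "containers were not removed");  // throws AssertionError...
        await removeNetwork("gitlab-ci-local-12345");              // ...so this never runs
    }
}

runJob(false).catch((e) => {
    // The AssertionError escapes the catch block and is itself an Error.
    console.log(e instanceof Error, e instanceof AssertionError); // true true
});
```

Wrapping the network removal in a finally block would guarantee the cleanup runs regardless of which error fires first.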
These are likely leaked from the test suites; you can replicate it by running npm run test and waiting a few seconds.
I've never run the test suite of gitlab-ci-local. I run gitlab-ci-local from the Homebrew installation.
These are probably leaked from a job of mine that runs docker-in-docker and failed to complete. The failure probably triggered some other failures (containers not being removed, which skipped the rest of the cleanup?).
Hmm, I see, not sure then... I don't really run docker-in-docker pipelines.
Hopefully it's something that is replicable.
I just had the problem with a job that simply runs inside a container... nothing unusual about it that I know of...
For those having the same issue, here is what I ran:
for network in $(docker network ls --format '{{.Name}}'); do if [[ "$network" == *"gitlab"* ]]; then echo "$network"; docker network rm "$network"; fi; done
gitlab-ci-local-9409
gitlab-ci-local-9409
gitlab-ci-local-95666
gitlab-ci-local-95666
gitlab-ci-local-130397
gitlab-ci-local-130397
gitlab-ci-local-200711
gitlab-ci-local-200711
gitlab-ci-local-201744
gitlab-ci-local-201744
gitlab-ci-local-235698
gitlab-ci-local-235698
gitlab-ci-local-284263
gitlab-ci-local-284263
gitlab-ci-local-351190
gitlab-ci-local-351190
gitlab-ci-local-371592
gitlab-ci-local-371592
gitlab-ci-local-375682
gitlab-ci-local-375682
gitlab-ci-local-451685
gitlab-ci-local-451685
gitlab-ci-local-501394
gitlab-ci-local-501394
gitlab-ci-local-509319
gitlab-ci-local-509319
gitlab-ci-local-535650
gitlab-ci-local-535650
gitlab-ci-local-536928
gitlab-ci-local-536928
gitlab-ci-local-558862
gitlab-ci-local-558862
gitlab-ci-local-562280
gitlab-ci-local-562280
gitlab-ci-local-574073
gitlab-ci-local-574073
gitlab-ci-local-579972
gitlab-ci-local-579972
gitlab-ci-local-654062
gitlab-ci-local-654062
gitlab-ci-local-668695
gitlab-ci-local-668695
gitlab-ci-local-700167
gitlab-ci-local-700167
gitlab-ci-local-714030
gitlab-ci-local-714030
gitlab-ci-local-738116
gitlab-ci-local-738116
gitlab-ci-local-768931
gitlab-ci-local-768931
gitlab-ci-local-788165
gitlab-ci-local-788165
gitlab-ci-local-859507
gitlab-ci-local-859507
I've got it again this morning :) I'm pretty sure it accumulates on job failures...
I'm currently trying to debug a job we have defined that starts a mockserver service container, starts our web component, and then runs tests against that web component. I got a failure in my tests, which fails the job, but there's nothing special about it...
I've run that job at least 50 times yesterday...
Got it again yesterday, and I had it today as well. I will try to notice which "job" leaves networks behind... the problem I have is that my workflow is dependent on gitlab-ci-local to run anything 😅 (we're bought into the concept that everything needs to be runnable in the CI and locally).
However, the jobs I've been running were just one that starts a mockserver service and another that is a shell job (no relation to Docker)...
At this point, I'm pretty sure the network isn't cleaned up when the job fails and has a service... I don't know how I could dig out more information for this ticket. If you have an idea, please let me know.
OK, I can now say for sure that the leak happens on successful runs as well...
The job that leaks is using a service with an alias... I have yet to determine what causes the leak... is it the service container not shutting down fast enough?
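If it is a timing issue, that would fit: Docker refuses to remove a network that still has attached containers, so teardown has to wait for the containers to be gone first. A rough sketch of the required ordering (helper names are made up, not gitlab-ci-local's actual API):

```ts
import {execFile} from "node:child_process";
import {promisify} from "node:util";

const exec = promisify(execFile);

async function teardown (containerIds: string[], network: string) {
    // Force-remove (and thereby detach) every container first; a network
    // with active endpoints makes "docker network rm" fail, leaking it.
    await Promise.all(containerIds.map((id) => exec("docker", ["rm", "-f", id])));
    await exec("docker", ["network", "rm", network]);
}
```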
I get similar errors after an unpredictable number of runs.
Error: Command failed with exit code 1: docker network create gitlab-ci-local-261361
Error response from daemon: all predefined address pools have been fully subnetted
at makeError (/opt/homebrew/Cellar/gitlab-ci-local/4.56.0/libexec/lib/node_modules/gitlab-ci-local/node_modules/execa/lib/error.js:60:11)
at handlePromise (/opt/homebrew/Cellar/gitlab-ci-local/4.56.0/libexec/lib/node_modules/gitlab-ci-local/node_modules/execa/index.js:118:26)
at process.processTicksAndRejections (node:internal/process/task_queues:105:5)
at async Job.createDockerNetwork (file:///opt/homebrew/Cellar/gitlab-ci-local/4.56.0/libexec/lib/node_modules/gitlab-ci-local/src/job.js:1053:39)
at async Job.start (file:///opt/homebrew/Cellar/gitlab-ci-local/4.56.0/libexec/lib/node_modules/gitlab-ci-local/src/job.js:442:13)
at async /opt/homebrew/Cellar/gitlab-ci-local/4.56.0/libexec/lib/node_modules/gitlab-ci-local/node_modules/p-map/index.js:57:22
I attempted to increase the size of the address pool, but that's only a band-aid. The real fix would be removing the networks when Docker removes the containers.
I run docker network prune to remove all unused networks and just start over.