Infrequent Dojo crash when using docker-compose driver
tomzo opened this issue · 3 comments
The Dojo process crashes from time to time (around 1 in 100 runs) when run with docker-compose. This causes some of the containers created by docker-compose to stay running on the CI agent, because after the crash there is nothing to clean them up.
It seems that there are 2 things wrong here:
1. docker-compose ps is called before before_app_1 was created. I suppose the ps is part of the background monitoring process that checks whether containers are running, but perhaps it's kicking in too early.
2. Exit status: 1 from docker-compose is causing the Dojo process to crash. That should just never happen.
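To illustrate the second point, here is a minimal sketch of how a failing docker-compose ps could be returned as an error and followed by a cleanup, instead of a panic that leaves containers behind. This is not Dojo's actual code: cmdResult, runFunc, execRun, listContainers and checkContainers are hypothetical names, and the cleanup is assumed to be a plain docker-compose stop.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"os/exec"
)

// cmdResult is what a finished command produced.
type cmdResult struct {
	Stdout   string
	Stderr   string
	ExitCode int
}

// runFunc executes a command. It is injected (a hypothetical seam, not Dojo's
// API) so the logic can be exercised without docker-compose installed.
type runFunc func(name string, args ...string) (cmdResult, error)

// execRun is a runFunc backed by os/exec.
func execRun(name string, args ...string) (cmdResult, error) {
	var stdout, stderr bytes.Buffer
	cmd := exec.Command(name, args...)
	cmd.Stdout = &stdout
	cmd.Stderr = &stderr
	err := cmd.Run()
	res := cmdResult{Stdout: stdout.String(), Stderr: stderr.String()}
	var exitErr *exec.ExitError
	if errors.As(err, &exitErr) {
		// The command ran but exited non-zero: report that as data.
		res.ExitCode = exitErr.ExitCode()
		return res, nil
	}
	return res, err
}

// listContainers runs "docker-compose <composeArgs> ps" and returns an error,
// rather than panicking, when the exit status is not 0.
func listContainers(run runFunc, composeArgs []string) (string, error) {
	args := append(append([]string{}, composeArgs...), "ps")
	res, err := run("docker-compose", args...)
	if err != nil {
		return "", fmt.Errorf("running docker-compose ps: %w", err)
	}
	if res.ExitCode != 0 {
		return "", fmt.Errorf("docker-compose ps: exit status %d, stderr: %s",
			res.ExitCode, res.Stderr)
	}
	return res.Stdout, nil
}

// checkContainers lists the containers and, if that fails, stops the project's
// containers before handing the error back to the caller.
func checkContainers(run runFunc, composeArgs []string) error {
	if _, err := listContainers(run, composeArgs); err != nil {
		stopArgs := append(append([]string{}, composeArgs...), "stop")
		if _, stopErr := run("docker-compose", stopArgs...); stopErr != nil {
			return fmt.Errorf("%v; cleanup also failed: %v", err, stopErr)
		}
		return err
	}
	return nil
}
```

With a real docker-compose on the PATH, this could be called as checkContainers(execRun, []string{"-p", "myproject"}), where the project name is a placeholder. Because the failure travels as an ordinary error, the caller decides whether to retry, log, or exit, and the containers are stopped either way.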
Logs from CI:
2020/12/01 15:15:53 [ 1] INFO: (main.main) Dojo version 0.7.0
2020/12/01 15:15:53 [ 4] INFO: (main.DockerComposeDriver.HandleRun) docker-compose run command will be:
docker-compose -f docker-compose-dtest.yml -f docker-compose-dtest.yml.dojo -p dojo-******** run --rm -T default "./tasks _test_docker"
Creating network "dojo-***********_default" with the default driver
Pulling app (*************.amazonaws.com/**********)...
2bedf8f: Pulling from *****/app
Creating ***********_db_1 ...
Creating ***********_db_1 ... done
Creating ****************************_app_1 ...
panic: Unexpected exit status:
Command: docker-compose -f docker-compose-dtest.yml -f docker-compose-dtest.yml.dojo -p dojo-********* ps
Exit status: 1
StdOut: <empty string>
StdErr: No such container: ded74698ff6c7539c16d506ac0d05a8ccc1884e8da4a3030f50c8b68d2de63a2
goroutine 20 [running]:
main.DockerComposeDriver.getDCContainersNames(0x535e80, 0xc00006e0c0, 0x5367e0, 0xc000062300, 0xc000062300, 0xc0000601e0, 0x510825, 0x3, 0x7ffd4fef2a4a, 0xe, ...)
/dojo/work/src/dojo/docker_compose_driver.go:601 +0x7f3
main.DockerComposeDriver.waitForContainersToBeRunning(0x535e80, 0xc00006e0c0, 0x5367e0, 0xc000062300, 0xc000062300, 0xc0000601e0, 0x510825, 0x3, 0x7ffd4fef2a4a, 0xe, ...)
/dojo/work/src/dojo/docker_compose_driver.go:237 +0x170
main.DockerComposeDriver.watchContainers(0x535e80, 0xc00006e0c0, 0x5367e0, 0xc000062300, 0xc000062300, 0xc0000601e0, 0x510825, 0x3, 0x7ffd4fef2a4a, 0xe, ...)
/dojo/work/src/dojo/docker_compose_driver.go:270 +0x1d6
created by main.DockerComposeDriver.HandleRun
/dojo/work/src/dojo/docker_compose_driver.go:390 +0x5ec
Creating ****************************_app_1 ... done
A workaround was released in Dojo 0.10.3; however, I couldn't reproduce this error.
Reproduced on CircleCI here: https://app.circleci.com/pipelines/github/kudulab/dojo/51/workflows/43161f70-6d0f-40c0-9a63-59e17e21b965/jobs/175 using commit 5a344fe
Log messages:
DEBUG: (main.DockerComposeDriver.HandleRun) Exit status from run command: 0\n
2024/02/04 07:20:12 [ 5] DEBUG: (main.DockerComposeDriver.HandleRun) Collecting information from non default containers\n
2024/02/04 07:20:12 [ 8] ERROR: (main.DockerComposeDriver.getDCContainersNames) \x1b[31mUnexpected exit status:\n
Command: docker-compose -f ./test/test-files/itest-dc.yaml -f ./test/test-files/itest-dc.yaml.dojo -p testdojorunid ps --format json --all\n
Exit status: 1\n
StdOut: <empty string>\n
StdErr: Error response from daemon: No such container: 731492b22407b5d22db460ac5daee3a2e46e24286dfd4f6916b09457018eb66b\n
\x1b[0m\n
2024/02/04 07:20:12 [ 8] DEBUG: (main.DockerComposeDriver.waitForContainersToBeRunning) Containers not yet created: testdojorunid\n
2024/02/04 07:20:12 [ 5] DEBUG: (main.DockerComposeDriver.stop) Stopping containers\n
2024/02/04 07:20:12 [ 5] INFO: (main.DockerComposeDriver.stop) Stopping containers with command: \n
docker-compose -f ./test/test-files/itest-dc.yaml -f ./test/test-files/itest-dc.yaml.dojo -p testdojorunid stop\n
Container testdojorunid-abc-1 Stopping\n
Container testdojorunid-abc-1 Stopped\n
2024/02/04 07:20:12 [ 5] DEBUG: (main.DockerComposeDriver.stop) Exit status from stop command: 0
This is not fixed in Dojo 0.12.0. It happens rarely, and the workaround implemented in Dojo 0.10.3 is still in place. The workaround is that we no longer panic and print a log message instead. However, this leads to flaky tests.
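One possible way to reduce the flakiness, sketched here as an assumption rather than a tested fix, is to treat the "No such container" error from docker-compose ps as transient and retry a few times before giving up. The names psFunc, isTransient and psWithRetry are hypothetical and not part of Dojo; the match on the stderr text is based only on the messages in the logs above.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// psFunc runs "docker-compose ps" once and reports its stdout, stderr and
// exit status. It is a hypothetical seam so the retry logic can be tested
// without docker-compose.
type psFunc func() (stdout, stderr string, exitStatus int)

// isTransient reports whether a failed ps looks like the race from this
// issue: a container ID that docker-compose knows about but the daemon
// cannot find at that moment.
func isTransient(stderr string) bool {
	return strings.Contains(stderr, "No such container")
}

// psWithRetry calls ps up to attempts times, sleeping delay between tries.
// It retries only on the transient error and fails fast on anything else.
func psWithRetry(ps psFunc, attempts int, delay time.Duration) (string, error) {
	if attempts < 1 {
		attempts = 1
	}
	var lastStderr string
	var lastStatus int
	for i := 0; i < attempts; i++ {
		stdout, stderr, status := ps()
		if status == 0 {
			return stdout, nil
		}
		if !isTransient(stderr) {
			return "", fmt.Errorf("docker-compose ps failed with exit status %d: %s",
				status, stderr)
		}
		lastStderr, lastStatus = stderr, status
		if i < attempts-1 {
			time.Sleep(delay)
		}
	}
	return "", fmt.Errorf("docker-compose ps still failing after %d attempts (exit status %d): %s",
		attempts, lastStatus, lastStderr)
}
```

The retry count and delay are arbitrary knobs here. Whether a short retry is enough depends on how long the window lasts in practice, which the logs in this issue do not show.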