SUSE/BCI-tests

Some golang tests get stuck after reruns

Closed this issue · 5 comments

A bunch of tests in internal openQA running go tests get stuck and timeout.
https://openqa.suse.de/tests/8731168#step/bci_test/25
https://openqa.suse.de/tests/8731347#step/bci_test/25
https://openqa.suse.de/tests/8731787#step/bci_test/25
https://openqa.suse.de/tests/8731789#step/bci_test/25
https://openqa.suse.de/tests/8731793#step/bci_test/25

Apparently, it only happens using docker.

I have tried to reproduce the issue, but it's randomly happening.

So, I have done:

export TOX_PARALLEL_NO_SPINNER=1
export CONTAINER_RUNTIME=docker
export OS_VERSION=15.3 
export TARGET=ibs-cr 

tox -e go -- -n auto -k bci/golang_1.17 --reruns 3 --reruns-delay 10
go installed: attrs==21.4.0,dataclasses==0.8,execnet==1.9.0,filelock==3.4.1,importlib-metadata==4.8.3,iniconfig==1.1.1,packaging==21.3,pluggy==1.0.0,py==1.11.0,pyparsing==3.0.9,pytest==7.0.1,pytest-container @ git+https://github.com/dcermak/pytest_container@e3381ba64e0ce4ce268e21c68d886dcd7bc0560f,pytest-forked==1.4.0,pytest-rerunfailures==10.2,pytest-testinfra==6.7.0,pytest-xdist==2.5.0,tomli==1.2.3,typing_extensions==4.1.1,zipp==3.6.0
go run-test-pre: PYTHONHASHSEED='700224569'
go run-test: commands[0] | pytest -vv tests/test_go.py --junitxml=/root/BCI-tests/junit_go.xml -n auto -k bci/golang_1.17 --reruns 3 --reruns-delay 10
=========================================================================================================================== test session starts ============================================================================================================================
platform linux -- Python 3.6.12, pytest-7.0.1, pluggy-1.0.0 -- /root/BCI-tests/.tox/go/bin/python
cachedir: .tox/go/.pytest_cache
rootdir: /root/BCI-tests, configfile: pyproject.toml
plugins: testinfra-6.7.0, forked-1.4.0, xdist-2.5.0, rerunfailures-10.2, container-0.0.2
[gw0] linux Python 3.6.12 cwd: /root/BCI-tests
[gw1] linux Python 3.6.12 cwd: /root/BCI-tests
[gw0] Python 3.6.12 (default, Nov 25 2020, 20:33:10) [GCC]
[gw1] Python 3.6.12 (default, Nov 25 2020, 20:33:10) [GCC]
gw0 [5] / gw1 [5]
scheduling tests via LoadScheduling

tests/test_go.py::test_base_PATH_present[bci/golang:1.17 from registry.suse.de/suse/sle-15-sp3/update/cr/totest/images/bci/golang:1.17-bci/bci-base:15.3 from registry.suse.de/suse/sle-15-sp3/update/cr/totest/images/bci/bci-base:15.3] 
tests/test_go.py::test_go_size[bci/golang:1.17 from registry.suse.de/suse/sle-15-sp3/update/cr/totest/images/bci/golang:1.17] 
[gw1] [ 20%] PASSED tests/test_go.py::test_go_size[bci/golang:1.17 from registry.suse.de/suse/sle-15-sp3/update/cr/totest/images/bci/golang:1.17] 
tests/test_go.py::test_build_kured[bci/golang:1.17 from registry.suse.de/suse/sle-15-sp3/update/cr/totest/images/bci/golang:1.17-kured] 
[gw0] [ 40%] PASSED tests/test_go.py::test_base_PATH_present[bci/golang:1.17 from registry.suse.de/suse/sle-15-sp3/update/cr/totest/images/bci/golang:1.17-bci/bci-base:15.3 from registry.suse.de/suse/sle-15-sp3/update/cr/totest/images/bci/bci-base:15.3] 
tests/test_go.py::test_go_version[bci/golang:1.17 from registry.suse.de/suse/sle-15-sp3/update/cr/totest/images/bci/golang:1.17] 
[gw0] [ 60%] PASSED tests/test_go.py::test_go_version[bci/golang:1.17 from registry.suse.de/suse/sle-15-sp3/update/cr/totest/images/bci/golang:1.17] 
[gw1] [ 80%] RERUN tests/test_go.py::test_build_kured[bci/golang:1.17 from registry.suse.de/suse/sle-15-sp3/update/cr/totest/images/bci/golang:1.17-kured] 
tests/test_go.py::test_build_kured[bci/golang:1.17 from registry.suse.de/suse/sle-15-sp3/update/cr/totest/images/bci/golang:1.17-kured] 

And it gets stuck here.

On another console I ca see:

localhost:~ # docker images
REPOSITORY                                                              TAG                 IMAGE ID            CREATED             SIZE
<none>                                                                  <none>              c6d7611af7f0        10 minutes ago      992MB
<none>                                                                  <none>              0f773570f87d        10 minutes ago      923MB
registry.suse.de/suse/sle-15-sp3/update/cr/totest/images/bci/golang     1.17                9b58c2d6f790        13 hours ago        986MB
registry.suse.de/suse/sle-15-sp3/update/cr/totest/images/bci/golang     1.16                d6701832443b        16 hours ago        917MB
registry.suse.de/suse/sle-15-sp3/update/cr/totest/images/bci/bci-base   15.3                9752b59e1b92        16 hours ago        116MB

localhost:~ # docker ps --all
CONTAINER ID        IMAGE               COMMAND             CREATED              STATUS              PORTS               NAMES
179f9cdc7d57        9b58c2d6f790        "/bin/bash"         About a minute ago   Up About a minute                       sweet_davinci

but docker logs sweet_davinci doesn't return any output.

In this case it did a re-run of test_build_kured. But after a few tries, it stucks somewhere else, normally always at 80% of execution. Another example:

tox -e go -- -n auto -k bci/golang_1.17 --reruns 3 --reruns-delay 10 --pytest-container-log-level DEBUG
go installed: attrs==21.4.0,dataclasses==0.8,execnet==1.9.0,filelock==3.4.1,importlib-metadata==4.8.3,iniconfig==1.1.1,packaging==21.3,pluggy==1.0.0,py==1.11.0,pyparsing==3.0.9,pytest==7.0.1,pytest-container @ git+https://github.com/dcermak/pytest_container@e3381ba64e0ce4ce268e21c68d886dcd7bc0560f,pytest-forked==1.4.0,pytest-rerunfailures==10.2,pytest-testinfra==6.7.0,pytest-xdist==2.5.0,tomli==1.2.3,typing_extensions==4.1.1,zipp==3.6.0
go run-test-pre: PYTHONHASHSEED='3998865324'
go run-test: commands[0] | pytest -vv tests/test_go.py --junitxml=/root/BCI-tests/junit_go.xml -n auto -k bci/golang_1.17 --reruns 3 --reruns-delay 10 --pytest-container-log-level DEBUG
=========================================================================================================================== test session starts ============================================================================================================================
platform linux -- Python 3.6.12, pytest-7.0.1, pluggy-1.0.0 -- /root/BCI-tests/.tox/go/bin/python
cachedir: .tox/go/.pytest_cache
rootdir: /root/BCI-tests, configfile: pyproject.toml
plugins: testinfra-6.7.0, forked-1.4.0, xdist-2.5.0, rerunfailures-10.2, container-0.0.2
[gw0] linux Python 3.6.12 cwd: /root/BCI-tests
[gw1] linux Python 3.6.12 cwd: /root/BCI-tests
[gw0] Python 3.6.12 (default, Nov 25 2020, 20:33:10) [GCC]
[gw1] Python 3.6.12 (default, Nov 25 2020, 20:33:10) [GCC]
gw0 [5] / gw1 [5]
scheduling tests via LoadScheduling

tests/test_go.py::test_base_PATH_present[bci/golang:1.17 from registry.suse.de/suse/sle-15-sp3/update/cr/totest/images/bci/golang:1.17-bci/bci-base:15.3 from registry.suse.de/suse/sle-15-sp3/update/cr/totest/images/bci/bci-base:15.3] 
tests/test_go.py::test_go_size[bci/golang:1.17 from registry.suse.de/suse/sle-15-sp3/update/cr/totest/images/bci/golang:1.17] 
[gw1] [ 20%] PASSED tests/test_go.py::test_go_size[bci/golang:1.17 from registry.suse.de/suse/sle-15-sp3/update/cr/totest/images/bci/golang:1.17] 
tests/test_go.py::test_build_kured[bci/golang:1.17 from registry.suse.de/suse/sle-15-sp3/update/cr/totest/images/bci/golang:1.17-kured] 
[gw0] [ 40%] RERUN tests/test_go.py::test_base_PATH_present[bci/golang:1.17 from registry.suse.de/suse/sle-15-sp3/update/cr/totest/images/bci/golang:1.17-bci/bci-base:15.3 from registry.suse.de/suse/sle-15-sp3/update/cr/totest/images/bci/bci-base:15.3] 
tests/test_go.py::test_base_PATH_present[bci/golang:1.17 from registry.suse.de/suse/sle-15-sp3/update/cr/totest/images/bci/golang:1.17-bci/bci-base:15.3 from registry.suse.de/suse/sle-15-sp3/update/cr/totest/images/bci/bci-base:15.3] 
[gw1] [ 60%] PASSED tests/test_go.py::test_build_kured[bci/golang:1.17 from registry.suse.de/suse/sle-15-sp3/update/cr/totest/images/bci/golang:1.17-kured] 
tests/test_go.py::test_go_get_binary_in_path[bci/golang:1.17 from registry.suse.de/suse/sle-15-sp3/update/cr/totest/images/bci/golang:1.17] 
[gw1] [ 80%] PASSED tests/test_go.py::test_go_get_binary_in_path[bci/golang:1.17 from registry.suse.de/suse/sle-15-sp3/update/cr/totest/images/bci/golang:1.17] 

The used https://github.com/weaveworks/kured project is rather big and heavy on additional resources to be pulled. Perhaps we can resolve this issue by switching to a smaller project and building that one.

We could use my good old (TM) https://github.com/grisu48/pasta project, which comes with only two dependencies and is very simple, yet representative for a go project.

Comparison of the runtimes:

leap:/kured # time make cmd/kured/kured
# 91 dependencies

real	0m30.898s
user	2m38.625s
sys	0m15.588s

leap:/pasta # time make
# 2 dependencies

real	0m1.383s
user	0m2.209s
sys	0m0.425s

@dcermak @jlausuch wdyt?

I just did make cmd/kured/kured, which is what the test does and it took only 20s...

I believe that the 91 dependencies makes it stumble sometimes. But then again, npm tests are also fine, so that might be a red herring.

@jlausuch is this still a recurring issue? I was thinking about switching to a different go project anyway…

No, I think we can close this one. Go tests are fine.