eclipse-bluechi/bluechi

integration test for stopping target service of proxy feature broken

Closed this issue · 2 comments

Describe the bug

It seems that the proxy-service-stop-service feature is broken in the GH CI:
https://github.com/eclipse-bluechi/bluechi/actions/runs/8138365284/job/22240289714?pr=770#step:8:175

Running the tests locally in the container setup doesn't lead to a failure.

The test times out while waiting for requesting.service to start (which resolves the proxy dependency).

Note:
It's currently a bit hard to debug since no journal logs or other artifacts are collected when the pytest timeout fires (it can't be caught, and apparently it's not intended to be). Therefore, a small and simple custom implementation of a signal-based timeout might be better in the future.
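
A minimal sketch of what such a signal-based timeout could look like, assuming the tests run on Linux in the main thread. The names `TimeoutExpired`, `timeout` and `run_with_artifact_collection` are hypothetical and not part of the existing bluechi_test code; the point is only that the timeout surfaces as a regular exception, so logs can still be gathered before the test fails.

```python
import signal
from contextlib import contextmanager


class TimeoutExpired(Exception):
    """Raised when the guarded block runs longer than allowed (hypothetical name)."""


@contextmanager
def timeout(seconds: float):
    """Raise TimeoutExpired if the with-block takes longer than `seconds`.

    Uses SIGALRM, so it only works on Unix and only in the main thread.
    """

    def _on_alarm(signum, frame):
        raise TimeoutExpired(f"timed out after {seconds} seconds")

    previous_handler = signal.signal(signal.SIGALRM, _on_alarm)
    signal.setitimer(signal.ITIMER_REAL, seconds)
    try:
        yield
    finally:
        # Always cancel the timer and restore the previous handler.
        signal.setitimer(signal.ITIMER_REAL, 0)
        signal.signal(signal.SIGALRM, previous_handler)


def run_with_artifact_collection(test_fn, collect_fn, seconds: float):
    """Run test_fn under a timeout; on expiry call collect_fn, then re-raise."""
    try:
        with timeout(seconds):
            test_fn()
    except TimeoutExpired:
        collect_fn()  # e.g. gather journal logs (assumed hook)
        raise
```

Because the timeout is an ordinary Python exception here, the caller decides what to collect before re-raising, which is the part that is not possible with the current pytest timeout.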

To Reproduce

Running integration tests in the CI

Expected behavior

Test passes

It seems there are logs about too many open files:

11:18:34                 out: 2024-03-04 11:18:34+0000,097 DEBUG   [bluechi_test.test] Stopping all BlueChi components in all container... (test:99)
11:18:34                 out: 2024-03-04 11:18:34+0000,148 DEBUG   [bluechi_test.client] Executed command 'systemctl stop bluechi-agent' with result '0' and output 'b''' (client:84)
11:18:34                 out: 2024-03-04 11:18:34+0000,222 DEBUG   [bluechi_test.client] Executed command 'systemctl show --property="Result" bluechi-agent' with result '0' and output 'Result=success' (client:84)
11:18:34                 out: 2024-03-04 11:18:34+0000,293 DEBUG   [bluechi_test.client] Executed command 'systemctl stop bluechi-agent' with result '0' and output 'Failed to allocate directory watch: Too many open files' (client:84)
11:18:34                 out: 2024-03-04 11:18:34+0000,368 DEBUG   [bluechi_test.client] Executed command 'systemctl show --property="Result" bluechi-agent' with result '0' and output 'Result=success' (client:84)

So this might be related to podman and the GH hosts in the CI.
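
One way to check this on an affected runner: the "Failed to allocate directory watch" message from systemctl is commonly associated with exhausted inotify limits rather than regular file descriptors. That this is the cause here is an assumption and has not been confirmed in this issue. The sketch below only reads the kernel's current inotify limits so they can be logged alongside the test output; the helper name `read_inotify_limits` is hypothetical.

```python
from pathlib import Path

# Standard Linux location of the inotify sysctl values.
INOTIFY_DIR = Path("/proc/sys/fs/inotify")


def read_inotify_limits(base: Path = INOTIFY_DIR) -> dict:
    """Return the inotify limits as integers, or None where unreadable."""
    limits = {}
    for name in ("max_user_instances", "max_user_watches", "max_queued_events"):
        try:
            limits[name] = int((base / name).read_text().strip())
        except (OSError, ValueError):
            # File missing (non-Linux host) or unexpected content.
            limits[name] = None
    return limits


if __name__ == "__main__":
    for key, value in read_inotify_limits().items():
        print(f"{key} = {value}")
```

Comparing these values between a local container setup and the GH hosts might show whether the CI environment is more constrained.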

Update:
It seems that once this issue occurs, it persists for quite a while but eventually disappears after restarting the pipeline.

Should be fixed by #820. Feel free to reopen if it appears again.