Some builds are stuck in the running state

Question

Closed this issue a year ago · 4 comments

Some builds appear to be stuck in the running state. I briefly looked at the logs and it seems there is more than one cause.

Answer 1 · 2023-10-02T20:35:30.000Z

There was no way for us to switch them back to a pending state, obtain a new builder, and try again. I needed to fail all those builds, so I apologize for any inconvenience if this affected you. Please resubmit your builds if needed.

Answer 2 · 2023-10-03T07:36:40.000Z

> I needed to fail all those builds, so I apologize for any inconvenience if this affected you.

Killing the background worker (at least with kill -9) should indeed restart the build: the worker disappears without telling the dispatcher, so the dispatcher starts a new worker instead. If this isn't the actual behavior, it is a bug.
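To make the expected behavior concrete, here is a minimal sketch of the kind of check a dispatcher could run. This is not the actual copr-backend code: `TrackedWorker`, `pid_alive`, and `workers_to_restart` are hypothetical names used only for illustration. The idea is that a worker whose process is gone, but which never reported a result, is a candidate for being started again.

```python
import os
from dataclasses import dataclass


@dataclass
class TrackedWorker:
    # Hypothetical record a dispatcher might keep per background worker.
    worker_id: str
    pid: int
    reported_done: bool = False


def pid_alive(pid):
    """Return True if a process with this PID currently exists."""
    try:
        os.kill(pid, 0)  # signal 0 only checks for existence, nothing is sent
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def workers_to_restart(workers, is_alive=pid_alive):
    """List IDs of workers that vanished without reporting a result."""
    return [
        w.worker_id
        for w in workers
        if not w.reported_done and not is_alive(w.pid)
    ]
```

A worker killed with signal 9 has no chance to set anything like `reported_done`, so under this sketch it would always be picked up for a restart.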

Answer 3 · 2023-10-03T15:16:54.000Z

For all of the builds, I did:

[root@copr-be ~][PROD]# ps ax |grep 6449298
4095001 ?        Sl    12:08 Builder for task 6449298-mageia-9-x86_64: Job 6449298-mageia-9-x86_64, host info: ResallocHost, ticket_id=4374032, hostname=2620:52:3:1:dead:beef:cafe:c14a (command: /usr/bin/copr-backend-process-build --daemon --build-id 6449298 --chroot mageia-9-x86_64 --worker-id rpm_build_worker:6449298-mageia-9-x86_64)

[root@copr-be ~][PROD]# kill 4095001

and they all ended up failed. So, maybe a bug.

Answer 4 · 2023-10-03T22:02:28.000Z

Hm, I suppose kill -9 would be better next time. A plain kill sends TERM by default, so the background worker could actually partly recover from it and have a chance to mark the build as failed.