fedora-copr/copr

Some builds are stuck in the running state


Some builds appear to be stuck in the running state. I briefly looked at the logs and it seems there is more than one cause.

[Screenshot: Screenshot_2023-10-02_12-27-04]

There was no way for us to switch them back to a pending state, obtain a new builder, and try again. I needed to fail all those builds, so I apologize for any inconvenience if this affected you. Please resubmit your builds if needed.

> I needed to fail all those builds, so I apologize for any inconvenience if this affected you.

Killing the background worker (at least with kill -9) should indeed restart the build: the worker disappears without telling the dispatcher, so the dispatcher starts a new worker instead. If this isn't the actual behavior, it is a bug.
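To make the expected behavior concrete, here is a minimal sketch of the reconciliation idea described above. It is not Copr's actual dispatcher code: the function name `tasks_to_restart`, the data shapes, and the second task ID are hypothetical and used only for illustration. The assumption is that the dispatcher knows which worker each running task was handed to and can list the workers that are still alive.

```python
def tasks_to_restart(running_tasks, live_worker_ids):
    """Return IDs of tasks whose worker vanished without reporting a result.

    running_tasks: dict mapping task ID -> worker ID (hypothetical shape)
    live_worker_ids: set of worker IDs that still exist
    """
    return sorted(
        task_id
        for task_id, worker_id in running_tasks.items()
        if worker_id not in live_worker_ids
    )


# The first entry mirrors the task from this thread; the second is made up.
running = {
    "6449298-mageia-9-x86_64": "rpm_build_worker:6449298-mageia-9-x86_64",
    "6449299-fedora-39-x86_64": "rpm_build_worker:6449299-fedora-39-x86_64",
}
# Only the second worker is still alive, e.g. after the first got SIGKILL.
alive = {"rpm_build_worker:6449299-fedora-39-x86_64"}

print(tasks_to_restart(running, alive))
```

Under this model, a worker that dies silently leaves its task marked as running with no live worker behind it, and that is the condition under which a new worker would be started.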

For all of the builds, I did:

[root@copr-be ~][PROD]# ps ax |grep 6449298
4095001 ?        Sl    12:08 Builder for task 6449298-mageia-9-x86_64: Job 6449298-mageia-9-x86_64, host info: ResallocHost, ticket_id=4374032, hostname=2620:52:3:1:dead:beef:cafe:c14a (command: /usr/bin/copr-backend-process-build --daemon
 --build-id 6449298 --chroot mageia-9-x86_64 --worker-id rpm_build_worker:6449298-mageia-9-x86_64)

[root@copr-be ~][PROD]# kill 4095001

and they all ended up failed. So, maybe a bug.

Hm, I suppose kill -9 would be better next time (the background worker could actually partly recover from the TERM signal that a plain kill sends, and so have a chance to mark the build as failed).