ForegroundDeletion of Jobs is not always enforced before recreation
nstogner opened this issue · 8 comments
What happened:
I noticed this issue while testing what happens when a failure is triggered in a Job and multiple .replicatedJobs[] are specified. I observed that the JobSet controller did not wait for all Jobs to be deleted before creating a new one.
k get jobs -w -o custom-columns=NAME:.metadata.name,UID:.metadata.uid,CREATION_TIME:.metadata.creationTimestamp
NAME                            UID                                    CREATION_TIME
coordinator-example-workers-0   7ed23398-a07b-4a4d-9f72-a14c082567e9   2024-08-26T16:04:13Z
coordinator-example-driver-0    464b7733-9a37-4847-ac12-26be2468bf25   2024-08-26T16:04:13Z
coordinator-example-driver-0    464b7733-9a37-4847-ac12-26be2468bf25   2024-08-26T16:04:13Z
coordinator-example-workers-0   7ed23398-a07b-4a4d-9f72-a14c082567e9   2024-08-26T16:04:13Z
coordinator-example-driver-0    464b7733-9a37-4847-ac12-26be2468bf25   2024-08-26T16:04:13Z
coordinator-example-workers-0   7ed23398-a07b-4a4d-9f72-a14c082567e9   2024-08-26T16:04:13Z
coordinator-example-workers-0   7ed23398-a07b-4a4d-9f72-a14c082567e9   2024-08-26T16:04:13Z
coordinator-example-workers-0   7ed23398-a07b-4a4d-9f72-a14c082567e9   2024-08-26T16:04:13Z
coordinator-example-driver-0    464b7733-9a37-4847-ac12-26be2468bf25   2024-08-26T16:04:13Z
coordinator-example-driver-0    464b7733-9a37-4847-ac12-26be2468bf25   2024-08-26T16:04:13Z
coordinator-example-workers-0   7ed23398-a07b-4a4d-9f72-a14c082567e9   2024-08-26T16:04:13Z
# NEW workers-0 Job created HERE (notice new UID)
coordinator-example-workers-0   fa59cbba-37a4-4152-8d94-af6837f19e9b   2024-08-26T16:04:20Z
coordinator-example-workers-0   fa59cbba-37a4-4152-8d94-af6837f19e9b   2024-08-26T16:04:20Z
coordinator-example-workers-0   fa59cbba-37a4-4152-8d94-af6837f19e9b   2024-08-26T16:04:20Z
coordinator-example-workers-0   fa59cbba-37a4-4152-8d94-af6837f19e9b   2024-08-26T16:04:20Z
coordinator-example-workers-0   fa59cbba-37a4-4152-8d94-af6837f19e9b   2024-08-26T16:04:20Z
coordinator-example-workers-0   fa59cbba-37a4-4152-8d94-af6837f19e9b   2024-08-26T16:04:20Z
coordinator-example-workers-0   11a96b99-1301-4851-a7d1-003fb0c8d773   2024-08-26T16:04:28Z
coordinator-example-workers-0   11a96b99-1301-4851-a7d1-003fb0c8d773   2024-08-26T16:04:28Z
coordinator-example-workers-0   11a96b99-1301-4851-a7d1-003fb0c8d773   2024-08-26T16:04:28Z
coordinator-example-workers-0   11a96b99-1301-4851-a7d1-003fb0c8d773   2024-08-26T16:04:28Z
# Notice that driver-0 still exists (notice same UID)
coordinator-example-driver-0    464b7733-9a37-4847-ac12-26be2468bf25   2024-08-26T16:04:13Z
What you expected to happen:
I expected all Jobs from a given attempt to be fully deleted with ForegroundDeletion before any new Jobs are recreated.
How to reproduce it (as minimally and precisely as possible):
I used this manifest:
apiVersion: jobset.x-k8s.io/v1alpha2
kind: JobSet
metadata:
  name: coordinator-example
spec:
  # label and annotate jobs and pods with stable network endpoint of the designated
  # coordinator pod:
  # jobset.sigs.k8s.io/coordinator=coordinator-example-driver-0-0.coordinator-example
  failurePolicy:
    maxRestarts: 2
  coordinator:
    replicatedJob: driver
    jobIndex: 0
    podIndex: 0
  replicatedJobs:
  - name: workers
    template:
      spec:
        parallelism: 4
        completions: 4
        backoffLimit: 0
        template:
          spec:
            containers:
            - name: sleep
              image: busybox
              command:
              - bash
              args:
              - "-c"
              - "sleep 500 && exit 1"
  - name: driver
    template:
      spec:
        parallelism: 1
        completions: 1
        backoffLimit: 0
        template:
          spec:
            containers:
            - name: sleep
              image: busybox
              command:
              - sleep
              args:
              - 100s
Anything else we need to know?:
Environment:
- Kubernetes version (use kubectl version):
  Client Version: v1.29.6
  Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
  Server Version: v1.29.2
- JobSet version (use git describe --tags --dirty --always): v0.6.0
- Cloud provider or hardware configuration: kind (with podman)
- Install tools: kubectl apply
- Others: I was also chewing a piece of gum while I was running the commands above.
Assuming the culprit is another call to Reconcile() that then notices that .metadata.deletionTimestamp is already set and skips the blocking call to foreground delete the given Job:

jobset/pkg/controllers/jobset_controller.go, line 560 at 2dcc751

I think the fix would be to just remove this conditional.
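To make that concrete, here is a rough sketch of the shape I suspect, written against controller-runtime. This is illustrative only, not the actual jobset_controller.go code; the function name, variable names, and loop structure are made up:

```go
package sketch

import (
	"context"

	batchv1 "k8s.io/api/batch/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// deleteJobsForeground is a hypothetical stand-in for the controller's
// deletion path, illustrating the suspected bug shape: when a Job already
// carries a deletionTimestamp, the delete is skipped entirely, so a later
// reconcile can recreate Jobs before the old objects and their pods are
// actually gone.
func deleteJobsForeground(ctx context.Context, c client.Client, jobs []*batchv1.Job) error {
	for _, job := range jobs {
		if job.DeletionTimestamp != nil {
			// Suspected culprit: skipping here means the foreground-delete
			// path is never re-entered for Jobs that are already terminating.
			continue
		}
		if err := c.Delete(ctx, job,
			client.PropagationPolicy(metav1.DeletePropagationForeground)); err != nil {
			return err
		}
	}
	return nil
}
```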
Hey! Would you be open to creating an e2e test and seeing if your suggestion fixes it?
Actually, reading about foreground deletion, it seems that it is not a blocking delete call. It sets a finalizer, and then JobSet would recreate.
So I’m not sure this is a bug tbh.
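For reference, a minimal client-go sketch (illustrative, not JobSet code; the function name and wiring are assumptions) of what a foreground delete looks like from the caller's side:

```go
package sketch

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// foregroundDeleteJob issues a foreground delete and returns as soon as the
// API server accepts it. A nil error does NOT mean the Job or its pods are
// gone: the Job sticks around with deletionTimestamp set and the
// "foregroundDeletion" finalizer until its dependent pods are removed.
func foregroundDeleteJob(ctx context.Context, cs kubernetes.Interface, ns, name string) error {
	policy := metav1.DeletePropagationForeground
	return cs.BatchV1().Jobs(ns).Delete(ctx, name, metav1.DeleteOptions{
		PropagationPolicy: &policy,
	})
}
```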
Sounds like you want to recreate the Jobs only once they are fully deleted. Jobs can take a long time to be deleted, especially if the pods have graceful termination.
I think this is really a feature request.
From the GH issue that specified foreground deletion, it appears that the expected behavior is to block until all Pods are gone (otherwise bad things happen): #392
Hey! Would you be open to creating an e2e test and seeing if your suggestion fixes it?
Yes, I should be able to do that
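Roughly, I am picturing something like the Ginkgo/Gomega sketch below. The k8sClient/ns wiring and the restart-attempt label key are assumptions that would need to be checked against the real e2e helpers and constants:

```go
package e2e

import (
	"context"
	"time"

	. "github.com/onsi/ginkgo/v2"
	. "github.com/onsi/gomega"
	batchv1 "k8s.io/api/batch/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// k8sClient and ns are assumed to be wired up by the existing e2e suite setup.
var (
	k8sClient client.Client
	ns        string
)

// While the JobSet is restarting after the injected failure, Jobs from two
// different restart attempts should never coexist, because the previous
// attempt must be fully (foreground-)deleted first. The label key is a guess
// and should be swapped for the constant the codebase actually uses.
var _ = It("does not mix Jobs from different restart attempts", func() {
	ctx := context.Background()
	Consistently(func(g Gomega) {
		var jobs batchv1.JobList
		g.Expect(k8sClient.List(ctx, &jobs, client.InNamespace(ns))).To(Succeed())
		attempts := map[string]struct{}{}
		for _, j := range jobs.Items {
			attempts[j.Labels["jobset.sigs.k8s.io/restart-attempt"]] = struct{}{}
		}
		g.Expect(len(attempts)).To(BeNumerically("<=", 1))
	}, 2*time.Minute, time.Second).Should(Succeed())
})
```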
@ahg-g WDYT of #665 (comment)?
Reading the documentation on ForegroundDeletion, I don't understand how that becomes a blocking delete call.
Foreground will keep the child job around with deletionTimestamp set until all its pods are deleted. This will prevent JobSet from creating a replacement for the child job until all the pods are deleted. So this is working as expected.
@nstogner why do you want jobset to wait until all child jobs are deleted before creating their replacements?
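To make the behavior being requested concrete, here is a minimal sketch (an assumed helper, not JobSet code) of the gate described at the top of the issue: hold off on creating any replacement Jobs while any Job from the previous attempt still exists with a deletionTimestamp set.

```go
package sketch

import (
	batchv1 "k8s.io/api/batch/v1"
)

// anyJobStillTerminating sketches the gate this issue asks for: no replacement
// Jobs are created while any Job from the previous attempt still exists with a
// deletionTimestamp, i.e. while its foreground deletion has not finished.
func anyJobStillTerminating(jobs []batchv1.Job) bool {
	for _, j := range jobs {
		if j.DeletionTimestamp != nil {
			return true
		}
	}
	return false
}
```

The point of the request is that this check would apply across all child Jobs of the previous attempt (driver and workers alike), not just to each child Job individually.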