diegodelemos/cap-reuse

Kubernetes Jobs restart containers whose execution has failed

Closed this issue · 4 comments

This can lead to a potentially infinite number of restarts of a container; for instance, its execution may keep failing because a dependency can no longer be met (e.g. a broken link).

A workflow is composed of a set of steps, each of which is mapped to a k8s Job. In terms of workflows, the expected behaviour is that if a step fails, its log is retrieved and the user gets a message saying that there was a problem in that specific step.

There is an open issue about this on the official Kubernetes repository.

Possible solution, related to milestone 1: use the broker to detect that a Job has at least one failed execution and kill that Job, so the "restart until completed" loop is stopped. Note that with restart policy `OnFailure` (mandatory, otherwise the Job keeps creating new containers until the task runs successfully) there will never be a Pod left in a failed state, since the same Pod restarts forever; the broker therefore has to detect the failure from the container restart count rather than from a failed Pod (see the sketch below).
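A minimal sketch of what that broker-side check could look like, assuming the official Kubernetes Python client; the function name `kill_failing_job` and the `default` namespace are hypothetical, and the `job-name` label is the one Kubernetes sets on Pods created by a Job:

```python
from kubernetes import client, config

config.load_kube_config()  # load_incluster_config() when running in-cluster
core = client.CoreV1Api()
batch = client.BatchV1Api()

def kill_failing_job(job_name, namespace="default"):
    """Kill a Job whose Pod has already failed at least once.

    Hypothetical step-broker helper: with restartPolicy OnFailure the
    same Pod restarts forever, so we look at the container restart
    count instead of waiting for a Pod in a failed state.
    """
    pods = core.list_namespaced_pod(
        namespace, label_selector="job-name={}".format(job_name))
    for pod in pods.items:
        for status in (pod.status.container_statuses or []):
            if status.restart_count >= 1:
                # Retrieve the log so the user can be told which step failed.
                log = core.read_namespaced_pod_log(pod.metadata.name, namespace)
                # Delete the Job to stop the restart loop; Foreground
                # propagation also removes the Job's Pods.
                batch.delete_namespaced_job(
                    job_name, namespace,
                    body=client.V1DeleteOptions(propagation_policy="Foreground"))
                return log
    return None
```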

Another solution could be to launch Pods (containers) directly, without relying on Jobs. If this path is taken, we should take care (on the Step Broker) of the Pods that could not be launched because, for instance, a resource quota limit has been met (see the sketch below).
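Again only a sketch, assuming the Kubernetes Python client: a ResourceQuota violation is rejected at admission time with HTTP 403, so the broker could catch that case and queue the step for a later retry. `launch_step_pod` and the retry policy are hypothetical:

```python
from kubernetes import client, config
from kubernetes.client.rest import ApiException

config.load_kube_config()  # load_incluster_config() when running in-cluster
core = client.CoreV1Api()

def launch_step_pod(pod_manifest, namespace="default"):
    """Launch a workflow step as a bare Pod, with no Job wrapper.

    Hypothetical broker helper: returns the created Pod, or None when
    the API server rejected it (e.g. a ResourceQuota limit was met).
    """
    try:
        return core.create_namespaced_pod(namespace, pod_manifest)
    except ApiException as exc:
        if exc.status == 403:
            # ResourceQuota admission rejects the Pod with 403 Forbidden;
            # the broker should queue the step and retry it later.
            return None
        raise
```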

https://github.com/diegodelemos/cap-reuse/milestone/1 fixes it. Remove this behaviour once k8s Jobs can be configured not to exceed a certain number of restarts.