kubeflow/katib

Support Kubernetes Sidecars for Katib Metrics Collectors

andreyvelich opened this issue · 4 comments

Kubernetes recently added native support for sidecar containers as part of KEP-753: kubernetes/enhancements#3761

We need to discuss whether we can improve our architecture to run Katib Metrics Collectors as native Kubernetes sidecars.
To run a container as a sidecar, it has to be declared as an initContainer with restartPolicy: Always.

This feature will only be available starting with Kubernetes 1.28, but we can start the design discussion now.
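
For reference, here is a minimal sketch of what a Trial Pod could look like with the metrics collector declared as a native sidecar. The Pod name, container names, images, volume name, and mount path are all placeholders for illustration, not what Katib generates today. The only part taken from KEP-753 is the restartPolicy: Always field on the init container.

```yaml
# Hypothetical Trial Pod; all names, images, and paths are placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: example-trial
spec:
  restartPolicy: Never
  initContainers:
    # restartPolicy: Always on an init container marks it as a native sidecar.
    - name: metrics-collector
      image: example.com/metrics-collector:latest
      restartPolicy: Always
      volumeMounts:
        - name: metrics
          mountPath: /var/log/katib
  containers:
    # The training container keeps its original command; nothing is wrapped.
    - name: training-container
      image: example.com/training:latest
      volumeMounts:
        - name: metrics
          mountPath: /var/log/katib
  volumes:
    # Shared scratch volume so the sidecar can read the training logs.
    - name: metrics
      emptyDir: {}
```

With this layout the kubelet starts the sidecar before the training container and stops it after the main containers exit, which in principle could address both problems listed below. In Kubernetes 1.28 the feature is alpha and requires the SidecarContainers feature gate to be enabled.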

Existing problems with the current Katib Metrics Collector sidecar:

  • If the training container finishes before the Metrics Collector has started, the Trial fails.
  • Since we override the Trial start command, the Trial might fail, e.g. #1914 (comment).

cc @kubeflow/wg-automl-leads @tenzen-y @gaocegege @votti

@andreyvelich Thank you for raising this proposal!
I agree with you: by supporting the sidecar pattern, we can avoid manually managing the termination of the metrics collector.

https://github.com/kubeflow/katib/blob/master/pkg/metricscollector/v1beta1/common/pns.go

Also, we might be able to support Istio.

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

/lifecycle freeze

/lifecycle frozen