kubernetes/kubernetes

Kubelet: Add a metric in kubelet to track how long it takes for a pod to fully start

JeffLuoo opened this issue · 12 comments

What would you like to be added?

Add a new metric that records the end-to-end startup latency of a pod, from pod creation until the pod becomes ready for the first time. The metric covers every stage of the pod lifecycle up to that point, including scheduling and image pulling.

Metric Name: kubelet_pod_full_startup_duration_seconds {namespace=<namespace_name>, pod=<pod_name>, uid=<uid>, node=<node_name>}

Metric Type: Gauge

Metric Unit: Seconds
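To make the proposed value concrete, here is a minimal sketch of how the gauge could be computed: the time of the first Ready transition minus the pod creation time. This is not kubelet code. The `podTimes` type, the `fromUnix` helper, and `fullStartupSeconds` are hypothetical names used only for illustration.

```go
package main

import (
	"fmt"
	"time"
)

// podTimes is a hypothetical, trimmed-down stand-in for the two timestamps
// the proposal depends on. It is not a Kubernetes API type.
type podTimes struct {
	Created    time.Time // pod creation time
	FirstReady time.Time // zero value until the pod becomes ready for the first time
}

// fromUnix builds a podTimes from Unix seconds. readyUnix == 0 means
// "not ready yet" in this sketch.
func fromUnix(createdUnix, readyUnix int64) podTimes {
	p := podTimes{Created: time.Unix(createdUnix, 0)}
	if readyUnix != 0 {
		p.FirstReady = time.Unix(readyUnix, 0)
	}
	return p
}

// fullStartupSeconds returns the creation-to-first-ready latency in seconds.
// ok is false when the pod has not been ready yet, or when the timestamps
// are inconsistent (ready before created), so no sample should be recorded.
func fullStartupSeconds(p podTimes) (seconds float64, ok bool) {
	if p.FirstReady.IsZero() || p.FirstReady.Before(p.Created) {
		return 0, false
	}
	return p.FirstReady.Sub(p.Created).Seconds(), true
}

func main() {
	p := fromUnix(1700000000, 1700000042)
	if s, ok := fullStartupSeconds(p); ok {
		fmt.Printf("kubelet_pod_full_startup_duration_seconds %.0f\n", s)
	}
}
```

Because the value is only defined once the pod has been ready at least once, the sketch returns `ok == false` rather than a zero duration for pods that are still starting.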

Why is this needed?

Kubelet currently reports a Histogram metric, pod_start_total_duration_seconds, that gives users an overview of the end-to-end pod startup latency from pod creation to pod running. However, pod ready, rather than pod running, is usually the signal that a pod can serve traffic.

The new metric will let users track how long it takes for the pods in their workloads to fully start and become ready to serve traffic. With the node name label, it can also supplement the existing pod_start_total_duration_seconds metric for users who want to track node-level end-to-end pod startup latency from creation to ready.

Users could also aggregate the metric by workload (Deployment, StatefulSet, etc.) to get the workload-level end-to-end pod startup latency.
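A rough sketch of the workload-level aggregation described above, assuming the per-pod values have already been collected. The proposed labels do not include the owning workload, so the `owner` map from pod name to workload is an assumption here (it would have to come from somewhere else, such as owner references). All names are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// sample is one hypothetical per-pod reading of the proposed gauge.
type sample struct {
	Pod     string
	Seconds float64
}

// meanByWorkload averages per-pod startup latency per workload.
// owner maps pod name to workload name; pods missing from it are skipped.
func meanByWorkload(samples []sample, owner map[string]string) map[string]float64 {
	sum := map[string]float64{}
	count := map[string]int{}
	for _, s := range samples {
		w, ok := owner[s.Pod]
		if !ok {
			continue // no known workload for this pod
		}
		sum[w] += s.Seconds
		count[w]++
	}
	mean := make(map[string]float64, len(sum))
	for w, total := range sum {
		mean[w] = total / float64(count[w])
	}
	return mean
}

func main() {
	samples := []sample{
		{Pod: "web-a", Seconds: 10},
		{Pod: "web-b", Seconds: 20},
		{Pod: "db-0", Seconds: 45},
	}
	owner := map[string]string{
		"web-a": "Deployment/web",
		"web-b": "Deployment/web",
		"db-0":  "StatefulSet/db",
	}
	mean := meanByWorkload(samples, owner)

	// Sort workload names so the output order is stable.
	names := make([]string, 0, len(mean))
	for w := range mean {
		names = append(names, w)
	}
	sort.Strings(names)
	for _, w := range names {
		fmt.Printf("%s mean startup: %.1fs\n", w, mean[w])
	}
}
```

The mean is only one possible aggregate; a maximum or a percentile per workload would follow the same grouping step.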

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.


cc: @ruiwen-zhao for review.

/sig instrumentation

/sig node

Just to bring up the previous discussion around metric cardinality: adding both the pod name and the node name as metric labels might create too much cardinality. We need to come up with a way to address this.

cc @SergeyKanzhelev @logicalhan @dashpole
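To illustrate the cardinality concern, here is a back-of-the-envelope sketch comparing time series counts, assuming Prometheus-style exposition. A gauge labeled with pod name and uid yields one series per pod ever observed, while a histogram without per-pod labels yields a fixed number of series per node. The function names and the numbers in `main` are hypothetical, not measurements.

```go
package main

import "fmt"

// perPodGaugeSeries estimates series for a gauge labeled by pod and uid:
// one series per live pod, plus one per deleted pod whose series is still
// retained by the monitoring backend.
func perPodGaugeSeries(livePods, retainedDeletedPods int) int {
	return livePods + retainedDeletedPods
}

// histogramSeries estimates series for a histogram with no per-pod labels:
// per node, one series per finite bucket, plus the +Inf bucket, _sum and _count.
func histogramSeries(nodes, finiteBuckets int) int {
	return nodes * (finiteBuckets + 3)
}

func main() {
	// Hypothetical cluster sizes, chosen only to show the growth pattern.
	nodes, finiteBuckets := 1000, 15
	livePods := 30000
	for _, churned := range []int{0, 100000, 1000000} {
		fmt.Printf("retained deleted pods=%d gauge=%d histogram=%d\n",
			churned,
			perPodGaugeSeries(livePods, churned),
			histogramSeries(nodes, finiteBuckets))
	}
}
```

The histogram count stays fixed as pods come and go, while the per-pod gauge count grows with pod churn, which is the part that needs addressing.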