openconfig/public

Clarify healthz `unhealthy-count` and `last-unhealthy`

hellt opened this issue · 3 comments

hellt commented

Hi @robshakir @marcushines

Can we clarify the difference in descriptions for last-unhealthy and unhealthy-count?
Namely, last-unhealthy indicates that the field should be modified any time a component is in a status that is not HEALTHY. UNSPECIFIED falls under this category.

At the same time, unhealthy-count only counts HEALTHY->UNHEALTHY transitions. Will the unhealthy-count counter increment if the component moves from HEALTHY to UNSPECIFIED?

This subtle difference may lead to differences in implementation.

  leaf status {
    type enumeration {
      enum UNSPECIFIED {
        description
          "The component's health status has not yet been checked
          by the system.";
      }
      enum HEALTHY {
        description
          "The component is in a HEALTHY state, and is operating
          within the expected parameters.";
      }
      enum UNHEALTHY {
        description
          "The component is in a unhealthy state, it is not
          performing the function expected of it.";
      }
    }
    description
      "The status of the component, indicating its current health.";
    oc-ext:telemetry-on-change;
  }
  leaf last-unhealthy {
    type oc-types:timeticks64;
    description
      "The time at which the component as last observed to be unhealthy
      represented as nanoseconds since the Unix epoch. Unhealthy is defined
      as the component being in a state other than HEALTHY.";
    oc-ext:telemetry-on-change;
  }
  leaf unhealthy-count {
    type uint64;
    description
      "The number of status checks that have determined this component
      to be in an unhealthy state. This counter should be incremented
      when the component transitions from the HEALTHY to UNHEALTHY
      state such that the value reflects the number of times the
      component has become unhealthy.";
    oc-ext:telemetry-on-change;
  }
}

To summarize, our reading of the current model boils down to the following state transitions:

  1. healthy->unspecified | last-unhealthy changes | unhealthy-count doesn't change
  2. unhealthy->unspecified | last-unhealthy changes | unhealthy-count doesn't change
  3. unspecified->unhealthy | last-unhealthy changes | unhealthy-count doesn't change
  4. unspecified->healthy | last-unhealthy doesn't change | unhealthy-count doesn't change
  5. healthy->unhealthy | last-unhealthy changes | unhealthy-count increments
  6. unhealthy->healthy | last-unhealthy doesn't change | unhealthy-count doesn't change
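
The six transitions above can be sketched as a small state machine. This is only an illustration of our literal reading of the two descriptions, not code from any implementation; the names `Status`, `HealthzState`, and `apply_transition` are hypothetical and do not appear in the model.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    UNSPECIFIED = "UNSPECIFIED"
    HEALTHY = "HEALTHY"
    UNHEALTHY = "UNHEALTHY"


@dataclass
class HealthzState:
    # Hypothetical container mirroring the three leaves quoted above.
    status: Status = Status.UNSPECIFIED
    last_unhealthy: int = 0  # nanoseconds since the Unix epoch
    unhealthy_count: int = 0


def apply_transition(state: HealthzState, new_status: Status, now_ns: int) -> HealthzState:
    """Apply one status change under the literal reading of the descriptions."""
    old_status = state.status
    # last-unhealthy: "unhealthy" is any state other than HEALTHY,
    # so UNSPECIFIED also updates the timestamp.
    if new_status is not Status.HEALTHY:
        state.last_unhealthy = now_ns
    # unhealthy-count: the description names only HEALTHY -> UNHEALTHY.
    if old_status is Status.HEALTHY and new_status is Status.UNHEALTHY:
        state.unhealthy_count += 1
    state.status = new_status
    return state
```

Under this reading, a component that goes HEALTHY -> UNSPECIFIED -> UNHEALTHY updates last-unhealthy twice but never increments unhealthy-count, which is the ambiguity in question.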

marcushines commented

It should be: once the component is healthy, any state transition out of that state counts.
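
A minimal sketch of that rule, assuming it means unhealthy-count increments on any transition that leaves HEALTHY, including HEALTHY -> UNSPECIFIED. The `Status` enum and the `next_unhealthy_count` helper are hypothetical names used only for illustration.

```python
from enum import Enum


class Status(Enum):
    UNSPECIFIED = "UNSPECIFIED"
    HEALTHY = "HEALTHY"
    UNHEALTHY = "UNHEALTHY"


def next_unhealthy_count(count: int, old: Status, new: Status) -> int:
    """Return the counter value after a status change from old to new."""
    # Assumption: leaving HEALTHY for any other state counts as becoming unhealthy.
    if old is Status.HEALTHY and new is not Status.HEALTHY:
        return count + 1
    return count
```

With this rule, transitions that start from UNSPECIFIED or UNHEALTHY leave the counter unchanged, so each departure from HEALTHY is counted exactly once.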

hellt commented

@marcushines then, if I understood your comment correctly, this addition should be valid: #853.

dplore commented

Fixed by #853