gardener/etcd-druid

[Feature] Improve logging or condition messages when `etcd-druid` is not able to determine etcd backup status

Closed this issue · 1 comments

Feature (What you would like to be added):
When etcd-druid is not able to determine an etcd's backup status, more details should be reported either in the logs or in the condition.

Motivation (Why is this needed?):
Currently, the following could run into an error when trying to retrieve the full snapshot lease or the delta snapshot lease:

fullSnapErr = a.cl.Get(ctx, types.NamespacedName{Name: getFullSnapLeaseName(&etcd), Namespace: etcd.Namespace}, fullSnapLease)
deltaSnapErr = a.cl.Get(ctx, types.NamespacedName{Name: getDeltaSnapLeaseName(&etcd), Namespace: etcd.Namespace}, deltaSnapLease)
// Set status to Unknown if errors in fetching snapshot leases or lease never renewed
if fullSnapErr != nil || deltaSnapErr != nil || (fullSnapLease.Spec.RenewTime == nil && deltaSnapLease.Spec.RenewTime == nil) {
	return result
}

However, the errors are not logged anywhere and no description is added to the BackupReady condition, which makes it hard to determine the reason for the condition's Unknown status.
There are other cases where the condition could be Unknown and the only time the reason for that is logged is here:
logger.Error(err, "unable to compute full snapshot duration from full snapshot schedule", "fullSnapshotSchedule", *etcd.Spec.Backup.FullSnapshotSchedule)

I think that it would be useful to either log the reason for the Unknown condition in all cases or add it to the message of the BackupReady condition.

Approach/Hint to implement the solution (optional):

Closing this as we agreed with @shreyas-s-rao that it will be tackled as part of #618 and #645