stolostron/multicloud-operators-foundation

ManagedClusterView should preserve original error when retrieving resources

Opened this issue · 3 comments

Problem:
Currently, the ManagedClusterView implementation does not preserve the original error when attempting to retrieve resources from a ManagedCluster. Instead, it converts all errors to a generic error and embeds the error text in the condition message (see here), which can be frustrating and difficult to parse.

For example, when a resource is not found or when there is a connectivity issue, there is nothing that distinguishes between the two cases other than the embedded error message within the .status.condition[0].Message

To address this, we recommend improving the ManagedClusterView implementation to preserve the last error when retrieving resources. Specifically, we suggest adding a new field to the .status object that holds the last error encountered during resource retrieval. Another possibility is to add another condition that holds the error.

we have different reasons for this condition when failed to get resource:
"ResourceNameInvalid"
"ResourceTypeInvalid"
"ResourceGVKInvalid"
"GetResourceFailed"

not sure to add a not found reason can help you?
do you still need the error if the process is recovered from failure?

For the reason GetResourceFailed, there is no distinction between a Not Found error and a Timeout error for example. Both errors end up in the message string.
The Not Found error tells us that the resource does not exist in the managed cluster. Here is an example of not found error

GetResourceFailed failed to get resource with err: volumereplicationgroups.ramendr.openshift.io "busybox-drpc" not found

And here is an example of an inaccessible cluster

GetResourceFailed failed to get resource with err: Get "https://172.30.0.1:443/apis/ramendr.openshift.io/v1alpha1/namespaces/busybox-workloads-3/volumereplicationgroups/busybox-drpc": dial tcp 172.30.0.1:443: i/o timeout

The line that builds the error message -> see here

@BenamarMk how about use the statusReason of error to replace GetResourceFailed ?