cloudfoundry/service-fabrik-broker

Interoperator should return CR validation failures as Error 400

Closed this issue · 12 comments

Is your feature request related to a problem? Please describe.
Hi folks! We're using the interoperator as an OSB-facade to deploy our custom CRs.

While it's working nicely in general (thanks for the work on the project!), we hit one issue which could be improved:

If the interoperator encounters a k8s validation error when creating the 3rd party CR, it will retry a couple of times and then fail, causing the following (rather technical) error response to a CF user:

status:    create failed
message:   Service Broker Error, status code: ETIMEDOUT, error code: 10008

Describe the solution you'd like
K8S CRDs offer a rich set to implement validation rules - either OpenAPI spec validation or validating webhooks. This is the default way to validate input CRs.

The OSBAPI spec allows to return an Error 400 during service provisioning:
https://github.com/openservicebrokerapi/servicebroker/blob/master/spec.md#response-3

This is often used for validation, especially needed when the end user has to specify creation parameters. This validation is run as early as possible - in CF's case, an Error 400 would immediately return the validation error to the user when he submitted the cf create-service command.

We think that would be the right point to run the k8s CR creation (and pass through error messages) - this way, validation rules could be reused.

What do you think?

Describe alternatives you've considered
For now, we cannot implement our validation rules in k8s as the end user will not have a meaningful error description for them; even more critical, we have to make sure that validation basically always passes and move all the validation logic in the reconcile loop (i.e. transition to a failed state for which we can return a meaningful error description to the end user).

I think this could be a bigger issue especially for 3rd party operators where you don't control the code/validation rules.

We have created an issue in Pivotal Tracker to manage this:

https://www.pivotaltracker.com/story/show/177725502

The labels on this github issue will be updated when the story is started.

@MatthiasWinzeler Thanks for sharing the issue details. We tried to reproduce the same by modifying the "apiVersion" tag to "api" in the sfplan, causing the resource creation to fail with the error mentioned above. With the fix in place, the below error response will be returned:

"status": " create failed",
"message": " Object 'apiVersion' is missing in 'unstructured object has no version',error code: 10008"

Let us know the feedback.
Also it will be helpful if you could share some failure scenarios that can be reproduced for validating the changes.

@Pooja-08 thanks for getting back to us. Yes, something along these lines would be desirable.

Since the idea is to pass these errors back to the end user, it only makes sense to add the feature if the message is understandable by the end user. more specifically, it should not contain too many technical internals.

example of a validation error:

error validating data: ValidationError(MyObject.spec): missing required field "myField" in com.project.v1alpha1.MyObject.spec

In this case, it does not make sense to pass the whole message back to the end user. However, missing required field "myField" would be understandable by the end user. I'm not sure though it's possible to get only that information - that depends on the format which the kube API uses for validation errors.

Does that make sense? Let me know if you need some further help with investigating (we can also pair on that if you wish).

@MatthiasWinzeler The error message "Object 'apiVersion' is missing in 'unstructured object has no version" is what we receive from the apiserver. It wouldn't be feasible to parse and modify the error message as you've mentioned. Also, the "reason" for this error is empty unlike "NotFound" Error where the reason has "NotFound" value as shown in the below error snippets:

Validation error as received from the apiserver:
ERROR resources.internal Encountered error {"reason": "", "error": "Object 'apiVersion' is missing in 'unstructured object has no version'"}

NotFound error:
ERROR resources.internal Encountered error {"reason": "NotFound", "error": "postgresqls.acid.zalan.do "sapcp-12ca2362b-6561-4673-ad24-111d7ac86cf" not found"}

because of this, we are not able to filter the validation errors and return customized message.

Any input/suggestion will be appreciated.

@Pooja-08 Thanks for the answer :)

What struct/error do you get back exactly? I am not 100% sure, but I think I saw k8s apiserver exposing this struct on validation errors:
https://github.com/kubernetes/apimachinery/blob/master/pkg/api/errors/errors.go#L242

That one would allow exposing the necessary information rather easily.

The exact error is this:
"error": "Object 'apiVersion' is missing in 'unstructured object has no version'"

If I print the reason using apiErrors.ReasonForError(err) it returns empty:
"reason": ""

Is there a way to print the error struct as defined in errors.go? I tried %+v option but it returns just this message.

@Pooja-08 The easiest way for me is always to inspect the variables in an IDE debugger. for example this error:

Screenshot 2021-06-29 at 16 22 17

It appears to me that it's indeed a k8s.io/apimachinery/pkg/api/errors.StatusError. You should be able to cast it with something like this and then get the relevant fields from the Details.Causes struct.

		err := k8sClient.Create(ctx, privateEndpoint)
		if statusError, ok := err.(*errors.StatusError); ok {
			if statusError.ErrStatus.Code == 422 {
                             fmt.Printf("+%v", statusError.ErrStatus.Details.Causes)
			}
		}

I think joining the Causes[*].Message would lead to a nice user-facing message.

@MatthiasWinzeler Thanks for the details. All this while I was hitting error in the

k8sClient.Get(ctx, namespace, resource)

and that's the reason why I was not able to filter the error based on status. After looking at this stack, I reproduced the error in Create() step and now I do see the StatusError. The error message is now modified to show the cause message. Snippet of the modified code:

              if statusError, ok := inputErr.(*apiErrors.StatusError); ok {
			if statusError.ErrStatus.Code == 422 {
                                 status.description = fmt.Sprintf("%s, Error code: 422", statusError.ErrStatus.Details.Causes[0].Message)
                         }
              }

If for any reason the message is empty, we will show the below default message:

                    "Status Error, Error code: 422"

Let me know if this looks good.

@Pooja-08 Cool, that's already very nice!

I think what's important:

  • There might be multiple Causes, for example if you have validation errors on multiple fields. So it's important to show all of those, for example by joining the Causes[*].Messages with a ,
  • What's also important is to make sure it also works with validation errors coming from validation webhooks. I'm not sure they have the same format, that would need to be tested. This would allow operator CRs to use custom validation webhooks that are directly passed back to the end users.
  • For a fallback message I suggest to have something that helps the end user a bit more (users probably can't make much sense of Status Error, Error code: 422). Maybe something along the lines of Unprocessable Entity - this is usually caused by invalid request parameters, Error code: 422.

Hi @MatthiasWinzeler
It might take a little longer than what we expected for completing the requested enhancement. Therefore, we will be able to resume this topic based on priorities and available resources.

Partially implemented fixed in #1345

Closing as per comment above. Please file another issue if more functionality is required.