omarryhan/aiogoogle

ValidationError false positives

Closed this issue ยท 13 comments

Hi,
If I try to run the download_drive_file.py example (modified to use the ServiceAccountCreds) I got a ValidationError:

Invalid instance: "alt" media isn't valid. Expected a value that meets the following criteria: Must be one of the following: "json"

Indeed the drive v3 api spec doesn't list "media" in the "alt" parameter:

    "parameters": {
        "alt": {
            "type": "string",
            "description": "Data format for the response.",
            "default": "json",
            "enum": [
                "json"
            ],
            "enumDescriptions": [
                "Responses with Content-Type of application/json"
            ],
            "location": "query"
        },

I need to set validate parameter to False in order to make it work.
Should we add a special validation case just for this enum?
The thing is the "media" value is only listed in method/parameters descriptions, so it's hard to retrieve this info automatically.

Unfortunately, this is a slightly common occurrence with Google APIs. Discovery docs aren't always accurate. Putting an exception for false positives is a bit tedious imo. I'd rather turn the validation module off by default. When I wrote this module, I did it for fun. Didn't know it would fail that often. At this point, I don't know if it's even that useful tbh. If we switch validation to be off by default, Google will return a 400 bad request anyway with a description of what's wrong/missing. What do you think?

Well, client side validation is useful to avoid wasting bandwidth and/or reduce the usage in case of rate-limited APIs.

But since the discovery docs are flawed, and we can't possibly keep up with all the false positives, I'd say that it's ok to defer the validation to Google.

Another, hackier, way, is to keep it enabled and clearly state in the documentation to explicitly disable it when something is wrong. But this is less than ideal for a usability point of view.

But I suppose it depends on how often the validation fails!

You raise some good points.

Come to think of it, this feature is more of an optimization/efficiency feature. Maybe we can include it in the tips and hints section of the docs and then switch it off by default?

Do you think if we switch validation off by default, could that possibly break any of our user's app in some way?

Also, I think another important question is, how does our validation compare with Google's? Which is more helpful? And by how much?

You raise some good points.

Come to think of it, this feature is more of an optimization/efficiency feature. Maybe we can include it in the tips and hints section of the docs and then switch it off by default?

Yeah, this would do it.
It would be nice to have the option to enable it globally without specifying it in every single request.

Do you think if we switch validation off by default, could that possibly break any of our user's app in some way?

Well, with validation enabled you get a ValidationError, while without it, you need to handle a more generic HTTPError.
I don't know if and how to discriminate errors in raise_for_status method.

Also, I think another important question is, how does our validation compare with Google's? Which is more helpful? And by how much?

I can't answer this one ๐Ÿ˜‰ I honestly don't know how good google's 400 messages are...

Yeah, this would do it.
It would be nice to have the option to enable it globally without specifying it in every single request.

You can actually do that, by passing validate=True to aiogoogle.discover.

I can't answer this one ๐Ÿ˜‰ I honestly don't know how good google's 400 messages are...

So I just tested both on this endpoint: https://developers.google.com/youtube/v3/docs/playlists/insert

and inserted the JSON param snippet.title as an integer instead of string

This is the error without validation:

{
  "error": {
    "code": 400,
    "message": "Invalid value at 'resource.snippet.title' (TYPE_STRING), true",
    "errors": [
      {
        "message": "Invalid value at 'resource.snippet.title' (TYPE_STRING), true",
        "reason": "invalid"
      }
    ],
    "status": "INVALID_ARGUMENT"
  }
}

And this is with with

aiogoogle.excs.ValidationError: 

 Invalid instance: "title"

True isn't valid. Expected a value that meets the following criteria: <class 'str'>

You be the judge. I'm open for either options.

Well, the info are there, and we could raise a ValidationError when a response has status 400, using only the message as the exception argument.

Just one thing: the resource part of the parameter path is never specified by the user when composing the request, so it can be confusing.
I don't know if there are other kinds of messages that introduce other "elements of confusion" like this, and if it's something we can leave like this or need to address with some regex...

Well, the info are there, and we could raise a ValidationError when a response has status 400, using only the message as the exception argument.

Hmm, not sure about that one. It's very easy to check if it's a bad request using the HTTP error i.e. by checking e.res.status. If we're sure that 400 errors will only be sent for invalid URLs and Body, then yeah sure. But what if a header is invalid?

Just one thing: the resource part of the parameter path is never specified by the user when composing the request, so it can be confusing.
I don't know if there are other kinds of messages that introduce other "elements of confusion" like this, and if it's something we can leave like this or need to address with some regex...

Sorry, I'm not sure what you mean here.

Let me rephrase my thoughts ๐Ÿ˜‰

This is a client library that aims to hide the HTTP-related stuff from the user, or at least it's what I'm expecting from a REST API wrapper.

So I think that exposing HTTP exceptions to the user goes against the goal of the library.

In the example, the error message talks about resource.snippet.title, but you (the user) only specified snippet.title in the json dict.
While it is good to have the complete parameter path (in contrast to the ValidationError that only states the name of the parameter), that resource word should disappear from an exception thrown by the library, IMHO.

But I understand that we're talking about a massive amount of API endpoints where anything could go wrong, and bubbling up the error could be a fine solution (after all, the user is/should be fully aware that it's all about HTTP requests and responses underneath!)

Ah gotcha.
Imo, the abstractions in this library are not that great too, not just because it doesn't hide the HTTP details, but more importantly because it doesn't hide the details of the creation of the request. In the docs, I walk the users through how a request is made, what a resource is and what a method is. In an ideal world, users shouldn't have to worry about any of this. Users should simply have a GUI generated for all the discovery docs available, and ideally, they can just copy and paste snippets of code generated online, after they browse through what they want. But unfortunately, I have no resources to do any of this. So I decided, instead of having leaky abstractions everywhere, I just teach the user how stuff works under the hood.

I remember first time I tried to use Google's Python client lib, I was so confused, it was a weird API wrapper with no autocomplete and had no idea how to fix bugs when they arose. So I tried not to commit the same mistake with this lib by coming to terms with the fact that I don't have the resources to make a perfect abstraction and teach the user how everything works under the hood. Overall, my main goal with this lib is to make stuff easier for people who already understand how Google APIs work. For casual users, they might have to read the docs.

I might have drifted off topic a little bit, but to address your point, yeah the resource parameter is definitely confusing and a casual user of this lib will probably just ignore it at first sight. Still idk of a better solution. Adding a regex to filter out the resource keyword is a bit of an overkill imo ๐Ÿ˜ƒ

Thanks for sharing your vision, your post would make a great intro in the "Contribute" section of the documentation, to ensure devs are on the same page ๐Ÿ˜‰

In light of this, I agree with you that we can simply disable validation by default and document it, without changing anything else.

As for breaking user apps, I suppose that if someone has an exception handling in place, it already checks for both ValidationError and HttpError, so there would be little change in behavior.
Of course, it could be wise to mark this as a breaking change!

Yup let's do that. I'll be happy to accept a PR if you're up for it. If not I can do it, should be very simple I think.

Just released v3.0.0
Thanks for bringing this up!

Sorry I didn't get the time to help you on that.

As always, many thanks for your work!