Encoding/Decoding Extras is a Pain

Question

Encoding/Decoding Extras is a Pain

jferris opened this issue 8 years ago · 17 comments

We use Yesod's credsExtra field on Creds to provide user information from remote services. This data is well-formed and includes fields that are likely required for the application, such as the user's email address.

After decoding the remote user into a Haskell value, we de-normalize it into text pairs:

credsExtra :: GithubUser -> [(Text, Text)]
credsExtra user =
        [ ("email", githubUserEmailAddress user)
        , ("login", githubUserLogin user)
        , ("avatar_url", githubUserAvatarUrl user)
        , ("access_token", decodeUtf8 $ accessToken token)
        ]
        ++ maybeExtra "name" (githubUserName user)
        ++ maybeExtra "email" (githubUserPublicEmail user)
        ++ maybeExtra "location" (githubUserLocation user)

maybeExtra :: Text -> Maybe Text -> [(Text, Text)]
maybeExtra k (Just v) = [(k, v)]
maybeExtra _ Nothing  = []

We write a function like that for each service.

On the application side, we pairs the text pairs back into a Haskell value:

githubProfile :: [(Text, Text)] -> Either Text Profile
githubProfile extra = Profile
    <$> (lookupExtra "name" extra <|> lookupExtra "login" extra)
    <*> lookupExtra "email" extra

lookupExtra :: Text -> [(Text, b)] -> Either Text b
lookupExtra k extra =
    maybe (Left $ "missing key " ++ k) Right $ lookup k extra

The application needs to write a mapping function like this for each service.

There are several issues with this approach:

It requires a large amount of copy-and-paste boilerplate for each new service we support.
It requires a large amount of boilerplate in the application.
Each text key is a value that can be misspelled and break the application. The application will compile successfully but not work.
If we change keys in any way on our side, we risk breaking users' applications without breaking the build.
The process of discovering which keys are available is poor, because you get an anonymous pair of Text values instead of a well-formed Haskell value.

I'd like to do a larger update that improves this experience.

Answer 1 · 2016-07-12T15:36:35.000Z

The Creds type is only parameterized over the Yesod app type, so I don't think we can inject a user type for each service. However, we could expose functions for doing the decoding into well-formed types:

-- In the library
decodeGithubUser :: Creds m -> Either Text GithubUser

-- In the application
githubProfile :: Creds m -> Either Text Profile
githubProfile creds = do
  u <- decodeGithubUser creds
  return $ Profile (githubUserName u) (githubUserEmail u)

This would eliminate the need for applications to write a text pair lookup function, and would provide compile-time checking that the way they're mapping the remote profile makes sense.

Since we already need to write a decoding function for the remote JSON representation, I was thinking it may make sense to directly set a github_user_json key in credsExtra and then parse it back out in decodeGitHubUser.

Answer 2 · 2016-07-13T14:22:28.000Z

I like the idea of owning credsExtra and providing a decode function like that. Setting a single data key that we decode sounds good too, though there would be some tricks when the data comes from multiple JSON responses.

Do we think it's more common/valuable to have a separate decode function for every plugin or to try and unify the interface into an "OAuth2 User" because so many fields are common across plugins? We could try to map it out to see exactly how the tradeoff goes with the plugins we have today.

Answer 3 · 2016-07-13T14:26:35.000Z

Do we think it's more common/valuable to have a separate decode function for every plugin or to try and unify the interface into an "OAuth2 User" because so many fields are common across plugins?

I can take a look to see what's common, but I think the common data type would be kind of weird. Some services guarantee an email, some don't, and some have multiple emails. Others have service-specific fields like "team name." I think having Maybe Text for a team would be awkward and potentially dangerous, as it could make it easy to fetch a GitHub team name and use it like a Slack team ID.

Maybe something with typeclasses could handle the really common fields like name and email without losing type safety and flexibility.

Answer 4 · 2016-07-13T14:39:37.000Z

Personally I would be interested in using this library for just authorization, so I'm not really interested in extra information. In the cases where I do need some extra info, I would anyway need to make API calls directly to get emails and other information. Therefore, I would prefer if plugins were only required to return the minimum amount of information (some sort of unique id for the user and the OAuth token which can be used for subsequent requests), with things like email being an extra.

At the same time I realize that the library is probably designed with some specific thoughtbot usecase in mind, were things like having an email address available is required. In this case it would be nice to either document what "extra" information is expected, or make it obvious through types.

Answer 5 · 2016-07-13T15:55:25.000Z

I guess the other direction we could go in here would be to drop support for remote profiles entirely and only return the access token in credsExtra. Applications could perform their own requests to fetch whatever information they want, or other libraries could handle the details of how to decode user profiles for various services.

Answer 6 · 2016-08-27T10:21:19.000Z

I'm interested in implementing more plugins, but I think it would be good to clarify this issue first. Some services, e.g. orcid.org, provide an access token AND the unique identifier of the user already in the initial OAuth request. This is enough for handling authentication without any extra requests. To get the email address of the user (which is currently expected to be returned by the plugins) an additional request is necessary (+ parsing some XML).

The way I see plugins could be expected to return

The same information as now (situation unchanged), or
Just the access token, like @jferris suggests above, or
access token + some unique identifier, meaning that the user can be authenticated without any extra calls. Authentication is probably a fairly likely scenario why users are using this library, so this option might make sense.

Any opinions on this?

Answer 7 · 2016-08-30T17:14:49.000Z

After giving this some more thought, I would be in favor of dropping the "extra" stuff we currently put in credsExtra, leaving only the things required to perform additional requests to get whatever information the plugin consumer needs (e.g. the access token). I like the idea of simplifying the scope of this package -- Personally, I'd rather support a tighter set of features for more plugins, than have a cumbersome API causing friction in adding them.

Additionally, we could move the functions for retrieving and decoding profiles to another, related package to make for an easy upgrade path.

@pafcu the orcid.org situation doesn't sound abnormal, a "unique identifier of the user" is already a property of all the plugins and is what should be returned as credsIdent, right?

EDIT: To clarify, I'm in favor of option 3, but with the unique identifier returned as credsIdent not also in credsExtra.

Answer 8 · 2016-08-30T17:34:46.000Z

For most services, you first do the OAuth dance to obtain an access_token, and then use that token to obtain some profile data. This data then usually contains some sort of identifier for the user. This is basically what all the plugins are doing now. In the case of ORCID, the identifier is returned already along with the access_token (see http://members.orcid.org/api/post-oauthtoken), meaning one less request is needed.

For the other services the current way is not much of a problem: since you anyway have to do that extra request to get the identifier, you might as well grab the email, and other extra info too (I presume this is why it is done like it is done). But for ORCID getting the email address does involve some extra work (or rather, you can't get away with doing less than normal).

One problem with option #3, is that in case you really do need e.g. the email of a Github user (probably not an uncommon scenario), you basically have to call the same API endpoint twice: once when you obtain the credsIdent and once when you want to parse the profile data to get the email.

Answer 9 · 2016-08-30T17:42:43.000Z

I see, apologies for not having this all in my head more clearly; it's been a while since I've worked in this code base. I think I'll have to give this some more thought.

Answer 10 · 2016-08-30T17:42:45.000Z

Since we need to perform a follow-up request in order to return a unique identifier for some services, we could do a compromise:

Perform the necessary request (such as GET /user for GitHub)
Parse out only the part of the response we need to get a unique identifier (such as the id JSON property for GitHub)
Add the raw response from that request to credsExtra

That would allow us to get unique identifiers, avoid maintenance for the many fields some providers return, and allow applications to access other parts of the response without performing an additional request.

Answer 11 · 2016-08-30T17:43:47.000Z

You're picturing credsExtra = [("response", "<json as Text>")]?

Answer 12 · 2016-08-30T17:45:32.000Z

Right. I'm not sure if "response" is a good name, and not every service would have JSON, but that's the idea.

Answer 13 · 2016-08-30T17:47:16.000Z

e.g. the ORCID API uses XML, which is why I'm trying to avoid parsing it unless absolutely needed

Answer 14 · 2016-08-30T17:50:08.000Z

Works for me. Do you see us providing useful functions for handling that response in a separate module here, a separate package we maintain, or just letting consumers / other libraries deal with it?

Answer 15 · 2016-08-30T17:52:29.000Z

I think I would just let consumers deal with it, because:

Providers constantly add fields and we'd need to keep up with every provider
When doing strict parsing and non-Maybe fields, we'll start failing entirely if a provider ever removes a field or changes its type (Slack does this constantly) even if the consumer never uses the field that changed

Answer 16 · 2016-08-30T17:56:29.000Z

Since this module is about OAuth2 it seems like parsing profile information is a bit out of scope (and not really related to Yesod at all), so it might make sense to split that out into a separate package.

"Full binding" for any of the services would probably be quite a bit of work, but it might be nice to have a package that at least handles basic profile stuff (such as email, full name, username, avatar, etc) that most of these services do provide.

Answer 17 · 2018-02-12T18:47:22.000Z

Closed by #100 🎉