Encoding/Decoding Extras is a Pain
jferris opened this issue · 17 comments
We use Yesod's credsExtra
field on Creds
to provide user information from remote services. This data is well-formed and includes fields that are likely required for the application, such as the user's email address.
After decoding the remote user into a Haskell value, we de-normalize it into text pairs:
credsExtra :: GithubUser -> [(Text, Text)]
credsExtra user =
[ ("email", githubUserEmailAddress user)
, ("login", githubUserLogin user)
, ("avatar_url", githubUserAvatarUrl user)
, ("access_token", decodeUtf8 $ accessToken token)
]
++ maybeExtra "name" (githubUserName user)
++ maybeExtra "email" (githubUserPublicEmail user)
++ maybeExtra "location" (githubUserLocation user)
maybeExtra :: Text -> Maybe Text -> [(Text, Text)]
maybeExtra k (Just v) = [(k, v)]
maybeExtra _ Nothing = []
We write a function like that for each service.
On the application side, we pairs the text pairs back into a Haskell value:
githubProfile :: [(Text, Text)] -> Either Text Profile
githubProfile extra = Profile
<$> (lookupExtra "name" extra <|> lookupExtra "login" extra)
<*> lookupExtra "email" extra
lookupExtra :: Text -> [(Text, b)] -> Either Text b
lookupExtra k extra =
maybe (Left $ "missing key " ++ k) Right $ lookup k extra
The application needs to write a mapping function like this for each service.
There are several issues with this approach:
- It requires a large amount of copy-and-paste boilerplate for each new service we support.
- It requires a large amount of boilerplate in the application.
- Each text key is a value that can be misspelled and break the application. The application will compile successfully but not work.
- If we change keys in any way on our side, we risk breaking users' applications without breaking the build.
- The process of discovering which keys are available is poor, because you get an anonymous pair of
Text
values instead of a well-formed Haskell value.
I'd like to do a larger update that improves this experience.
The Creds
type is only parameterized over the Yesod app type, so I don't think we can inject a user type for each service. However, we could expose functions for doing the decoding into well-formed types:
-- In the library
decodeGithubUser :: Creds m -> Either Text GithubUser
-- In the application
githubProfile :: Creds m -> Either Text Profile
githubProfile creds = do
u <- decodeGithubUser creds
return $ Profile (githubUserName u) (githubUserEmail u)
This would eliminate the need for applications to write a text pair lookup function, and would provide compile-time checking that the way they're mapping the remote profile makes sense.
Since we already need to write a decoding function for the remote JSON representation, I was thinking it may make sense to directly set a github_user_json
key in credsExtra
and then parse it back out in decodeGitHubUser
.
I like the idea of owning credsExtra
and providing a decode function like that. Setting a single data key that we decode sounds good too, though there would be some tricks when the data comes from multiple JSON responses.
Do we think it's more common/valuable to have a separate decode function for every plugin or to try and unify the interface into an "OAuth2 User" because so many fields are common across plugins? We could try to map it out to see exactly how the tradeoff goes with the plugins we have today.
Do we think it's more common/valuable to have a separate decode function for every plugin or to try and unify the interface into an "OAuth2 User" because so many fields are common across plugins?
I can take a look to see what's common, but I think the common data type would be kind of weird. Some services guarantee an email, some don't, and some have multiple emails. Others have service-specific fields like "team name." I think having Maybe Text
for a team would be awkward and potentially dangerous, as it could make it easy to fetch a GitHub team name and use it like a Slack team ID.
Maybe something with typeclasses could handle the really common fields like name and email without losing type safety and flexibility.
Personally I would be interested in using this library for just authorization, so I'm not really interested in extra information. In the cases where I do need some extra info, I would anyway need to make API calls directly to get emails and other information. Therefore, I would prefer if plugins were only required to return the minimum amount of information (some sort of unique id for the user and the OAuth token which can be used for subsequent requests), with things like email being an extra.
At the same time I realize that the library is probably designed with some specific thoughtbot usecase in mind, were things like having an email address available is required. In this case it would be nice to either document what "extra" information is expected, or make it obvious through types.
I guess the other direction we could go in here would be to drop support for remote profiles entirely and only return the access token in credsExtra
. Applications could perform their own requests to fetch whatever information they want, or other libraries could handle the details of how to decode user profiles for various services.
I'm interested in implementing more plugins, but I think it would be good to clarify this issue first. Some services, e.g. orcid.org, provide an access token AND the unique identifier of the user already in the initial OAuth request. This is enough for handling authentication without any extra requests. To get the email address of the user (which is currently expected to be returned by the plugins) an additional request is necessary (+ parsing some XML).
The way I see plugins could be expected to return
- The same information as now (situation unchanged), or
- Just the access token, like @jferris suggests above, or
- access token + some unique identifier, meaning that the user can be authenticated without any extra calls. Authentication is probably a fairly likely scenario why users are using this library, so this option might make sense.
Any opinions on this?
After giving this some more thought, I would be in favor of dropping the "extra" stuff we currently put in credsExtra
, leaving only the things required to perform additional requests to get whatever information the plugin consumer needs (e.g. the access token). I like the idea of simplifying the scope of this package -- Personally, I'd rather support a tighter set of features for more plugins, than have a cumbersome API causing friction in adding them.
Additionally, we could move the functions for retrieving and decoding profiles to another, related package to make for an easy upgrade path.
@pafcu the orcid.org situation doesn't sound abnormal, a "unique identifier of the user" is already a property of all the plugins and is what should be returned as credsIdent
, right?
EDIT: To clarify, I'm in favor of option 3, but with the unique identifier returned as credsIdent
not also in credsExtra
.
For most services, you first do the OAuth dance to obtain an access_token, and then use that token to obtain some profile data. This data then usually contains some sort of identifier for the user. This is basically what all the plugins are doing now. In the case of ORCID, the identifier is returned already along with the access_token (see http://members.orcid.org/api/post-oauthtoken), meaning one less request is needed.
For the other services the current way is not much of a problem: since you anyway have to do that extra request to get the identifier, you might as well grab the email, and other extra info too (I presume this is why it is done like it is done). But for ORCID getting the email address does involve some extra work (or rather, you can't get away with doing less than normal).
One problem with option #3, is that in case you really do need e.g. the email of a Github user (probably not an uncommon scenario), you basically have to call the same API endpoint twice: once when you obtain the credsIdent
and once when you want to parse the profile data to get the email.
I see, apologies for not having this all in my head more clearly; it's been a while since I've worked in this code base. I think I'll have to give this some more thought.
Since we need to perform a follow-up request in order to return a unique identifier for some services, we could do a compromise:
- Perform the necessary request (such as
GET /user
for GitHub) - Parse out only the part of the response we need to get a unique identifier (such as the
id
JSON property for GitHub) - Add the raw response from that request to
credsExtra
That would allow us to get unique identifiers, avoid maintenance for the many fields some providers return, and allow applications to access other parts of the response without performing an additional request.
You're picturing credsExtra = [("response", "<json as Text>")]
?
Right. I'm not sure if "response" is a good name, and not every service would have JSON, but that's the idea.
e.g. the ORCID API uses XML, which is why I'm trying to avoid parsing it unless absolutely needed
Works for me. Do you see us providing useful functions for handling that response in a separate module here, a separate package we maintain, or just letting consumers / other libraries deal with it?
I think I would just let consumers deal with it, because:
- Providers constantly add fields and we'd need to keep up with every provider
- When doing strict parsing and non-
Maybe
fields, we'll start failing entirely if a provider ever removes a field or changes its type (Slack does this constantly) even if the consumer never uses the field that changed
Since this module is about OAuth2 it seems like parsing profile information is a bit out of scope (and not really related to Yesod at all), so it might make sense to split that out into a separate package.
"Full binding" for any of the services would probably be quite a bit of work, but it might be nice to have a package that at least handles basic profile stuff (such as email, full name, username, avatar, etc) that most of these services do provide.