[FEATURE REQUEST] Support creating verifier without knowing upfront the list of user pool IDs and client IDs
dax-hurley opened this issue · 15 comments
Question
Many other JWT libraries have a concept of a 'decoder' which lets you preview the claims being made in a JWT before verifying them. In my case, I am building a multi-tenant application with many different user pools for the different tenants. I want to extract the user pool id from the issuer claim in the token and use that to look up information about the tenant the user is making the request for before verifying. As a result of issue #24 it is possible to extract the raw JWT token by running a verification that intentionally throws and then extracting the value from the error, but this doesn't make for very efficient, clean or readable code. I am currently using a secondary JWT library for this purpose while taking advantage of this one for the Cognito niceties, but I would like to reduce my bundle size by eliminating my other library.
Thanks for opening the discussion!
Our thinking has been to not add this, because we want to prevent users from accidentally using that decode
function where they should actually intend to call verify
. Not all users are JWT experts, or even that great in the English language, it's easy to become confused. (This infamously happened with a very well known JWT verification lib out there)
We could maybe call it unsafe_decode
then or so, to make it really obvious. But still a counter point is, that it is simple enough to just do it in plain JS:
// NodeJS
const decodedPayload = JSON.parse(Buffer.from(jwt.split('.')[1], 'base64url').toString());
// Web
const decodedPayload = JSON.parse(atob(jwt.split('.')[1].replace('-','+').replace('_','/')));
That's why we're on the fence on this one, leaning at this point to not add this to the library.
Appreciate to hear your counter arguments though.
Also, would be good to understand your user case. Why do you want this, what is your scenario that requires decoding before verifying? (Especially, since unverified JWTs cannot be trusted).
Thank you for the code snippets!
The application I am building is multi-tenanted with a 1 to 1 relationship between tenants and Cognito user pools (so each tenant has their own unique and isolated user pool). Basically, I want to extract the user pool id and client id from the token to use to lookup which tenant the user is trying to make a request for and then, after verifying that the user pool id and client id do in fact correspond to a tenant in the application, verify the token against that user pool. This will allow the tenancy claim to be directly tied to the user's authentication token as opposed to making it a separate part of the request.
The broader use case would be situations where you need to refer to values in the token for application logic before verification takes place, most commonly to decide how the token will be verified. In my architecture, I don't know the user pool id or client id ahead of time, so I need use the information in the token to figure that out before I'm even able to verify, I think there are other similar scenarios where this sort of functionality may be needed. I definitely agree that if you decide to implement a decoder that you should go out of your way to tell the user that the information within the unverified token cannot be trusted.
Thank you for the code snippets!
You're very welcome.
Thanks for the info, have a couple of follow up questions:
- Is it an option for you to create the JWT verifier with multiple user pools: https://github.com/awslabs/aws-jwt-verify#trusting-multiple-user-pools Would that solve your issue too?
- What is the ballpark figure for the nr of tenants, and growth rate, you want to support?
- Do you even know the list of User Pool IDs you want to check against when the JWT comes by, or do you need to look that up dynamically (from e.g. a database table somewhere)? The
CognitoJwtVerifier
class does assume you know the (list of) User Pool ID, otherwise you can't instantiate it.
If you want to pursue this discussion, it would be great to see more of your business logic, code snippets if possible, so we can really think along with you.
Is it an option for you to create the JWT verifier with multiple user pools: https://github.com/awslabs/aws-jwt-verify#trusting-multiple-user-pools Would that solve your issue too?
Unfortunately not as it would require knowing the user pool ids ahead of time in the code and would not scale to many tenants.
What is the ballpark figure for the nr of tenants, and growth rate, you want to support?
I would like to eventually be able to scale up to hundreds of tenants, although I expect only a dozen or so initially. I am also trying to avoid a situation where my tenants are defined by information in the repo as I prefer them to be defined at a persistence layer level so that managing tenants does not require a developer. I realize that there is a soft limit of 1,000 user pools per AWS account and a hard limit of 10,000. If I get to the point where I am nearly exhausting that hard limit that's what's known as a good problem to have, it is unlikely to happen for quiet a long time, if ever.
Do you even know the list of User Pool IDs you want to check against when the JWT comes by, or do you need to look that up dynamically (from e.g. a database table somewhere)? The CognitoJwtVerifier class does assume you know the (list of) User Pool ID, otherwise you can't instantiate it.
I need to look the user pool ids up dynamically, hence I instantiate a new CognitoJwtVerifier on every request, which is probably not ideal in terms of efficiency, but for what its worth I am caching the JWKs in the database and using those cached JWKs for verification, so when I look up the tenant information I also grab the JWK from my DB instead of grabbing it from the public endpoint.
At the moment my authentication logic looks like this (somewhat simplified and still WIP):
async function authenticate(authHeader: string) {
try {
const token = authHeader.split(" ")[1];
// At the moment the decode function uses your code snippet
// It could use something native to the library
const decodedPayload = decode(token);
const issSplit = decodedPayload["iss"].split("/") as string[];
const userPoolId = issSplit[issSplit.length];
const organization = () => {/* lookup organization from DB using userPoolId */}
const matchedRegion = organization.userPoolRegion;
const userPoolClientId = organization.userPoolClientId;
let jwk: any = {};
if (organization.jwk) {
// use the stored JWK
} else {
// Grab the JWK from the public endpoint and save it.
}
const verifier = CognitoJwtVerifier.create({
userPoolId: userPoolId,
tokenUse: "access",
clientId: userPoolClientId,
});
verifier.cacheJwks(jwk);
const verifyResult = await verifier.verify(token);
return {
username: verifyResult.username,
organizationName: organization.organizationName
};
} catch (e) {
// throw unauthorized
}
}
Thank you for your help!
Thanks that's very helpful!
I believe your feature request could then be translated in:
- Support creating verifier without knowing upfront the list of user pool IDs and client IDs
- So, somehow support dynamic lookup of these (with intelligent caching so you only do such a lookup if you see a new
iss
come by that you didn't see before)
I we support that (which by the way seems fair and will help many multi tenant cases) that would save you a bunch of custom code and you would not need to decode
the unverified JWT yourself. Is that right?
Unrelated point about your code: the lookup of the JWKS from your DB could be coded like this too, which would cache the various JWKS in memory:
import { CognitoJwtVerifier } from "aws-jwt-verify";
import { SimpleJwksCache } from "aws-jwt-verify/jwk";
import { JwksNotAvailableInCacheError } from "aws-jwt-verify/error";
class DbCache extends SimpleJwksCache {
async getJwk(jwksUri, decomposedJwt) {
try {
return super.getCachedJwk(jwksUri, decomposedJwt);
} catch (err) {
if (!(err instanceof JwksNotAvailableInCacheError)) {
throw err;
}
const jwksFromDb = await loadJwksFromDatabase(decomposedJwt.payload.iss); // throw an error if you don't recognize the iss
if (jwksFromDb) {
super.addJwks(jwksUri, jwksFromDb);
return super.getCachedJwk(jwksUri, decomposedJwt);
}
// This will now fetch from the public endpoint:
return super.getJwk(jwksUri, decomposedJwt);
}
}
}
// Create at this scope to reuse across calls to authenticate:
const jwksCache = new DbCache();
async function authenticate(authHeader: string) {
try {
// ...
const verifier = CognitoJwtVerifier.create(
{
userPoolId: userPoolId,
tokenUse: "access",
clientId: userPoolClientId,
},
{
jwksCache,
}
);
} catch (e) {
// throw unauthorized
}
}
I we support that (which by the way seems fair and will help many multi tenant cases) that would save you a bunch of custom code and you would not need to decode the unverified JWT yourself. Is that right?
That is correct!
One thing I do want to point out though is that you of course don't want to allow the user to simply trust whatever user pool id is passed in the JWT without also having some mechanism to verify that the user pool itself can be trusted. Otherwise an attacker could simply create a new user pool in their own AWS account and pass a valid token from that. One option would be to specify that user pool belongs to an account accessible from the AWS SDK, but this would require the AWS SDK as an additional dependency, which isn't ideal. Another option might be be to pass a callback to create()
which would be responsible for determining if the user pool can be trusted. For example, it could look like this:
const verifier = CognitoJwtVerifier.create(
{
tokenUse: "access",
userPoolVerifyCallback: (userPoolId: string, clientId: string) => {
// logic to check against database or other mechanism to check if user pool and client id are valid
if(/* valid */) {
return true
}
else {
return false
}
}
},
);
`
One thing I do want to point out though is that you of course don't want to allow the user to simply trust whatever user pool id is passed in the JWT without also having some mechanism to verify that the user pool itself can be trusted.
Yes, totally agree.
Thanks for all the information. I think it's pretty clear what is needed, we'll sync with the team and see when we'd get to this.
I've been looking at this while thinking about the Amazon Cognito Developer Guide Multi-tenant application best practices
You have four ways to secure multi-tenant applications: user pools, application clients, groups, or custom attributes.
I look at other options in CognitoVerifyProperties
like clientId
and groups
and wonder how to best dynamically pass in these values in addition to userPoolId
since each might require a similar lookup based on some tenant id.
Does it make more sense to have additional constructor properties and some callback on CognitoJwtVerifier
, or to create a new MultiTenantCognitoJwtVerifier
and either require that customJwtCheck
function or have a similar function which returns the important CognitoVerifyProperties
and errors if userPoolId
is not part of that return value set?
or have a similar function which returns the important CognitoVerifyProperties and errors if userPoolId is not part of that return value set?
I like that idea.
So instead of providing the config beforehand, one could provide an async function that returns the verify properties (userPoolId, clientId, etc). We'd call that function then, and await it's result, on every verify call. (Won't work for verifySync)
In our implementation we store these properties in issuersConfig
that we access synchronously in various places. Would have to change that then to async calls (not hard, but is a pervasive change). CTRL-F for this.issuersConfig
here: https://github.com/awslabs/aws-jwt-verify/blob/main/src/jwt-rsa.ts
FYI turns our we do have a decode-like function, be it a little hidden:
import { decomposeJwt } from "aws-jwt-verify/jwt";
// danger! you're looking at an unverified JWT, trust nothing in it!
const { payload } = decomposeJwt("ey....");
Ideally you would not need to call this yourself though (danger! you're looking at an unverified JWT, trust nothing in it!), so the above discussion and proposed solution direction still stands.
Now properly documented how you can decompose (/decode) an unverified JWT: https://github.com/awslabs/aws-jwt-verify#peeking-inside-unverified-jwts
I was reading this issue/request and this use case is exactly the same that I want to resolve on my app, do you have an update about this? depending on that I can think about waiting for your solution or implement a temporary solution in the meantime
Thanks in advance!
The current best way to do it is:
0 Create JWKS cache (once)
Create the JWKS cache at a higher scope so that you can reuse it across requests:
import { SimpleJwksCache } from "aws-jwt-verify/jwk";
const jwksCache = new SimpleJwksCache();
1 Decompose Unverified JWT
Your JWT probably has a custom attribute to identify the tenant, e.g. tenant_id
:
import { decomposeJwt } from "aws-jwt-verify/jwt";
// danger! you're looking at an unverified JWT, trust nothing in it!
// only use it to establish e.g. the tenant id to use for further verification
const { payload: { tenant_id } } = decomposeJwt("ey....");
2 Dynamic lookup of UserPool ID, Client ID
Let's say you have a DynamoDB table where you store User Pool ID and Client ID for each tenant, and that the JWT payload
has a custom attribute tenant_id
. Then lookup the User Pool ID and Client ID for that tenant ID.
const { userPoolId, clientId } = await myLookUpFunction({ tenantId: payload.tenant_id });
Ideally, you would add some in-memory caching here, much like SimpleJwksCache
does, so that when subsequent JWTs for the same tenant_id
come in, you can use the cached userPoolId
and clientId
instead of having to call the (more expensive) myLookUpFunction
4 Create verifier, use jwksCache from upper scope, and verify JWT
CognitoJwtVerifier.create({
userPoolId: "<user_pool_id>",
tokenUse: "id", // or "access"
clientId: "<client_id>",
}, {
jwksCache // use cache from upper scope, instead of creating a new one
}).verify("ey....")
Because you create a new verifier for each verification, it makes sense to explicitly create the JWKS cache at the upper scope (step 0), so that the cache for JWKS downloads is kept and reused for multiple verifications. Creating a verifier is very quick (synchronous), it's the downloading of JWKS that you want to make sure is cached.
--
Hope that helps!