oktadev/auth0-java-reactive-examples

Why are the reactive versions of Quarkus and Helidon so slow to start?

mraible opened this issue · 13 comments

Quarkus Reactive takes 7x longer to start than imperative Quarkus: 353 ms vs 49.4 ms.
Helidon SE takes 5x longer than imperative Helidon: 303.6 ms vs 59.4 ms.

Startup time (in ms, average of five runs; M = Micronaut, Q = Quarkus, S = Spring Boot, H = Helidon)

Imperative

M: (36 + 48 + 36 + 47 + 42) / 5 = 41.8
Q: (55 + 46 + 44 + 54 + 48) / 5 = 49.4
S: (67 + 70 + 65 + 66 + 54) / 5 = 64.4
H: (64 + 55 + 61 + 58 + 59) / 5 = 59.4

Reactive

M: (52 + 45 + 47 + 46 + 47) / 5 = 47.4
Q: (359 + 291 + 276 + 534 + 305) / 5 = 353
S: (55 + 72 + 62 + 52 + 52) / 5 = 58.6
H: (239 + 239 + 509 + 263 + 268) / 5 = 303.6

Imperative with Virtual Threads

Q: (44 + 38 + 46 + 43 + 45) / 5 = 43.2
S: (71 + 64 + 68 + 71 + 66) / 5 = 68
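The averages and the "7x" and "5x" figures can be rechecked with a few lines of Java. This is only a sketch of the arithmetic, using the Quarkus and Helidon samples listed above; the class and method names are made up for illustration.

```java
import java.util.Arrays;

class StartupStats {
    // Arithmetic mean of a set of startup samples, in milliseconds.
    static double mean(int... samplesMs) {
        return Arrays.stream(samplesMs).average().orElseThrow();
    }

    // How many times slower one mean is compared to another.
    static double slowdown(double slowerMs, double fasterMs) {
        return slowerMs / fasterMs;
    }

    public static void main(String[] args) {
        // Samples copied from the imperative and reactive lists above.
        double quarkusImperative = mean(55, 46, 44, 54, 48);
        double quarkusReactive = mean(359, 291, 276, 534, 305);
        double helidonImperative = mean(64, 55, 61, 58, 59);
        double helidonReactive = mean(239, 239, 509, 263, 268);

        System.out.println("Quarkus reactive/imperative: "
                + slowdown(quarkusReactive, quarkusImperative));
        System.out.println("Helidon reactive/imperative: "
                + slowdown(helidonReactive, helidonImperative));
    }
}
```

With only five runs per app and outliers such as 534 and 509, the means are sensitive to a single slow start, so the ratios are rough.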

Steps to reproduce

Install GraalVM 21:

sdk install java 21.0.2-graalce

For imperative:

git clone https://github.com/oktadev/auth0-java-rest-api-examples
cd auth0-java-rest-api-examples
./build.sh
./start.sh

The last script takes an argument of micronaut, quarkus, spring-boot, or helidon.

For reactive:

git clone https://github.com/oktadev/auth0-java-reactive-examples
cd auth0-java-reactive-examples
./build.sh
./start.sh

For virtual threads:

cd auth0-java-rest-api-examples
git checkout virtual-threads
./build.sh
./start.sh

I only calculated the startup times for Quarkus and Spring Boot because they're the only apps modified in the virtual-threads branch.

Memory usage (in MB):

Imperative

Framework     0 requests   1 request   10K requests
Micronaut     53           63          102
Quarkus       37           47          51
Spring Boot   76           86          105
Helidon       82           92          69

Imperative with Virtual Threads

Framework     0 requests   1 request   10K requests
Quarkus       37           48          51
Spring Boot   76           87          102

Reactive

Framework     0 requests   1 request   10K requests
Micronaut     53           62          102
Quarkus       49           50          53
Spring Boot   75           107         174
Helidon       44           45          68

The slow startup seems to be caused by the frameworks fetching the issuer or JWKS URI on startup. If I turn off my internet and run ./start.sh quarkus, it results in an error:

2024-01-31 14:13:43,816 WARN  [io.qua.oid.com.run.OidcCommonUtils] (vert.x-eventloop-thread-1) 
OIDC Server is not available:: java.net.UnknownHostException: dev-06bzs1cu.us.auth0.com: 
nodename nor servname provided, or not known

Same with Helidon:

Exception in thread "main" io.helidon.common.configurable.ResourceException: 
Failed to open stream to uri: https://dev-06bzs1cu.us.auth0.com/.well-known/jwks.json

Since these frameworks are supposed to be as fast as possible for serverless environments, I think they should do this processing on the first request rather than on startup.
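To illustrate the difference between fetching on startup and fetching on first request, here is a minimal plain-Java sketch. It is not how Quarkus or Helidon implement it; `memoize` and the fake `fetchJwks` loader are hypothetical stand-ins for the real HTTP call to the JWKS endpoint.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

class LazyJwks {
    // Wraps an expensive lookup so it runs at most once, on first use.
    static <T> Supplier<T> memoize(Supplier<T> loader) {
        return new Supplier<T>() {
            private T value;
            private boolean loaded;

            @Override
            public synchronized T get() {
                if (!loaded) {
                    value = loader.get();
                    loaded = true;
                }
                return value;
            }
        };
    }

    public static void main(String[] args) {
        AtomicInteger fetches = new AtomicInteger();

        // Hypothetical stand-in for the network call to the JWKS URI.
        Supplier<String> fetchJwks = () -> {
            fetches.incrementAndGet();
            return "{\"keys\":[]}";
        };

        // "Startup": wrapping the loader does not trigger a fetch.
        Supplier<String> jwks = memoize(fetchJwks);
        System.out.println("Fetches after startup: " + fetches.get());

        // The first request pays for the fetch; later requests reuse the result.
        jwks.get();
        jwks.get();
        System.out.println("Fetches after two requests: " + fetches.get());
    }
}
```

The trade-off is that a misconfigured or unreachable issuer is only detected when the first request arrives, not at startup.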

For some background, when I first started this comparison a couple of years ago, Spring Security and Helidon did the same thing (processing on startup instead of first request). They both switched to processing on the first request.

Not sure it makes any difference, but you could rewrite the Quarkus HelloResource class as:

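import io.quarkus.security.Authenticated;
import io.quarkus.security.identity.SecurityIdentity;
import io.smallrye.common.annotation.NonBlocking;
import jakarta.inject.Inject;
import jakarta.ws.rs.GET;
import jakarta.ws.rs.Path;
import jakarta.ws.rs.Produces;
import jakarta.ws.rs.core.MediaType;

@Path("/hello")
public class HelloResource {

    @Inject
    SecurityIdentity securityIdentity;

    @GET
    @Path("/")
    @Authenticated
    @Produces(MediaType.TEXT_PLAIN)
    @NonBlocking
    public String hello() {
        return "Hello, " + securityIdentity.getPrincipal().getName() + "!";
    }
}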

You don't need to explicitly return a Uni. Adding @NonBlocking will tell Quarkus to run the method on the event loop as a non-blocking method.

Thanks for the code review, @edeandrea! I added your suggested change in ee7b4ce.

I discovered why Quarkus reactive is slow to start. It's because it's looking up the issuer rather than doing it lazily. Please take a look at my comment above for more information.

I think I have a way around that. Try adding this to application.properties:

# https://quarkus.io/guides/all-config#quarkus-oidc_quarkus.oidc.jwks.resolve-early
quarkus.oidc.jwks.resolve-early=false

# https://quarkus.io/guides/all-config#quarkus-oidc_quarkus.oidc.discovery-enabled
quarkus.oidc.discovery-enabled=false

# https://quarkus.io/guides/all-config#quarkus-oidc_quarkus.oidc.jwks-path
quarkus.oidc.jwks-path=${quarkus.oidc.auth-server-url}/.well-known/jwks.json
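The `${quarkus.oidc.auth-server-url}` expression in the last property is expanded by Quarkus's configuration system before the value is used. The sketch below only illustrates what that substitution produces; it is not the Quarkus implementation, the `expand` helper is hypothetical, and the `auth-server-url` value is assumed to be the Auth0 domain that appears in the error messages above.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PropertyExpansion {
    // Matches ${some.property.name} placeholders.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^}]+)\\}");

    // Replaces each placeholder with its value from props;
    // unknown placeholders are left untouched.
    static String expand(String value, Map<String, String> props) {
        Matcher m = PLACEHOLDER.matcher(value);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String replacement = props.getOrDefault(m.group(1), m.group(0));
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Assumed value, taken from the host name in the logs above.
        Map<String, String> props = Map.of(
                "quarkus.oidc.auth-server-url", "https://dev-06bzs1cu.us.auth0.com");

        String jwksPath = expand(
                "${quarkus.oidc.auth-server-url}/.well-known/jwks.json", props);
        System.out.println(jwksPath);
    }
}
```

With discovery disabled, the JWKS location has to be spelled out this way because it is no longer read from the issuer's discovery document.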

This is what I get now:

╰─ ./start.sh quarkus
date (GNU coreutils) 9.4
Copyright (C) 2023 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <https://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by David MacKenzie.
8.0.0
__  ____  __  _____   ___  __ ____  ______ 
 --/ __ \/ / / / _ | / _ \/ //_/ / / / __/ 
 -/ /_/ / /_/ / __ |/ , _/ ,< / /_/ /\ \   
--\___\_\____/_/ |_/_/|_/_/|_|\____/___/   
2024-01-31 16:45:01,816 INFO  [io.quarkus] (main) quarkus 1.0.0-SNAPSHOT native (powered by Quarkus 3.6.7) started in 0.013s. Listening on: http://0.0.0.0:8080
2024-01-31 16:45:01,816 INFO  [io.quarkus] (main) Profile prod activated. 
2024-01-31 16:45:01,816 INFO  [io.quarkus] (main) Installed features: [cdi, oidc, resteasy-reactive, security, smallrye-context-propagation, vertx]
App is available on port 8080. Duration to start: 30 milliseconds
2024-01-31 16:45:02,098 INFO  [io.quarkus] (Shutdown thread) quarkus stopped in 0.001s

I also downgraded to 3.6.7 because 3.7 hasn't yet been released. That version switch may or may not have any effect.

Switching to 3.7.1 also seems to work fine and produce similar results, even if it hasn't yet been released.

Also, using JsonWebToken instead of SecurityIdentity probably doesn't matter from a performance perspective either. If you'd rather stick to a MicroProfile interface vs. a Quarkus-specific one, that's probably fine.

@edeandrea @mraible thanks. quarkus.oidc.jwks.resolve-early=false is a fairly specific feature, meant for cases where, for example, authentication to the JWKS endpoint has to be performed at the time the token is available.
Skipping the discovery step is indeed one way to speed up startup.

Using a dynamic tenant resolver can complement it: https://quarkus.io/guides/security-openid-connect-multitenancy#tenant-config-resolver - for the bearer single-tenant case one can just implement it as:

        final OidcTenantConfig config = new OidcTenantConfig();

        config.setTenantId("tenant-c");
        config.setAuthServerUrl("http://localhost:8180/realms/tenant-c");
        return Uni.createFrom().item(config);

The cost of the config resolution is then paid at the first request rather than at startup.

Thanks for the fix, @edeandrea! Personally, I think these settings should be the default and you should have to turn on startup discovery. Regarding the 3.7.1 version, I get that when creating a new project with the Quarkus CLI:

sdk install quarkus
quarkus create app com.okta.rest:quarkus --extension="quarkus-oidc,resteasy-reactive" --gradle

@sberyozkin If I remove the resolve-early property, it starts OK but prints this to the console:

2024-01-31 16:06:48,551 WARN  [io.qua.oid.run.OidcRecorder] (vert.x-eventloop-thread-1) OIDC server 
is not availa2024-01-31 16:06:48,555 WARN  [io.qua.oid.run.OidcRecorder] (vert.x-eventloop-thread-1) 
Tenant 'Default': 'OIDC Server is not available'. OIDC server is not available yet, an attempt to connect 
will be made during the first request. Access to resources protected by this tenant may fail if OIDC server 
will not become available

Since printing this message will likely slow startup, I'll keep using the property.

Update: Start time for Quarkus: (45 + 41 + 42 + 44 + 49) / 5 = 44.2

> Personally, I think these settings should be the default and you should have to turn on startup discovery.

I would tend to agree. @sberyozkin is the right person to answer as to why it is the way it is.

Reopening since Helidon is still slow to start, even after #6. Start time for Helidon: (183 + 265 + 177 + 386 + 303) / 5 = 262.8

geoand commented

> switching to 3.7.1 also seems to work fine and produce similar results, even if it hasn't yet been released.

For projects that always want to be on the bleeding edge, using

<quarkus.platform.artifact-id>quarkus-bom</quarkus.platform.artifact-id>

gets around this problem.

Hi @mraible @edeandrea

> @sberyozkin If I remove the resolve-early property, it starts OK but prints this to the console:...

So the OIDC server may indeed not be available at startup time? That is fine then; Quarkus will attempt to reconnect at the first request, but I guess at the demo level it may be of some concern.

quarkus.oidc.jwks.resolve-early=false was not introduced for startup time optimization but to address a concrete user requirement related to JWKS endpoint authentication, which is typically unnecessary. I'm happy that it can help with this optimization though. That said, it is a new property and may need some time to settle.

As I said earlier, you can try removing everything from the application properties and have the following bean in the project:

import jakarta.enterprise.context.ApplicationScoped;
import io.quarkus.oidc.OidcRequestContext;
import io.quarkus.oidc.OidcTenantConfig;
import io.quarkus.oidc.TenantConfigResolver;
import io.smallrye.mutiny.Uni;
import io.vertx.ext.web.RoutingContext;

@ApplicationScoped
public class Auth0TenantConfigResolver implements TenantConfigResolver {
    @Override
    public Uni<OidcTenantConfig> resolve(RoutingContext context, OidcRequestContext<OidcTenantConfig> requestContext) {
        final OidcTenantConfig config = new OidcTenantConfig();
        config.setTenantId("auth0");
        config.setAuthServerUrl("https://dev-06bzs1cu.us.auth0.com");
        // Optionally disable the discovery too
        config.setDiscoveryEnabled(false);
        config.setJwksPath("https://dev-06bzs1cu.us.auth0.com/.well-known/jwks.json");
        return Uni.createFrom().item(config);
    }
}

At least please keep this option in mind.

> Personally, I think these settings should be the default and you should have to turn on startup discovery.

IMHO it is important to be able to know at prod startup time whether the OIDC server is actually available. We have 2 options, the one Matt is currently using and the one I've just typed, which together give users all the flexibility. Cheers