mirumee/ariadne

Feature request: cache query parsing and validation

lukapeschke opened this issue · 10 comments

Hello,

First of all, thanks for your work on Ariadne, I've really enjoyed working with it until now!

I'm opening this issue because we recently ran a load test on a microservice which basically translates GraphQL queries into SQL and fetches data from Postgres with SQLAlchemy's async API (we ran the tests with Python 3.11 on a single gunicorn worker with uvloop). We noticed high CPU usage, and profiling showed that up to 30% of all CPU time was spent in the parse_query and validate_query functions.

When I tried to implement a simpler version of the graphql function with a cache on query parsing and validation, the number of requests per second increased by 22%, and the median request duration decreased by 62% (the P95 duration decreased by 35%).

from functools import lru_cache
from inspect import isawaitable as is_awaitable
from logging import Logger
from typing import Any, Awaitable, cast

from graphql import DocumentNode, ExecutionResult, GraphQLError, GraphQLSchema
from graphql import execute as _execute_graphql
from graphql import parse as _parse_graphql

from ariadne import format_error
# NOTE: these helpers are Ariadne internals reused here; their import paths
# may differ between Ariadne versions.
from ariadne.graphql import (
    handle_graphql_errors,
    handle_query_result,
    validate_data,
    validate_query,
)
from ariadne.types import ErrorFormatter, GraphQLResult

from my_service import GRAPHQL_SCHEMA

# GRAPHQL_SCHEMA is a global here to avoid lru_cache having to hash the same
# schema object on every call, but since GraphQLSchema is hashable, it could
# also be passed as a parameter.
@lru_cache(maxsize=64)
def parse_and_validate_query(query: str) -> tuple[DocumentNode, list[GraphQLError]]:
    parsed = _parse_graphql(query)
    validation_errors = validate_query(schema=GRAPHQL_SCHEMA, document_ast=parsed)
    return parsed, validation_errors


async def execute_graphql_query(
    schema: GraphQLSchema,
    data: Any,
    *,
    debug: bool = False,
    error_formatter: ErrorFormatter = format_error,
    logger: Logger | None = None,
    context_value: Any | None = None,
) -> GraphQLResult:
    try:
        validate_data(data)
        variables, operation_name = (
            data.get("variables"),
            data.get("operationName"),
        )
        ast_document, validation_errors = parse_and_validate_query(query=data["query"])
        if validation_errors:
            return handle_graphql_errors(
                errors=validation_errors,
                logger=logger,
                error_formatter=error_formatter,
                debug=debug,
            )

        result = _execute_graphql(
            schema,
            ast_document,
            variable_values=variables,
            operation_name=operation_name,
            context_value=context_value,
        )
        if is_awaitable(result):
            result = await cast(Awaitable[ExecutionResult], result)

    except GraphQLError as error:
        return handle_graphql_errors(
            [error], logger=logger, error_formatter=error_formatter, debug=debug
        )

    return handle_query_result(result, logger=logger, error_formatter=error_formatter, debug=debug)

My question is: would it make sense to have such a caching feature available in Ariadne directly? If so, I'd be willing to have a look.

rafalp commented

But is this a performance bottleneck for your application? I've always looked at query parsing and validation the way I look at template rendering in Django: sure, it takes 20% of response generation time, but as the application grows, the time spent doing IO in views and middlewares quickly outweighs it, because database tables and locks become an issue much sooner than the CPU time spent rendering strings from templates does.

So while this seems considerable in benchmarks, I have not seen a real-life scenario where the query parser was the bottleneck, rather than query complexity or IO operations in resolvers.

Also, please note that Ariadne already supports parser caching through the query_parser configuration option, so you would only have to pass your lru_cache-wrapped parser to enable this optimization.
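
For illustration, wiring that up could look roughly like the sketch below. It assumes the query_parser hook is called with the context value and the raw query data dict and returns the parsed DocumentNode (check the docs of the version you run for the exact signature); schema and my_service are placeholders for your own executable schema.

from functools import lru_cache

from graphql import DocumentNode, parse

from ariadne.asgi import GraphQL

from my_service import schema  # placeholder: your executable schema


@lru_cache(maxsize=64)
def cached_parse(query: str) -> DocumentNode:
    return parse(query)


def cached_query_parser(context_value, data: dict) -> DocumentNode:
    # Assumed hook shape: (context_value, data) -> DocumentNode.
    return cached_parse(data["query"])


app = GraphQL(schema, query_parser=cached_query_parser)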

Maybe we should run parse and validate as a single operation, or assume that if a custom parser was used, it also does validation, and skip the validation step? But then what about custom query validation (like query cost validators)?

lukapeschke commented

But is this a performance bottleneck for your application?

Yes, it is a bottleneck: our application does not have much business logic, but the queries it processes can be quite big and nested. And we ran the load tests with realistic data to be as close to "real life" as possible.

Also, please note that Ariadne already supports parser caching through the query_parser configuration option, so you would only have to pass your lru_cache-wrapped parser to enable this optimization.

That's true, but a lot of time was also spent in the validate_query function (see the attached flamegraph); maybe walking the AST of big queries is what takes long?

[flamegraph screenshot attached]

Maybe we should run parse and validate as a single operation, or assume that if a custom parser was used, it also does validation, and skip the validation step? But then what about custom query validation (like query cost validators)?

Maybe a simple way to do this could be to add an optional query_validator parameter? Something like query_validator: QueryValidator = validate_query? It would then be up to the user's implementation to apply the passed validation rules. That way, only a small change in Ariadne would be needed (and we don't break the API), and cache management would be up to users rather than the library. WDYT?
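
To make that concrete, a user-supplied validator under the proposed signature (mirroring validate_query, so schema plus parsed document) could look roughly like the sketch below; the cache key and the delegation to graphql-core's validate are just one possible approach, and it only holds as long as the schema and validation rules are fixed for the application's lifetime.

from graphql import DocumentNode, GraphQLError, GraphQLSchema, print_ast, validate

# Unbounded for brevity; a real implementation should cap this (e.g. an LRU).
_validation_cache: dict[str, list[GraphQLError]] = {}


def cached_query_validator(
    schema: GraphQLSchema,
    document_ast: DocumentNode,
    rules=None,
    **kwargs,
) -> list[GraphQLError]:
    # Key on the printed document: printing is much cheaper than validation,
    # and identical queries print identically.
    key = print_ast(document_ast)
    if key not in _validation_cache:
        _validation_cache[key] = validate(schema, document_ast, rules)
    return _validation_cache[key]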

rafalp commented

How much time was spent in query validation? I see that validate() did quite a lot here, but 10 ms out of 20 ms is still a non-issue to me.

And we ran the load tests with realistic data to be as close to "real life" as possible.

Is this data available anywhere? I would love to see it. We also have a benchmark suite in this repo; how would that compare to your results, on your setup?

Maybe a simple way to do this could be to add an optional query_validator parameter?

Maybe...? I am not saying no to having another escape hatch for people looking to squeeze extra performance out of their GraphQL servers, I am just having doubts that it would be as simple as an optional query_validator parameter: plenty of custom query validation rules in the wild are affected by variables or user auth.

lukapeschke commented

How much time was spent in query validation? I see that validate() did quite a lot here, but 10 ms out of 20 ms is still a non-issue to me.

The CPU load is more of an issue for us than the time spent: the more CPU-bound work we do while processing a request, the fewer requests we can process concurrently.

Is this data available anywhere? I would love to see it. We also have a benchmark suite in this repo; how would that compare to your results, on your setup?

Unfortunately, I'm not allowed to share it 😕 But if I have time this week, I'll have a look at the benchmark suite and see if I can provide something similar for the tests.

Maybe...? I am not saying no to having another escape hatch for people looking to squeeze extra performance out of their GraphQL servers, I am just having doubts that it would be as simple as an optional query_validator parameter: plenty of custom query validation rules in the wild are affected by variables or user auth.

I agree that this wouldn't solve all problems, only the validation memoization part. While having a look at the code this week, I found #1070, which already implements query_validator, so I believe this issue can be closed, since the feature should be available in 0.20.0.

rafalp commented

Yup, in 0.20, which I would like to get out the door soonish, you should be able to achieve this by setting a custom parser and validator pair.
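
For future readers, a minimal end-to-end sketch of such a pair could look like the code below. It assumes the parser hook receives (context_value, data), the validator hook is a drop-in for graphql-core's validate, and the ASGI app accepts query_parser and query_validator keyword arguments; double-check the 0.20 docs for the exact signatures. The schema and my_service names are placeholders.

from functools import lru_cache

from graphql import DocumentNode, GraphQLError, parse, print_ast, validate

from ariadne.asgi import GraphQL

from my_service import schema  # placeholder: your executable schema


@lru_cache(maxsize=64)
def cached_parse(query: str) -> DocumentNode:
    return parse(query)


def cached_query_parser(context_value, data: dict) -> DocumentNode:
    return cached_parse(data["query"])


_validation_cache: dict[str, list[GraphQLError]] = {}


def cached_query_validator(schema_, document_ast: DocumentNode, rules=None, **kwargs) -> list[GraphQLError]:
    # Safe only while validation rules don't depend on per-request data;
    # bound the cache in real code.
    key = print_ast(document_ast)
    if key not in _validation_cache:
        _validation_cache[key] = validate(schema_, document_ast, rules)
    return _validation_cache[key]


app = GraphQL(
    schema,
    query_parser=cached_query_parser,
    query_validator=cached_query_validator,
)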

rafalp commented

@lukapeschke Hi, Ariadne 0.20 was released a while ago. Did you have a chance to experiment with it and see if its parser and validation hooks work for you?

lukapeschke commented

@rafalp Hello, yes, I've finally been able to try the new hooks out, and everything works as expected 🙂 Would you like me to add an example to the docs? Otherwise, I believe this can be closed.

rafalp commented

Yes. I think having a separate guide somewhere in the docs would be ideal. Maybe "Query caching" under Extensions or at the end of the Docs section, but that's a detail to work out later.

lukapeschke commented

@rafalp here's a small example: mirumee/ariadne-website#157. Please let me know what you think!

rafalp commented

Continuing the docs discussion in the linked PR.