Pre-Execute CPU Time Management
LRFalk01 opened this issue · 6 comments
**Is your feature request related to a problem? Please describe.**
In our large GQL queries, the lex/parse/analyze steps can take considerable time to complete; 120-150ms per request is not uncommon. In these queries the actual query string is essentially static, but GQL spends this time on every request, which can be up to ~42% of the entire request time. I've found that parsing can be cached and that validation can be optionally disabled, but I haven't really found anything to help with lexing (although the C-based gem exists) or analysis.
**Describe the solution you'd like**
I'm not what one would call ... knowledgeable about GQL or gql-ruby, so the feasibility of a solution is somewhat unknown to me. My initial thought is a system that receives a query string/schema and returns a query-like object that is ready to receive a context and run. This query-like object would have already completed the pre-execute phases (as much as possible) so that calls to it can be run with less CPU time. As far as I can tell, the existing query object requires a context and has no path to disable analysis.
This is arguably not so much a feature request as it is a request for guidance.
Hey, thanks for the detailed write-up. Out of curiosity, I would love to see any debugging information about these queries that take so long before execution: for example, a query string or a stackprof result would be very interesting. I'd love to just make it faster, and I wonder if something particular about them is hitting a weak spot in the GraphQL-Ruby code. If you're willing to share it here or by email, I'd really appreciate it.
In any case, I think we could get you there by adding a way to skip analysis. I'm surprised there isn't already a way, but taking a look, I agree with you: there isn't a way to skip it yet!
I would expect lexing to be skipped when you cache the parsed AST. If you hit a document cache like this one:
```ruby
# TODO limit the size of this cache to avoid unlimited memory consumption
doc_cache = Hash.new do |hash, query_str|
  query = GraphQL::Query.new(MySchema, query_str)
  if query.valid?
    # Reuse the AST that Query already built instead of parsing a second time
    hash[query_str] = query.document
  else
    # Don't cache invalid strings; let `execute` re-validate and return errors
    nil
  end
end

query_str = params[:query]
document = doc_cache[query_str]
# Validation already ran when the document entered the cache,
# so it only needs to happen again on a cache miss
should_validate = !document
MySchema.execute(query_str, document: document, validate: should_validate, ...)
```
I would expect that to also skip any time with the lexer. But have you seen something different?
Thanks for the rapid response!
Truth be told, I didn't know what the `document` param did, so I ignored it in my research into this. I'll see what this memoization does to my flame graphs (generated by vernier). I'll ask my team if I'm allowed to share more detailed information.
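For anyone following along, capturing one of these profiles with vernier looks roughly like the sketch below (assuming vernier's block-based `Vernier.profile` API; `MySchema`, `query_str`, and the rest are stand-ins for your own app's objects):

```ruby
require "vernier"

# Profile a single GraphQL request; the resulting JSON file can be
# opened at https://vernier.prof to view the flame graph.
Vernier.profile(out: "graphql_request.json") do
  MySchema.execute(query_str, variables: variables, context: context)
end
```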
> skip analysis
I think I jumped the gun on this. It's not always safe to skip analysis for identical query strings, because analyzers look at variable values: the same query might pass analysis with one set of variables but fail with another. (For example, `first: $first` could be `1` or `100`, resulting in different complexity computations.)
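To illustrate, here's a minimal sketch of the kind of field definition that makes analysis variable-dependent (the field and `ListingType` are hypothetical; the `complexity` lambda is graphql-ruby's standard API):

```ruby
class QueryType < GraphQL::Schema::Object
  field :listings, [ListingType], null: false do
    argument :first, Integer, required: true
    # Complexity scales with the requested page size, so the same document
    # scores very differently for { first: 1 } vs. { first: 100 }.
    complexity ->(_ctx, args, child_complexity) { args[:first] * child_complexity }
  end
end
```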
Taking a look at the emailed profile, I get the following first impressions:
- 343ms total
  - 81.5ms pre-execution:
    - Parse: 6.5ms (`GraphQL::Parser#parse`)
    - Validate: 38ms (`GraphQL::Query#valid?`)
    - Analyze: 37ms (`GraphQL::Analysis.analyze_query`)
  - 260.5ms execution
    - 2.5ms "eager" (before resolving the first Promise) + 258ms of batched execution
    - The biggest single span is 83ms calling an external service (`Loaders::ListingLoader#query`)
I have two productive suggestions from looking it over:
- It looks like Parse + Validate took 44.5ms, so using a document cache could make the whole operation ~13% faster. You'd need some way to make sure the Hash doesn't expand forever. Does your app have a way of determining authoritative clients or queries, so that you could cache only those? If not, you could "just" use a size limit and only cache incoming query strings over a certain size (a sketch of this appears after these suggestions).
- `LogQueryComplexityAnalyzer` looks like a custom analyzer that inherits from `GraphQL::Analysis::QueryComplexity`. It spent 5.5ms on `#result`, and `GraphQL::Analysis::MaxQueryComplexity` also spent 5.5ms on `#result`, presumably calculating the same value. I bet this is duplicate work, and although I can't see it in the trace, I bet each analyzer is also calculating the complexity independently during analysis. Could these two analyzers be merged to reduce this duplicate work? Maybe you could remove `MaxQueryComplexity` and have `LogQueryComplexityAnalyzer` return an error when appropriate (here's the upstream implementation: https://github.com/rmosolgo/graphql-ruby/blob/master/lib/graphql/analysis/max_query_complexity.rb#L12-L16). (And then a rename for `LogQueryComplexityAnalyzer` would be in order!)
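Here's a rough sketch of what that merge could look like, modeled on the upstream `MaxQueryComplexity` implementation linked above (the class name and the logging call are placeholders; depending on your graphql-ruby version, the base class may live under `GraphQL::Analysis::AST` instead):

```ruby
# Computes the complexity total once (via `super`), logs it, and returns
# an AnalysisError when the schema's max_complexity is exceeded, replacing
# both MaxQueryComplexity and the separate logging analyzer.
class QueryComplexityLoggerAndLimiter < GraphQL::Analysis::QueryComplexity
  def result
    complexity = super
    Rails.logger.debug("GraphQL query complexity: #{complexity}") # dev-mode logging stand-in
    max = subject.max_complexity
    if max && complexity > max
      GraphQL::AnalysisError.new("Query has complexity of #{complexity}, which exceeds max complexity of #{max}")
    end
  end
end

# Then register it in the schema in place of the two existing analyzers:
#   query_analyzer(QueryComplexityLoggerAndLimiter)
```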
I'm not actually sure how to continue investigating skipping analysis, but I think analysis could be made faster in this case by merging those two. What do you think about trying those two improvements first?
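As for the size limit in the first suggestion, here's a sketch (the constants are arbitrary placeholders; Ruby Hashes preserve insertion order, so `shift` evicts the oldest entry, making this a crude FIFO rather than a true LRU, and it isn't thread-safe as written, so wrap it in a Mutex or use a gem like lru_redux under a multi-threaded server):

```ruby
MAX_CACHED_DOCUMENTS = 500
MIN_CACHEABLE_LENGTH = 1_000 # only cache large query strings
DOC_CACHE = {}

def cached_document(query_str)
  return nil if query_str.nil? || query_str.length < MIN_CACHEABLE_LENGTH

  cached = DOC_CACHE[query_str]
  return cached if cached

  query = GraphQL::Query.new(MySchema, query_str)
  return nil unless query.valid?

  DOC_CACHE.shift if DOC_CACHE.size >= MAX_CACHED_DOCUMENTS # drop the oldest entry
  DOC_CACHE[query_str] = query.document
end

# Usage, as in the earlier example:
#   document = cached_document(params[:query])
#   MySchema.execute(params[:query], document: document, validate: !document, ...)
```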
Thanks for the effort in looking into this!
I discussed this with the team today, and we're going to increase the priority of moving over to persisted queries for this cache path. In experimenting with it, it made exactly the impact you said. Moving persisted queries to this path will allow us to bypass the indefinitely-growing-cache problem.
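Roughly, the path we're moving to looks like this sketch (the store contents and query IDs are hypothetical): documents get parsed and validated once at boot, so the per-request lex/parse/validate cost disappears, and the cache is a fixed set rather than an unbounded Hash keyed on incoming strings.

```ruby
PERSISTED_DOCUMENTS = {
  "ListingsPage.v1" => GraphQL.parse(File.read("app/graphql/persisted/listings_page.graphql")),
}.freeze

def execute_persisted(id, variables:, context:)
  document = PERSISTED_DOCUMENTS.fetch(id) # raises KeyError for unknown IDs
  MySchema.execute(document: document, validate: false, variables: variables, context: context)
end
```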
With regards to the custom analyzer, you're 100% correct that it's duplicate work. It only runs in dev mode to print some stuff to the terminal; I should have disabled it for the report.
I'm happy to take a look -- I'm always looking for more information about GraphQL-Ruby performance "in the wild" because the reality of production systems can be so different. Thanks for sharing.
I don't think I have anything else to work on here for the time being, so I'll close this. Please let me know if you run into any trouble addressing those issues 👍
Thanks again for the insights. We have a pretty clear path forward to remove ~75% of this CPU time, which is exciting.