slicknode/graphql-query-complexity

Pass Current Type to complexity()

Saeris opened this issue · 4 comments

Hey, so I've been trying to figure out how to use your library in my project and I ran across a use case which would require an extension to field.complexity(). It's a bit hard to explain given a lack of well-defined terms, but perhaps I can demonstrate what I'm looking for with an example.

Let's say I have an Album type:

const { GraphQLObjectType, GraphQLID, GraphQLInt, GraphQLList } = require(`graphql`)

// Assume a Photo type is defined elsewhere
const Album = new GraphQLObjectType({
  name: `Album`,
  fields: () => ({
    id: {
      type: GraphQLID
    },
    totalPhotos: {
      type: GraphQLInt // Let's say we have 100 Photos in the Album
    },
    photos: {
      type: new GraphQLList(Photo),
      args: {
        count: { type: GraphQLInt } // And we pass a Count of 150 in our Query
      },
      // It would therefore follow that if we used just args.count alone, our complexity would be inflated
      // The calculated complexity of this node should be capped at 100
      complexity: (args, childComplexity, parent) =>
        childComplexity * (args.count < parent.totalPhotos ? args.count : parent.totalPhotos),
      // Let's assume our resolver will return an array of Photos of size count, or all the Photos in the Album
      resolve: (parent, args, info) => fetchAlbumPhotos(parent.id, args.count)
    }
  })
})

To make this possible, a change would need to be made to field.complexity(), i.e. the addition of a third argument, parent, similar to resolve as used in the example above. parent would be an object containing all the current values of the type.

I hope my code comments make it clear how the complexity calculation could be inaccurate in a case like this. If we look at maximumComplexity as a budget for our queries, then we don't want to overpay for the cost of execution.

Let me know if you need further clarification!

ivome commented

I am not 100% sure if I fully understood what you are trying to do, but it looks like you want to calculate the complexity based on the actual resolved values, correct?
In that case I would suggest tracking the complexity inside the resolvers, because there is no way to determine the object in the validation rules, since validation runs before any values are resolved. You might have interfaces that return different objects, for example. So in your execution context you could have a property that you simply increase during resolution (context.complexity += 100) and, once it reaches a certain threshold, throw an exception.
That has the disadvantage that you have to start executing and resolving, so the damage from a very expensive query might already be done.
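
As a rough illustration of that resolver-side approach (a sketch only; context.complexity, MAX_COMPLEXITY, and the cost formula are assumptions, not part of this library):

const MAX_COMPLEXITY = 1000; // assumed per-request budget

// Resolver that charges its actual cost against a running total on the execution context
const resolvePhotos = (parent, args, context) => {
  const cost = Math.min(args.count, parent.totalPhotos);
  context.complexity = (context.complexity || 0) + cost;
  if (context.complexity > MAX_COMPLEXITY) {
    throw new Error(`Query exceeded the complexity budget`);
  }
  return fetchAlbumPhotos(parent.id, args.count);
};
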
If you just want to cap at 100, you could return a maximum of 100 from your complexity calculation function:

complexity: (args, childComplexity, parent) => {
  // calculate some value, e.g. based on args and childComplexity
  const calculatedValue = childComplexity * args.count;
  // cap the result at 100
  return Math.min(100, calculatedValue);
}

Does that help? If that's not what you meant, please elaborate further ;-)

ivome commented

One other thing to keep in mind:
If you calculate the query complexity based on the resolved values, your system might run fine in the beginning and stay below the threshold, but as soon as you start adding data, queries that used to work fine will start failing unexpectedly. So you might be better off setting a maximum value and capping the number of returned items in fetchAlbumPhotos.
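
For example, capping inside the resolver might look something like this (a sketch; MAX_PHOTO_COUNT and the resolver name are made up for illustration):

const MAX_PHOTO_COUNT = 100; // hypothetical hard limit on photos returned per request

// Clamp the requested count before hitting the REST API
const resolvePhotosCapped = (parent, args) =>
  fetchAlbumPhotos(parent.id, Math.min(args.count || MAX_PHOTO_COUNT, MAX_PHOTO_COUNT));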

Saeris commented

After a day away from the code, I think I realize the problem with what I'm asking for here. When I wrote this, I wasn't thinking about how the complexity calculation is done before any query is executed and prevents execution based on the score.

The actual problem I'm trying to solve here is closely related to the example code I gave. I'm writing a gateway to an existing REST API. Fetching the data for the Album type would be one REST call (say, in a getAlbumById query), and getting the photos for that Album would be a second resolver making another call to the REST API. From the result of the first call I'll know how many photos are in the album. Based on that number I can estimate how many calls I'll need to make to fetch either every photo in the album (because those results are paginated) or just the number of photos specified in the count argument, since I can limit execution to however many calls are needed to cover that many pages (e.g. with 500+ photos in the Album, fetching the first 200 at 100 results per page means making 2 of the 6 possible calls to the REST API).
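
To make that arithmetic concrete, the number of REST calls could be estimated along these lines (a sketch; PAGE_SIZE and the helper name are hypothetical):

const PAGE_SIZE = 100; // assumed page size of the REST API

// Estimate how many paginated REST calls are needed to fetch `count` photos
// from an album containing `totalPhotos` photos in total
const estimatePhotoCalls = (count, totalPhotos) =>
  Math.ceil(Math.min(count, totalPhotos) / PAGE_SIZE);

estimatePhotoCalls(200, 550); // => 2 (2 of the 6 pages)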

So given that information, can you see how that would affect the cost estimation for that particular field? The results are variable, because you don't know what the "maximum" count is without resolving the parent type first. The expensive part of each query is the number of individual calls to the REST API that need to be executed. When resolving the Album type, I'll get the data for most of that type's fields in one call; some fields like photos need to make subsequent calls, and those can't be made until the parent type has been resolved (I'm using the Album's id field, which has already been resolved, in the resolver for photos).

I get that this is fundamentally not as simple as I originally thought it would be to implement. What I would need here instead is per-field budgeting, meaning the validation rule would have to determine which fields to return as null during execution, as information becomes available, to decide whether a field is too expensive for the overall budget. That may not be possible with validation rules.

ivome commented

I get that this is fundamentally not as simple as I originally thought it would be to implement. What I would need here instead is per-field budgeting, meaning the validation rule would have to determine which fields to return as null during execution, as information becomes available, to decide whether a field is too expensive for the overall budget. That may not be possible with validation rules.

This is not possible with the rules. You either do the validation ahead of execution (as is done in this library) and don't use the actual data for budgeting, or you do the budgeting based on the returned data, but then there is no way to determine the child complexity before the data is retrieved.

I can think of two options:

  1. Use this library and set a maximum complexity on your variable fields, while maybe limiting the number of returned photos (see the sketch after this list).
  2. Accumulate the cost in the resolvers as the query runs and calculate the complexity yourself. This complexity can then be used to throw an exception in case a threshold is reached.
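
For example, option 1 might be sketched roughly like this, following the complexity function shape used earlier in this thread and pricing the field by the number of paginated REST calls it can trigger (PAGE_SIZE and MAX_PAGES are assumed values, not part of the library):

const PAGE_SIZE = 100; // assumed REST API page size
const MAX_PAGES = 5;   // assumed upper bound on calls we are willing to pay for

photos: {
  type: new GraphQLList(Photo),
  args: {
    count: { type: GraphQLInt }
  },
  // Estimate cost as the number of REST calls needed for `count` photos, capped at MAX_PAGES
  complexity: (args, childComplexity) =>
    childComplexity * Math.min(Math.ceil((args.count || PAGE_SIZE) / PAGE_SIZE), MAX_PAGES),
  // Also clamp the count actually fetched so the resolver never exceeds the budgeted pages
  resolve: (parent, args) =>
    fetchAlbumPhotos(parent.id, Math.min(args.count || PAGE_SIZE, MAX_PAGES * PAGE_SIZE))
}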