Validation is extremely slow for large queries with fragment spreads due to lack of `implementers_map` caching
xuorig opened this issue ยท 7 comments
I've been investigating slow parsing time in our application. It appears to be due to validate_fragment_spread_type
recomputing a large map of implementers everytime it is called. For large schemas, this creates a very large hashmap everytime. Combined with larger queries this can have a very large effect and we're seeing parse times in the seconds.
Instead, the implementers_map
can be computed once either through a SchemaWithCache
structure, which is used elsewhere already, or by passing along a reference to the pre-computed map during validation. It would be great if this map could be computed only once per schema rather than once per query.
I'm happy to contribute a fix if you are opened to it!
That's a not great oversight. thanks for the report!
One approach we could consider here is to add a cache to Schema
itself and provide access to the cache through Valid<Schema>
only. Inside schema validation we can manually pass a reference even if the cache is not populated yet (if we need it). operation validation requires a valid schema so we can update the function signatures and reuse the cache between different queries.
I think using the SchemaWithCache
solution is also acceptable, though it would still require recomputing the map once for every query. If you'd like to contribute that then I'd be totally happy ๐
As a more concrete idea for caching on the schema itself so that it's reusable between different queries, we might have something like the below. If we cache something inside Schema
, we have to guarantee that it is only cached when the schema is immutable through Valid<>
, and we have to clear the cache when going back to a mutable schema through Valid::into_inner
. that could be achieved with a trait.
struct Schema {
// .. all the current stuff ..
/// Implementers cache: this must only be accessed through Valid<Schema>.
implementers_map: OnceLock<HashMap<Name, Implementers>>,
}
impl Valid<Schema> {
// This could have the API that SchemaWithCache has today, that populates the cache on demand.
fn implementers_of(&self, name: Name) {}
}
// & then the stuff to support invalidating the cache:
struct Valid<T: Invalidate>(T);
impl<T> Valid<T> {
fn into_inner(self) -> T {
self.0.invalidate()
}
}
trait Invalidate {
// default implementation for types that can be used with Valid<> but do not require invalidation
fn invalidate(self) -> Self { self }
}
impl Invalidate for Schema {
fn invalidate(mut self) -> Self {
self.implementers_map.take();
self
}
}
// this can use the default implementation.
impl Invalidate for ExecutableDocument {}
@goto-bus-stop I was thinking about this over the weekend, another option is to not cache it but instead have it as a living thing in the schema, that gets built along the schema.
Of course it's a bit more complex, again because of mutable schemas and updating this mapping as it is built / modified.
But it's a very common structure to access in GraphQL schemas in general, so it's tempting to just make it a "first class" kind of thing. Thoughts?
I'm not sure if that would work with the current design, which encourages directly mutating the schema structs. Wouldn't users have to manually update the implementers map, or apollo-compiler to provide specific methods to modify the schema?
which encourages directly mutating the schema structs
Gah good point, I thought this was happening through add_type
and add_document
kind of APIs only. nevermind!
I'll work on this Invalidate
idea today so we can reuse the cache between validation runs.
Thank you @goto-bus-stop, sounds great ๐