Performance issues related to complete_value on large datasets

Question

JCatrielLopez opened this issue 2 years ago · 3 comments

Hi! We've noticed that returning a list of 5k elements, each with a few nested objects, is pretty slow:

Person {
  id
  name
  lastname
  age
  address {street number}
  job {id org_name}
  partner {id name}
  pets {name type}
  school {id name}
}

ncalls	tottime	percall	cumtime	percall	filename:lineno(function)
1	1.8e-05	1.8e-05	2.145	2.145	graphql.py:103(graphql_sync)
1	1.5e-05	1.5e-05	2.145	2.145	graphql.py:152(graphql_impl)
1/30001	0.19	6.335e-06	2.137	7.122e-05	execute.py:413(ExecutionContext.execute_fields)
1	1.3e-05	1.3e-05	2.137	2.137	execute.py:965(execute)
1	7e-06	7e-06	2.137	2.137	execute.py:328(ExecutionContext.execute_operation)
1/135001	0.3229	2.392e-06	2.135	1.581e-05	execute.py:485(ExecutionContext.execute_field)
1/145001	0.2737	1.888e-06	2.071	1.428e-05	execute.py:575(ExecutionContext.complete_value)
1	0.009884	0.009884	2.071	2.071	execute.py:660(ExecutionContext.complete_list_value)
5000/30000	0.02747	9.156e-07	2.026	6.752e-05	execute.py:893(ExecutionContext.complete_object_value)

By itself it's not really a slow function, but it's executed 30k times. Is there any way to reduce the overhead by reducing the number of times this function is invoked?
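For context, here is a rough sketch of where the 30k figure seems to come from. It assumes that every nested selection in the query above resolves to a single object per person (so `pets` is treated as one object, not a list), and the nested-dict form of the query is only a hypothetical stand-in for the real AST:

```python
# Hypothetical nested-dict form of the query above: a leaf field maps to None,
# an object field maps to its own sub-selection.
PERSON_SELECTION = {
    "id": None,
    "name": None,
    "lastname": None,
    "age": None,
    "address": {"street": None, "number": None},
    "job": {"id": None, "org_name": None},
    "partner": {"id": None, "name": None},
    "pets": {"name": None, "type": None},  # assumption: one object, not a list
    "school": {"id": None, "name": None},
}


def count_objects(selection):
    """Count the object values completed for one item, the item itself included."""
    nested = (sub for sub in selection.values() if sub is not None)
    return 1 + sum(count_objects(sub) for sub in nested)


PERSONS = 5_000
objects_per_person = count_objects(PERSON_SELECTION)  # 1 person + 5 nested objects
total_objects = PERSONS * objects_per_person

print(objects_per_person)  # 6
print(total_objects)       # 30000, the complete_object_value call count
print(total_objects + 1)   # 30001, adding the root selection set (execute_fields)
```

So the count grows with the number of list items times the number of object selections per item, which is why each call is cheap but the total adds up.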

Tested on Python 3.8 and graphql-core==3.2.3

Answer 1 · 2023-01-11T19:42:16.000Z

Possibly related to this graphql-js issue

Answer 2 · 2023-01-11T20:33:17.000Z

Thanks for reporting. Will look into this when I have more time, probably only after releasing 3.3. It would be helpful if you could post example code with dummy data to reproduce this.

Answer 3 · 2023-01-12T11:19:20.000Z

schema.graphql:

type Query {
    persons: [Person]
}

type Person {
    id: String!
    name: String
    ssn: String
    alive: Boolean
    has_job: Boolean
    job: JobDetails
    address: Address
    pets: Pets
    house: House
    partner: Person
}

type JobDetails {
    id: String
    name: String
}

type Address {
    id: String
    name: String
}

type Pets {
    id: String
    name: String
    race: String
    color: String
}

type House {
    color: String
    floors: Int
    is_duplex: Boolean
    is_apt: Boolean
}

server.py:

import random
import string
import sys

import yappi

from graphql import graphql_sync, build_ast_schema
from graphql.language.parser import parse

yappi.set_clock_type("wall")

with open("./schema.graphql", "r") as f:
    schema = build_ast_schema(parse(f.read()))


class Query:
    """The root resolvers"""

    def persons(self, info):
        output = []
        for _ in range(5_000):
            output.append(
                dict(
                    id="".join(random.choices(string.ascii_lowercase + string.digits, k=9)),
                    name="John Doe",
                    ssn="00000000000000000",
                    alive=True,
                    has_job=False,
                    job=dict(id="xxx", name="test"),
                    address=dict(id="yyy", name="Fake Street"),
                    pets=dict(id="zzz", name="test"),
                    house=dict(
                        color="RED",
                        floors=2,
                        is_duplex=False
                    ),
                    partner=dict(id="".join(random.choices(string.ascii_lowercase + string.digits, k=9)), name="test"),
                )
            )
        return output


def main():
    query = """{ 
        persons{ 
            id 
            name 
            alive 
            has_job 
            job{id name}
            partner{id name}
            address{id name}
            pets{id name}
            house{color floors is_duplex}
        } 
    }"""

    yappi.start()
    result = graphql_sync(schema, query, Query())
    yappi.stop()

    if result.errors:
        print(result)
        sys.exit(1)

    yappi.get_func_stats().save("profile", type="pstat")


# To visualize profile:
# python -m snakeviz profile --server

if __name__ == '__main__':
    main()
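The file saved above is in pstat format, so besides snakeviz it should also be readable with the standard library's `pstats` module. A minimal sketch, where `summarize_profile` is a hypothetical helper name and not part of yappi or graphql-core:

```python
import io
import pstats


def summarize_profile(path, limit=10):
    """Return the top `limit` entries of a pstat file, sorted by cumulative time."""
    buffer = io.StringIO()
    stats = pstats.Stats(path, stream=buffer)
    stats.sort_stats("cumulative").print_stats(limit)
    return buffer.getvalue()


# Usage, after server.py has written the "profile" file:
#     print(summarize_profile("profile", limit=15))
```

Sorting by cumulative time should surface the same execute_field / complete_value chain shown in the table at the top of this issue.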