graphql-python/graphql-core

Performance issues related to complete_value on large datasets

JCatrielLopez opened this issue · 3 comments

Hi! We've noticed that returning a list of 5k elements, each with a couple of nested objects, is pretty slow:

Person {
  id
  name
  lastname
  age
  address {street number}
  job {id org_name}
  partner {id name}
  pets {name type}
  school {id name}
}


ncalls      tottime   percall    cumtime  percall    filename:lineno(function)
1           1.8e-05   1.8e-05    2.145    2.145      graphql.py:103(graphql_sync)
1           1.5e-05   1.5e-05    2.145    2.145      graphql.py:152(graphql_impl)
1/30001     0.19      6.335e-06  2.137    7.122e-05  execute.py:413(ExecutionContext.execute_fields)
1           1.3e-05   1.3e-05    2.137    2.137      execute.py:965(execute)
1           7e-06     7e-06      2.137    2.137      execute.py:328(ExecutionContext.execute_operation)
1/135001    0.3229    2.392e-06  2.135    1.581e-05  execute.py:485(ExecutionContext.execute_field)
1/145001    0.2737    1.888e-06  2.071    1.428e-05  execute.py:575(ExecutionContext.complete_value)
1           0.009884  0.009884   2.071    2.071      execute.py:660(ExecutionContext.complete_list_value)
5000/30000  0.02747   9.156e-07  2.026    6.752e-05  execute.py:893(ExecutionContext.complete_object_value)

By itself it's not really a slow function, but it's executed 30k times. Is there any way to reduce the overhead by reducing the number of times this function is invoked?
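For reference, the 30k figure lines up with the shape of the query. This is a back-of-the-envelope sketch, assuming every person in the list carries exactly one of each of the five nested objects (address, job, partner, pets, school); the variable names are only illustrative:

```python
# Rough call-count arithmetic for the profile above.
# Assumption: each of the 5,000 persons has exactly one of each nested object.
persons = 5_000                # items in the returned list
nested_objects_per_person = 5  # address, job, partner, pets, school

# One object completion per Person, plus one per nested object.
object_completions = persons * (1 + nested_objects_per_person)

# Each completed object executes its sub-fields once;
# the root selection set adds one more execute_fields call.
execute_fields_calls = object_completions + 1

print(object_completions)    # 30000, matches complete_object_value
print(execute_fields_calls)  # 30001, matches execute_fields
```

So the call count grows linearly with both the list size and the number of nested object fields requested per item.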

Tested on Python 3.8 and graphql-core==3.2.3

Possibly related to this graphql-js issue

Cito commented

Thanks for reporting. Will look into this when I have more time, probably only after releasing 3.3. It would be helpful if you could post example code with dummy data to reproduce this.

schema.graphql:

type Query {
    persons: [Person]
}

type Person {
    id: String!
    name: String
    ssn: String
    alive: Boolean
    has_job: Boolean
    job: JobDetails
    address: Address
    pets: Pets
    house: House
    partner: Person
}

type JobDetails {
    id: String
    name: String
}

type Address {
    id: String
    name: String
}

type Pets {
    id: String
    name: String
    race: String
    color: String
}

type House {
    color: String
    floors: Int
    is_duplex: Boolean
    is_apt: Boolean
}

server.py:

import random
import string
import sys

import yappi

from graphql import graphql_sync, build_ast_schema
from graphql.language.parser import parse

yappi.set_clock_type("wall")

with open("./schema.graphql", "r") as f:
    schema = build_ast_schema(parse(f.read()))


class Query:
    """The root resolvers"""

    def persons(self, info):
        output = []
        for _ in range(5_000):
            output.append(
                dict(
                    id="".join(random.choices(string.ascii_lowercase + string.digits, k=9)),
                    name="John Doe",
                    ssn="00000000000000000",
                    alive=True,
                    has_job=False,
                    job=dict(id="xxx", name="test"),
                    address=dict(id="yyy", name="Fake Street"),
                    pets=dict(id="zzz", name="test"),
                    house=dict(
                        color="RED",
                        floors=2,
                        is_duplex=False
                    ),
                    partner=dict(id="".join(random.choices(string.ascii_lowercase + string.digits, k=9)), name="test"),
                )
            )
        return output


def main():
    query = """{ 
        persons{ 
            id 
            name 
            alive 
            has_job 
            job{id name}
            partner{id name}
            address{id name}
            pets{id name}
            house{color floors is_duplex}
        } 
    }"""

    yappi.start()
    result = graphql_sync(schema, query, Query())
    yappi.stop()

    if result.errors:
        print(result)
        sys.exit(1)

    yappi.get_func_stats().save("profile", type="pstat")


# To visualize profile:
# python -m snakeviz profile --server

if __name__ == '__main__':
    main()
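The saved profile can also be inspected without snakeviz. Below is a minimal sketch, assuming the file was written with `type="pstat"` as in the script above so that the standard-library `pstats` module can load it. The helper name `summarize_profile` is made up for illustration.

```python
import io
import pstats


def summarize_profile(path: str, limit: int = 10) -> str:
    """Return the top `limit` entries of a pstat-format profile, by cumulative time."""
    buffer = io.StringIO()
    # pstats.Stats accepts a file name and writes its report to `stream`.
    stats = pstats.Stats(path, stream=buffer)
    stats.sort_stats("cumulative").print_stats(limit)
    return buffer.getvalue()
```

After running server.py, calling `print(summarize_profile("profile"))` should list the functions with the highest cumulative time in the same layout as the table in the first post.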