Exabyte-io/esse

Get schema by ID using a constant-time lookup


The core method get_schema_by_id performs an O(N) linear scan over the schema list:

class ESSE(object):
    """
    Exabyte Source of Schemas and Examples class.
    """

    def __init__(self):
        self.schemas = SCHEMAS
        self.examples = EXAMPLES

    def get_schema_by_id(self, schemaId):
        # Linear scan: examines up to len(SCHEMAS) entries on every call.
        return next((s for s in SCHEMAS if s.get("schemaId") == schemaId), None)

Libraries such as Exabyte's express are probably bound by file-parsing I/O, and N is small here (~200). Even so, get_schema_by_id is called from serialize_and_validate on every property, so the linear scan is repeated many times. With a minor change, the call can become an O(1) (average-case) dictionary lookup:
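To make the per-property cost concrete, here is a minimal Python sketch that mirrors the original next()-based scan but also reports how many schemas it examined. The 200 synthetic schema IDs and the assumption that each property triggers exactly one lookup are illustrative only, not taken from ESSE's actual SCHEMAS or serialize_and_validate.

```python
# Hypothetical stand-in for the ESSE schema list: 200 dicts keyed by "schemaId".
# The real SCHEMAS content is not reproduced here; these IDs are made up.
SCHEMAS = [{"schemaId": f"schema-{i}", "type": "object"} for i in range(200)]


def linear_lookup(schemas, schema_id):
    """Same strategy as the original next() scan; also returns how many schemas were examined."""
    for examined, s in enumerate(schemas, start=1):
        if s.get("schemaId") == schema_id:
            return s, examined
    # A miss examines every schema.
    return None, len(schemas)


def comparisons_for(property_schema_ids):
    """Total schemas examined, assuming one lookup per property."""
    return sum(linear_lookup(SCHEMAS, sid)[1] for sid in property_schema_ids)


# 50 properties, each referencing the last schema in the list (worst case).
print(comparisons_for(["schema-199"] * 50))  # 50 * 200 = 10000
```

With a dict index, each of those lookups is a single hash probe on average, so the total work no longer scales with the number of schemas.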

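class ESSE(object):
    def __init__(self):
        self.schemas = SCHEMAS
        # Build the schemaId -> schema index once, skipping entries without an ID.
        # On duplicate IDs the last entry wins.
        self._schemas = {s['schemaId']: s for s in self.schemas if s.get('schemaId') is not None}
        self.examples = EXAMPLES

    def get_schema_by_id(self, schemaId):
        # Average-case O(1) dict lookup; returns None for unknown IDs, like the original.
        return self._schemas.get(schemaId)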
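The sketch below checks that a dict index returns the same objects as the linear scan. The helper names build_index and linear_get and the sample data are hypothetical. It uses dict.setdefault so that the first schema wins on a duplicated schemaId, matching next(); the dict comprehension in the proposal would instead keep the last one.

```python
from typing import Any, Dict, Iterable, List, Optional

Schema = Dict[str, Any]


def build_index(schemas: Iterable[Schema]) -> Dict[str, Schema]:
    """Map schemaId -> schema, skipping entries without an ID.

    setdefault keeps the first schema for a duplicated ID, which matches the
    original next()-based scan.
    """
    index: Dict[str, Schema] = {}
    for s in schemas:
        schema_id = s.get("schemaId")
        if schema_id is not None:
            index.setdefault(schema_id, s)
    return index


def linear_get(schemas: List[Schema], schema_id: str) -> Optional[Schema]:
    """The original O(N) lookup, kept for comparison."""
    return next((s for s in schemas if s.get("schemaId") == schema_id), None)


# Hypothetical sample data, including a duplicate ID and an entry with no ID.
sample = [
    {"schemaId": "a", "v": 1},
    {"schemaId": "b", "v": 2},
    {"schemaId": "a", "v": 3},
    {"title": "no id"},
]

index = build_index(sample)
for sid in ["a", "b", "missing"]:
    # Same object (or None) from both strategies.
    assert index.get(sid) is linear_get(sample, sid)
print(index["a"]["v"])  # 1: first occurrence wins, as with next()
```

In either variant the index is built once in __init__, so it would go stale if self.schemas were modified after construction. That seems acceptable if SCHEMAS is treated as read-only.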

Just a quick note - thanks for this helpful suggestion, Michael! We'll review and plan to schedule this for inclusion in the next release (later in Q1 or early Q2).