alephdata/followthemoney-store

iterate_stream uses deprecated write_object instead of write_entity

Closed · 1 comment

iterate_stream uses the (deprecated) write_object:

def iterate_stream(dataset, file, entity_id=None):

instead of the newer write_entity. The latter serializes with orjson and may be significantly faster, so a before/after benchmark would be useful here as well.
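One thing to watch when switching: judging by the benchmark in this thread, write_object is handed a text stream while write_entity is handed a binary one, so callers of iterate_stream would likely need to open their output in binary mode. Here is a minimal sketch of that difference using only the standard library. The helpers write_text_line and write_bytes_line are hypothetical stand-ins, not the followthemoney functions.

```python
import json
from io import BytesIO, StringIO

RECORD = {"id": "test", "schema": "Person"}


def write_text_line(stream, data):
    # Hypothetical stand-in for a text-based writer: needs a text-mode stream.
    stream.write(json.dumps(data) + "\n")


def write_bytes_line(stream, data):
    # Hypothetical stand-in for a bytes-based writer: needs a binary-mode stream.
    stream.write(json.dumps(data).encode("utf-8") + b"\n")


text_out = StringIO()
write_text_line(text_out, RECORD)

bytes_out = BytesIO()
write_bytes_line(bytes_out, RECORD)

# Both streams hold the same JSON line, once as str and once as bytes.
same_payload = text_out.getvalue().encode("utf-8") == bytes_out.getvalue()

# Handing bytes to a text stream raises TypeError, which is why the
# stream passed in by callers would have to change along with the writer.
try:
    write_bytes_line(StringIO(), RECORD)
    mismatch_error = None
except TypeError as exc:
    mismatch_error = exc
```

In practice this would probably mean opening output files with "wb" rather than "w", or using sys.stdout.buffer instead of sys.stdout.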

I ran the following benchmark:

from io import StringIO, BytesIO

from followthemoney import model
from followthemoney.cli.util import write_object, write_entity

import pyperf


ENTITY = {
    "id": "test",
    "schema": "Person",
    "properties": {
        "name": ["Ralph Tester"],
        "birthDate": ["1972-05-01"],
        "idNumber": ["9177171", "8e839023"],
        "topics": ["role.spy"],
    },
}


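def bench_write_object(obj):
    # write_object is given a fresh in-memory text stream on every call.
    write_object(StringIO(), obj)


def bench_write_entity(obj):
    # write_entity is given a fresh in-memory binary stream on every call.
    write_entity(BytesIO(), obj)


runner = pyperf.Runner()
# Build the entity proxy once, outside the timed functions.
obj = model.get_proxy(ENTITY)
runner.bench_func("write_object", bench_write_object, obj)
runner.bench_func("write_entity", bench_write_entity, obj)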

and it showed write_entity to be roughly 3x faster per call:

write_object: Mean +- std dev: 2.76 us +- 0.02 us
write_entity: Mean +- std dev: 924 ns +- 8 ns
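To put those two means in perspective, here is a quick back-of-the-envelope calculation. It covers only the serialization step measured above; the time iterate_stream spends reading from the store is not included, so the end-to-end gain may well be smaller.

```python
# Means reported by pyperf above, converted to nanoseconds.
write_object_ns = 2.76 * 1000  # 2.76 us
write_entity_ns = 924.0        # 924 ns

# Relative speedup of write_entity over write_object.
speedup = write_object_ns / write_entity_ns

# Serialization time saved when writing one million entities, in seconds.
saved_per_million_s = (write_object_ns - write_entity_ns) * 1_000_000 / 1e9

print(f"speedup: {speedup:.2f}x")  # speedup: 2.99x
print(f"saved per million entities: {saved_per_million_s:.2f} s")  # 1.84 s
```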