googleapis/python-datastore

Batches in >=2.0.0 are significantly slower

BinarSkugga opened this issue · 9 comments

I believe there is an API change or a performance regression when using batches with 2.1.0.

Environment details

  • OS type and version: Windows 10
  • Python version: 3.8.7
  • pip version: 20.3.3
  • google-cloud-datastore version: 2.1.0
  • google-api-core version: 1.24.1

Steps to reproduce

  1. Use DatastoreClient::put_multi with 500 entities inside a batch

Code example

with client.batch():
    client.put_multi(chunk)
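For a self-contained picture of where `chunk` comes from, here is a minimal sketch of the chunking step in pure Python. The `chunks` helper and `MAX_BATCH_ENTITY` name are illustrative (they mirror the reporter's code shown later in this thread), not part of the library's API:

```python
from itertools import islice

MAX_BATCH_ENTITY = 500  # per-batch entity count used in the report

def chunks(items, size):
    """Split a sequence of entities into fixed-size batches."""
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch

# 4500 entities -> 9 full batches of 500, matching the "9 batches of 500" in the report
batch_sizes = [len(b) for b in chunks(range(4500), MAX_BATCH_ENTITY)]
```

Each resulting batch would then be passed to `client.put_multi` inside a `client.batch()` context as shown above.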

Performance

When using 1.15.0, my code can handle 9 batches of 500 entities in about 1-2 seconds, while 2.1.0 takes literally minutes.
Disclaimer: I use the datastore emulator

EDIT: This can also be reproduced on 2.0.0.

Are you seeing this with both the emulator and live endpoints?

I haven't tested on a live endpoint. I'm on vacation for the week; I'll definitely test this when I come back.

Thanks @BinarSkugga. I know we substantially overhauled the transport layer in the v1 -> v2 migration. It would be very helpful to understand whether you are seeing a slowdown on a live endpoint.

I am back from my vacation. I do not experience this behavior with a live endpoint. Should I move the issue to the emulator's repository?

EDIT: I can even measure a performance improvement with the new >=2.0.

Additional potentially impactful information:

  • The emulator doesn't log any errors.
  • I run those batches in parallel using Python's ThreadPoolExecutor from the concurrent.futures package.
  • Deleting entities also seems slightly slower, but it's negligible compared to writes and reads.
  • It might be worth noting that I had to remove the 'indices' key from my Entity object, as the emulator doesn't seem to understand it. It tries to parse it as a column and fails because it contains a set().
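The parallel usage described above can be sketched with the standard library. This is a toy stand-in, not the reporter's actual code: `fetch_chunk` here just doubles numbers in place of a real batched `get_multi` call:

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_chunk(chunk):
    # stand-in for: with client.batch(): return client.get_multi(chunk)
    return [key * 2 for key in chunk]

chunked = [[1, 2], [3, 4], [5]]
with ThreadPoolExecutor(max_workers=4) as pool:
    # one task per batch; map preserves input order in the results
    results = list(pool.map(fetch_chunk, chunked))

# flatten the per-batch result lists into one list of entities
entities = [e for batch in results for e in batch]
```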

I may have spoken too fast. It also happens on live endpoints.
Logs from 1.15.x:

[SOLLUM][DATA][INFO][10:42:35]: Fetching 4130 'LayoutEntity' entities...
[SOLLUM][DATA][VERBOSE][10:42:35]: Using 9 'LayoutEntity' batches...
[SOLLUM][DATA][VERBOSE][10:42:36]: Fetched 'LayoutEntity' batch #1/9
[SOLLUM][DATA][VERBOSE][10:42:36]: Fetched 'LayoutEntity' batch #2/9
[SOLLUM][DATA][VERBOSE][10:42:37]: Fetched 'LayoutEntity' batch #4/9
[SOLLUM][DATA][VERBOSE][10:42:37]: Fetched 'LayoutEntity' batch #3/9
[SOLLUM][DATA][VERBOSE][10:42:37]: Fetched 'LayoutEntity' batch #5/9
[SOLLUM][DATA][VERBOSE][10:42:37]: Fetched 'LayoutEntity' batch #6/9
[SOLLUM][DATA][VERBOSE][10:42:37]: Fetched 'LayoutEntity' batch #7/9
[SOLLUM][DATA][VERBOSE][10:42:37]: Fetched 'LayoutEntity' batch #8/9
[SOLLUM][DATA][VERBOSE][10:42:37]: Fetched 'LayoutEntity' batch #9/9

Logs from 2.1.x:

[SOLLUM][DATA][INFO][14:35:42]: Fetching 4130 'LayoutEntity' entities...
[SOLLUM][DATA][VERBOSE][14:35:42]: Using 9 'LayoutEntity' batches...
[SOLLUM][DATA][VERBOSE][14:36:48]: Fetched 'LayoutEntity' batch #1/9
[SOLLUM][DATA][VERBOSE][14:39:31]: Fetched 'LayoutEntity' batch #2/9
[SOLLUM][DATA][VERBOSE][14:39:32]: Fetched 'LayoutEntity' batch #3/9

Here's the code that handles this:

def _do_fetch_chunk(chunk, client, kind, count, total):
    with client.batch():
        ents = client.get_multi(chunk)
        Loggers.DATA.verbose("Fetched '{}' batch #{}/{}".format(kind, count.next(), total))
        return ents

count = AtomicNumber()
chunked = list(chunks(keys, config.MAX_BATCH_ENTITY))
Loggers.DATA.verbose("Using {} '{}' batches...".format(len(chunked), kind))
entities = ConcurrentService.i().thread_each(chunked, _do_fetch_chunk, self.client, kind, count, len(chunked))
return [item for sublist in entities for item in sublist]

The concurrent service simply submits a new task per batch to the thread pool.
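For readers without access to the custom ConcurrentService, a hypothetical equivalent of `thread_each` built on `concurrent.futures` might look like this (submission order is preserved in the results; the lambda is a toy stand-in for `_do_fetch_chunk`):

```python
from concurrent.futures import ThreadPoolExecutor

def thread_each(items, fn, *args):
    """Submit one task per item to a thread pool and return results in submission order."""
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(fn, item, *args) for item in items]
        return [f.result() for f in futures]

# toy stand-in for _do_fetch_chunk(chunk, client, kind, count, total)
results = thread_each([[1, 2], [3]], lambda chunk, scale: [k * scale for k in chunk], 10)
flat = [item for sublist in results for item in sublist]
```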

Any insight on this?

I have received another customer report. While not quite the difference stated here, a 2-3x slowdown is about what the other instance was observing. It would be good to benchmark a few operations and see if there are any obvious gains that could be made.

Reference PR from PubSub that was able to get back some time: googleapis/python-pubsub@c29d7f8

I've been attempting to recreate this, but so far I'm seeing more or less similar outcomes for batch operations between versions 2.1.0 and 1.15.3.

Querying may very well be slower, however.

First test run:

$ python datastore-test.py                         
[python-datastore@1.15.3] finished creation loop in 5.0 ms
[python-datastore@1.15.3] finished batch write in 1926.0 ms
[python-datastore@1.15.3] finished batch get in 391.0 ms
[python-datastore@1.15.3] finished batch update in 1706.0 ms
[python-datastore@1.15.3] finished batch query in 2216.0 ms
  > loaded 3500 objects

Switch to version 2.1.0 and run again:

python datastore-test.py              
[python-datastore@2.1.0] finished creation loop in 4.0 ms
[python-datastore@2.1.0] finished batch write in 2109.0 ms
[python-datastore@2.1.0] finished batch get in 588.0 ms
[python-datastore@2.1.0] finished batch update in 1701.0 ms
[python-datastore@2.1.0] finished batch query in 3702.0 ms
  > loaded 4000 objects

I will continue to investigate, but for now, here's the testing script I'm using.
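For reference, per-operation timings in the format shown above can be produced with a small helper like this. This is only a sketch; the actual datastore-test.py script is not reproduced here:

```python
import time

def timed(label, fn, *args, **kwargs):
    """Run fn and print its wall-clock duration in milliseconds."""
    start = time.monotonic()
    result = fn(*args, **kwargs)
    elapsed_ms = (time.monotonic() - start) * 1000.0
    print('[{}] finished in {:.1f} ms'.format(label, elapsed_ms))
    return result

# e.g. wrap any operation you want to benchmark
total = timed('creation loop', sum, range(1000))
```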