aio-libs-abandoned/aioredis-py

aioredis 1.3.1 vs 2.0.0 performance

artesby opened this issue · 6 comments

Describe the bug

aioredis==2.0.0 performs 4-10 times slower than aioredis==1.3.1

To demonstrate this, we ran a simple test:

test_aioredis1.py
import asyncio
import os
import time
import aioredis

url = os.environ.get('REDIS_URL', 'redis://localhost:6379/13')
max_conn = int(os.environ.get('MAX_CONNECTIONS', 64))

async def task(i, redis):
    key = f'key:{i}'
    v = await redis.get(key)
    new_v = 1 if v is None else int(v) + 1
    await redis.set(key, new_v, expire=600)

async def run(n=1500):
    redis = await aioredis.create_redis_pool(
        url, encoding='utf-8', maxsize=max_conn
    )

    tasks = [asyncio.create_task(task(i, redis)) for i in range(n)]
    start = time.time()
    await asyncio.gather(*tasks)
    t = time.time() - start
    print(f'aioredis1: {n} tasks with blocking pool with {max_conn} connections: {t}s')


if __name__ == "__main__":
    asyncio.run(run(n=1500))
test_aioredis2.py
import asyncio
import os
import time
from aioredis import BlockingConnectionPool, Redis

url = os.environ.get('REDIS_URL', 'redis://localhost:6379/14')
max_conn = int(os.environ.get('MAX_CONNECTIONS', 64))

async def task(i, redis):
    key = f'key:{i}'
    v = await redis.get(key)
    new_v = 1 if v is None else int(v.decode()) + 1
    await redis.set(key, new_v, ex=600)

async def run(n=1500):
    pool = BlockingConnectionPool.from_url(
        url=url, max_connections=max_conn
    )
    redis = Redis(connection_pool=pool)

    tasks = [asyncio.create_task(task(i, redis)) for i in range(n)]
    start = time.time()
    await asyncio.gather(*tasks)
    t = time.time() - start
    print(f'aioredis2: {n} tasks with blocking pool with {max_conn} connections: {t}s')


if __name__ == "__main__":
    asyncio.run(run(n=1500))

To Reproduce

  1. pip3 install aioredis==2.0.0 && python3 test_aioredis2.py
  2. pip3 install aioredis==1.3.1 && python3 test_aioredis1.py

Expected behavior

aioredis v1.3.1 and v2.0.0 have the same performance

Logs/tracebacks

aioredis2: 1500 tasks with blocking pool with 64 connections: 0.3346226215362549s
aioredis1: 1500 tasks with blocking pool with 64 connections: 0.0765523910522461s

Python Version

$ python --version
Python 3.7.5

aioredis Version

$ python -m pip show aioredis
Name: aioredis
Version: 1.3.1

$ python -m pip show aioredis
Name: aioredis
Version: 2.0.0

Additional context

The problem is reproducible using either a local Redis (in Docker) or a Redis on a remote machine.
Increasing the coroutine count n monotonically widens the gap between the execution times of aioredis 1 and aioredis 2.

We tried to investigate the problem using cProfile and the yappi profiler. The root of the problem appears to be somewhere in the interaction with SocketBuffer, but we could not pinpoint exactly where.
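For anyone who wants to reproduce the profiling step, here is a minimal sketch of profiling an asyncio workload with the stdlib cProfile/pstats modules. The `task`/`run` coroutines here are stand-ins mirroring the benchmark scripts above (with the Redis calls replaced by a no-op await so the snippet is self-contained); to profile the real benchmark, substitute the actual scripts.

```python
import asyncio
import cProfile
import io
import pstats

async def task(i):
    # Stand-in for the redis.get/redis.set pair in the benchmark above.
    await asyncio.sleep(0)

async def run(n=1500):
    await asyncio.gather(*(task(i) for i in range(n)))

profiler = cProfile.Profile()
profiler.enable()
asyncio.run(run())
profiler.disable()

# Print the 10 most expensive functions by cumulative time.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(10)
report = stream.getvalue()
print(report)
```

Sorting by cumulative time is usually the quickest way to spot a hot path like the SocketBuffer interaction mentioned above.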

Any help would be appreciated.

Code of Conduct

  • I agree to follow the aio-libs Code of Conduct

Hey @artesby -

Thank you for the benchmarks! I suspect the issue lies with our connections re-implementation, which uses a port of redis-py's SocketBuffer. This has the negative effect of passing all socket reads/writes through two buffers rather than one.
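To make the double-buffering cost concrete, here is a toy illustration using only the stdlib io module (this is not the actual aioredis code; `ToySocketBuffer` is invented for the sketch). Every byte is copied into the first buffer, then copied again into the second before the caller sees it.

```python
import io

raw = io.BytesIO(b"+OK\r\n" * 1000)   # stands in for the socket
buffered = io.BufferedReader(raw)     # buffer #1 (e.g. the transport layer)

class ToySocketBuffer:
    """Buffer #2, layered on top in the style of the ported SocketBuffer."""

    def __init__(self, stream):
        self._stream = stream
        self._buf = bytearray()

    def read_line(self) -> bytes:
        # Bytes are copied from buffer #1 into buffer #2 before the caller
        # sees them -- the duplicated work described above.
        while b"\r\n" not in self._buf:
            chunk = self._stream.read(16)
            if not chunk:
                break
            self._buf += chunk
        line, _, rest = bytes(self._buf).partition(b"\r\n")
        self._buf = bytearray(rest)
        return line

sb = ToySocketBuffer(buffered)
```

With a single buffer, `read_line` could slice directly out of the transport's buffer instead of maintaining its own copy.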

I'll take a look at this and see if my hunch is correct.

@seandstewart I was also curious about this issue and did a bit of digging.

I suspect the issue lies with our connections re-implementation, which uses a port of redis-py's SocketBuffer.

I don't think this is the case: when HiredisParser is used explicitly, SocketBuffer is not involved at all, yet performance stays in the same range as with PythonParser in the 2.0 measurements above.

My guess is that the problem is that 1.3.1 used implicit pipelining of requests: it didn't call drain after writing each command, and it had a separate reader task.
I did a couple of hacky implementations to get some baseline measurements, which seem to support my hypothesis; I'll post them here after cleanup.
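To illustrate what "implicit pipelining" means at the wire level, here is a minimal sketch (not the aioredis implementation; `encode_command` is written for this illustration). With pipelining, many commands are serialized into one buffer and flushed with a single drain, instead of one drain per command.

```python
def encode_command(*args: str) -> bytes:
    """Serialize a command to the Redis RESP wire format."""
    out = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = arg.encode()
        out.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(out)

# Pipelined: encode several commands into one buffer, then flush once, e.g.
#   writer.write(batch); await writer.drain()
# versus one write + drain (and one round-trip of latency) per command.
batch = b"".join(encode_command("SET", f"key:{i}", "1") for i in range(3))
```

The per-command drain is what makes each request wait on the socket individually; batching amortizes that cost across many commands.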

So here is the repo with a couple of the above benchmarks and a couple of experimental clients:
https://github.com/m-novikov/aioredis-benchmarks

Here is a very minimalistic client covering the above use case (get, set), written to be as performant as possible. It performs about 50% better than the 1.3 version. I used a callback-based asyncio.Protocol to avoid the extra cost of context switching between coroutines. Feel free to review the code and point out anything I missed.
This is the most barebones version I could come up with, and it should be a good baseline for this workload.
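As a rough sketch of the callback-based approach described above (the real benchmark client lives in the linked repo; the class and helper names here are invented for illustration): the protocol writes commands immediately with no per-command drain, and matches replies to pending futures in FIFO order, which is what makes pipelining safe on a single connection.

```python
import asyncio
from collections import deque

def encode_command(*args: str) -> bytes:
    """Minimal RESP serializer (illustration only)."""
    out = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = arg.encode()
        out.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(out)

def parse_reply(buf: bytes):
    """Parse one RESP reply; return (value, bytes_consumed), or None if incomplete."""
    if b"\r\n" not in buf:
        return None
    line, rest = buf.split(b"\r\n", 1)
    kind, payload = line[:1], line[1:]
    if kind in (b"+", b"-", b":"):      # simple string / error / integer
        return payload, len(line) + 2
    if kind == b"$":                    # bulk string
        n = int(payload)
        header = len(line) + 2
        if n == -1:                     # nil reply
            return None, header
        if len(rest) < n + 2:
            return None                 # incomplete
        return rest[:n], header + n + 2
    raise ValueError("unsupported reply type: %r" % kind)

class MiniRedisProtocol(asyncio.Protocol):
    """Pipelines commands on one connection; replies resolve futures in FIFO order."""

    def __init__(self):
        self._buf = b""
        self._waiters = deque()
        self._transport = None

    def connection_made(self, transport):
        self._transport = transport

    def data_received(self, data):
        self._buf += data
        while self._waiters:
            parsed = parse_reply(self._buf)
            if parsed is None:          # wait for more bytes
                break
            value, consumed = parsed
            self._buf = self._buf[consumed:]
            self._waiters.popleft().set_result(value)

    def execute(self, *args):
        """Write a command immediately (no per-command drain) and return a future."""
        fut = asyncio.get_running_loop().create_future()
        self._waiters.append(fut)
        self._transport.write(encode_command(*args))
        return fut
```

Usage would look roughly like `_, proto = await loop.create_connection(MiniRedisProtocol, host, port)` followed by `await proto.execute("SET", "key:0", "1")`.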

Also, I tried to port this protocol-based implementation to the current aioredis in #1216. It is not a complete solution; the goal of this PR is to start a discussion.

Ah, yes I recall the logic now. This pipelining would be more efficient, but it made debugging extremely difficult.

Thanks for the example using a Protocol, I've not used this myself, but I like its straightforward approach to this problem.

I'll take a look at the PR and poke around on it.

Ah, yes I recall the logic now. This pipelining would be more efficient, but it made debugging extremely difficult.

I'd say the problem isn't pipelining itself, but mostly the code around it. IMO, as long as there is an explicit state machine reading from the connection, debugging should be significantly simpler even with pipelining.

I'll take a look at the PR and poke around on it.

Awesome, thanks. I propose we move further discussion to the PR.

It's actually even worse when Redis is on a remote machine and under high load: the current version is up to 40 times slower than the 1.3 branch.

simple_async: 10000 tasks with blocking pool with 64 connections: 0.36479735374450684s
aioredis1: 10000 tasks with blocking pool with 64 connections: 0.41457247734069824s

aioredis2: 10000 tasks with blocking pool with 64 connections: 16.801677227020264s

aioredis2_proto: 10000 tasks with blocking pool with *1* connection: 0.5052070617675781s