omnilib/aiosqlite

Why is aiosqlite so slower than sqlite3?

Closed this issue · 5 comments

Description

Below is a code that inserts 1 million rows with sqlite3 library and prints its running time.

import time
import sqlite3

start = time.time()
def insert():
    with sqlite3.connect('numbers.db') as DB:
        DB.execute(f"BEGIN TRANSACTION;")
        for i in range(1000000):
            query = f"INSERT INTO numbers(number) VALUES ({i});"
            DB.execute(query)
        DB.commit()
insert()
end = time.time()
print(end-start)

If I run the code, the result is about 7 seconds on my computer. But if I use aiosqlite with the code below:

import time
import aiosqlite
import asyncio

start = time.time()
async def insert():
    async with aiosqlite.connect('numbers.db') as DB:
        await DB.execute(f"BEGIN TRANSACTION;")
        for i in range(1000000):
            query = f"INSERT INTO numbers(number) VALUES ({i});"
            await DB.execute(query)
        await DB.commit()
asyncio.run(insert())
end = time.time()
print(end-start)

then the result is about 110 seconds on my computer, which shows aiosqlite is so slower than sqlite3. Why is that?

Details

  • OS: Linux
  • Python version: 3.8
  • aiosqlite version: 0.16.0
  • Can you repro on 'main' branch? yes
  • Can you repro in a clean virtualenv? yes

The problem is not that aiosqlite is slow. It's that aiosqlite is designed to be fast in concurrent usage, and your benchmark is not doing anything concurrently. aiosqlite does all of the database access on a second thread, using queues, but because your second benchmark is just doing X queries sequentially, you're eating the cost of communicating across threads between each query before starting the next, losing any performance benefits of asyncio.

Compare this with a real example of concurrent usage versus synchronous queries, and aiosqlite is four time faster than standard sqlite3:

import time
import aiosqlite
import asyncio
import sqlite3
from contextlib import contextmanager

@contextmanager
def timed(message="timed:"):
    before = time.time()
    yield
    after = time.time()
    print(message, after - before)

COUNT = 10_000

def insert(db, i):
    cursor = db.cursor()
    cursor.execute(f"INSERT INTO numbers(number) VALUES ({i});")
    db.commit()
    cursor.close()

def main():
    with timed(f"sequential {COUNT}"):
        with sqlite3.connect("numbers.db") as db:
            for i in range(COUNT):
                insert(db, i)

async def async_insert(db, i):
    async with db.cursor() as cursor:
        await cursor.execute(f"INSERT INTO numbers(number) VALUES ({i});")
        await db.commit()

async def async_main():
    with timed(f"concurrent {COUNT}"):
        async with aiosqlite.connect('numbers.db') as db:
            coros = [async_insert(db, i) for i in range(COUNT)]
            await asyncio.gather(*coros)

main()
asyncio.run(async_main())
(.venv) jreese@mordin ~/workspace/aiosqlite main± » time python benchmark.py
sequential 10000 2.1702568531036377
concurrent 10000 0.5464680194854736
python benchmark.py  0.71s user 1.43s system 73% cpu 2.894 total
gnat commented

Is it beneficial to call sqlite3 directly for one-off queries?

To clarify this issue, do any performance benefits of aiosqlite appear when you do not have more than one query to run?

Thanks. @amyreese

Generally speaking, if your workload does not involve multiple concurrent I/O jobs (eg, concurrent reads or writes on multiple tables, concurrent web requests, etc), then aiosqlite will not be faster than serial synchronous queries. If you already have other async work being done (via aiohttp, etc), then aiosqlite will allowing non-blocking db queries to happen concurrently with those other workloads. But aiosqlite is also still bound by the limits of normal sqlite, so things like concurrent writes to the same table will still happen in serial due to sqlite's table/transaction locking semantics.

n87 commented

@amyreese
While there is certainly an advantage for async INSERT, the performance of SELECT is a lot worse than sequential.

I just replaced the query text in your benchmark code with

sed 's/INSERT .*;/SELECT * FROM numbers WHERE number = {i};/' benchmark.py > benchmark-select.py

Test:

sqlite3 numbers.db "create table numbers(number int)"
sqlite3 numbers.db "create index idx on numbers(number)"
sqlite3 numbers.db ".import '|seq 10000' numbers"

python benchmark-select.py
sequential 10000 0.1826939582824707
concurrent 10000 2.19964599609375

With slower queries (e.g. unindexed) the difference gets smaller, but still concurrent is twice as slow as sequential.

Is there any advantage or disadvantage to combining aiosqlite with wal2-enabled sqlite?