nsidnev/fastapi-realworld-example-app

scalability issue

mayli opened this issue · 6 comments

mayli commented

FastAPI is fast and scalable, but it seems this example won't handle a real-world benchmark. To repro, run the app and benchmark it with wrk -t1 -c100 -d10s --latency http://localhost:8000/api/articles, and any API accessing the DB gets stuck :(

Maybe we could raise the default MAX_CONNECTIONS_COUNT.

Hi!

Yes, this app uses the default values (10 for both the min and max connection count) from the asyncpg driver, since the real size of the connection pool should be configured after analyzing the requirements, hardware and other parameters. But, as you already mentioned, there are environment variables here that can change this behavior if you need it. Personally, I don't think that this example application's config should use values higher than the defaults, because the default values are enough for local development.
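For reference, here is a minimal sketch of how such settings typically reach asyncpg. DATABASE_URL and MIN_CONNECTIONS_COUNT are assumed names (only MAX_CONNECTIONS_COUNT was mentioned above), and the exact wiring in this repository may differ:

import os

import asyncpg
from fastapi import FastAPI

# asyncpg's own defaults are min_size=10, max_size=10 - the values mentioned above.
MIN_CONNECTIONS_COUNT = int(os.getenv("MIN_CONNECTIONS_COUNT", "10"))
MAX_CONNECTIONS_COUNT = int(os.getenv("MAX_CONNECTIONS_COUNT", "10"))

app = FastAPI()


@app.on_event("startup")
async def create_db_pool() -> None:
    # The pool is stored on the application state so that request handlers
    # (and dependencies) can reach it via request.app.state.pool.
    app.state.pool = await asyncpg.create_pool(
        os.environ["DATABASE_URL"],
        min_size=MIN_CONNECTIONS_COUNT,
        max_size=MAX_CONNECTIONS_COUNT,
    )


@app.on_event("shutdown")
async def close_db_pool() -> None:
    await app.state.pool.close()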

In real production, I assume that you will have a load balancer (for example, traefik or nginx) that will properly route your requests to free instances of the application (if there is more than one instance), each with the right pool size, and you will also have a properly configured database (with the right number of connections and other settings tuned for performance).

mayli commented

Thanks for the quick reply. But I also noticed that FastAPI is not releasing connections back to the pool. For example, after wrk -t1 -c100 against a server with MAX_CONNECTIONS_COUNT=10, all APIs accessing the DB are stuck for quite a while. I would expect that once I kill wrk, the connections would be released back to the pool, but they don't seem to be.

The connection is released back to the pool after request processing. As soon as wrk sends a request, it goes to uvicorn; uvicorn then waits for the ASGI application (in this case FastAPI) to finish, tries to send the data back to the client, sees that the client has disconnected, and discards the result. But until FastAPI hands the response back to uvicorn (actually to Starlette, which writes the result back through a callback), the connection won't return to the pool.

Here is the dependency that acquires a connection from the pool: link
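Roughly, it looks like this (a simplified sketch rather than the exact code at the link; the imports and the repositories path are filled in as assumptions):

from typing import AsyncGenerator, Callable, Type

from asyncpg.pool import Pool
from fastapi import Depends
from starlette.requests import Request

from app.db.repositories.base import BaseRepository  # path assumed


def _get_db_pool(request: Request) -> Pool:
    return request.app.state.pool


def get_repository(repo_type: Type[BaseRepository]) -> Callable:  # type: ignore
    async def _get_repo(
        pool: Pool = Depends(_get_db_pool),
    ) -> AsyncGenerator[BaseRepository, None]:
        # Every repo acquires its own connection here, so a route that depends
        # on several repos holds several pooled connections for one request.
        async with pool.acquire() as conn:
            yield repo_type(conn)

    return _get_repo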

mayli commented

The connection is released back to the pool after request processing.

Does that also mean the connection is released once wrk exits? In my experiment, even after wrk exits, the server stays unresponsive for a while. I was expecting the async magic to be more scalable than a thread/process based approach, but it seems we are still stuck at the bottleneck of MAX_CONNECTIONS_COUNT, and a simple wrk -c MAX_CONNECTIONS_COUNT+1 can easily DDoS the API server :(

Okay, I finally realized what was wrong. This is a bug in the dependency. You're right, the connection doesn't return to the pool. When a repo for entities is created, it acquires its own connection from the pool, and some routes depend on several repos, so a single request can hold more than one connection at a time. When wrk sent 100 concurrent requests against a pool of 10, the requests raced for connections, each holding one while waiting for another, which completely blocked the entire application. This can be fixed by changing the dependency that creates the repo so that the connection is provided as a separate dependency:

from typing import AsyncGenerator, Callable, Type

from asyncpg.connection import Connection
from asyncpg.pool import Pool
from fastapi import Depends
from starlette.requests import Request

from app.db.repositories.base import BaseRepository  # project-local import; path may differ


def _get_db_pool(request: Request) -> Pool:
    # The pool is created on application startup and stored on app.state.
    return request.app.state.pool


async def _get_connection_from_pool(
    pool: Pool = Depends(_get_db_pool),
) -> AsyncGenerator[Connection, None]:
    # One connection per request, released back to the pool when the request finishes.
    async with pool.acquire() as conn:
        yield conn


def get_repository(repo_type: Type[BaseRepository]) -> Callable:  # type: ignore
    # FastAPI caches _get_connection_from_pool per request, so all repos share one connection.
    def _get_repo(
        conn: Connection = Depends(_get_connection_from_pool),
    ) -> BaseRepository:
        return repo_type(conn)

    return _get_repo

The _get_connection_from_pool dependency is cached by FastAPI while processing a single request, so all repos for that request share one connection, and all wrk requests should be handled even with the default pool size.
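For example, a route that depends on two repositories now takes only one connection from the pool per request. The repository names below are placeholders for illustration and the import paths are assumptions:

from fastapi import APIRouter, Depends

from app.api.dependencies.database import get_repository  # import path assumed
from app.db.repositories.base import BaseRepository  # import path assumed

router = APIRouter()


class ArticlesRepository(BaseRepository):  # placeholder repo for illustration
    ...


class ProfilesRepository(BaseRepository):  # placeholder repo for illustration
    ...


@router.get("/api/articles")
async def list_articles(
    articles_repo: ArticlesRepository = Depends(get_repository(ArticlesRepository)),
    profiles_repo: ProfilesRepository = Depends(get_repository(ProfilesRepository)),
) -> dict:
    # Both repos were built from the same cached connection dependency, so this
    # request holds a single connection from the pool instead of two.
    return {"articles": []}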

Thanks for the help with this! I'll fix it soon.

Just merged the PR with the fix. It should work correctly now!