eclipse-vertx/vertx-sql-client

Memory leak in Aurora Postgres since 3.9.1


Version

3.9.1, 4.5.7

Context

Our app had been on vertx-pg-client 3.9.0 for a while until we recently migrated to version 4.5.7.

Shortly after the upgrade we noticed our AWS Aurora PG cluster running out of freeable memory and rebooting. The effect is directly correlated with load on the service: during peak times the DB can exhaust its memory (60 GB) in 45 minutes. We employed the pool's maxLifetime parameter, which helps but is not ideal, as it causes latency spikes: we prewarm the pool on deployment, so all the connections expire at about the same time.
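For reference, the shape of our pool setup with the maxLifetime workaround is roughly this (a hedged sketch: hosts, sizes, and timings are placeholders; the real configuration is in the gist linked below):

import io.vertx.core.Vertx
import io.vertx.pgclient.PgBuilder
import io.vertx.pgclient.PgConnectOptions
import io.vertx.sqlclient.Pool
import io.vertx.sqlclient.PoolOptions
import java.util.concurrent.TimeUnit

// Sketch of the 4.5.x pool setup with the maxLifetime workaround.
// Recycling connections caps server-side memory growth, but every
// expiry forces a reconnect, hence the latency spikes mentioned above.
fun buildPool(vertx: Vertx): Pool {
    val connectOptions = PgConnectOptions()
        .setHost("aurora-writer-endpoint")    // placeholder
        .setDatabase("app")
        .setUser("app")
        .setPassword("secret")

    val poolOptions = PoolOptions()
        .setMaxSize(32)                       // placeholder size
        .setMaxLifetime(30)                   // the workaround
        .setMaxLifetimeUnit(TimeUnit.MINUTES)

    return PgBuilder.pool()
        .connectingTo(connectOptions)
        .with(poolOptions)
        .using(vertx)
        .build()
}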

On 3.9.0, there was no issue with DB memory consumption. We rolled back the upgrade to confirm this.
On 3.9.1, we confirmed the issue is present.
This thread has a report of the same issue on 3.9.13 as well as on the 4.x branch.

Do you have a reproducer?

I don't have a reproducer; this might be AWS/Aurora-specific. I can test potential solutions if that helps.

Extra

Pretty much all our queries are preparedQuery + execute or executeBatch. We also use standalone prepare with a cursor for paginated reads. Pool configuration: https://gist.github.com/al-kudryavtsev/3e6eeb3cfd200afc66df5a12932bba25
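For illustration, the first pattern looks roughly like this (a hedged sketch: table names, column names, and helper functions are made up; the cursor-based pagination is shown further down in this thread):

import io.vertx.kotlin.coroutines.coAwait
import io.vertx.sqlclient.Pool
import io.vertx.sqlclient.Row
import io.vertx.sqlclient.Tuple

// One-shot prepared query: the client prepares, executes and releases
// the statement for us, so nothing needs to be closed explicitly.
suspend fun findUser(pool: Pool, id: Long): Row? =
    pool.preparedQuery("SELECT * FROM users WHERE id = \$1")
        .execute(Tuple.of(id))
        .coAwait()
        .firstOrNull()

// Same pattern, batched: one prepared statement, many tuples.
suspend fun insertUsers(pool: Pool, names: List<String>) {
    pool.preparedQuery("INSERT INTO users (name) VALUES (\$1)")
        .executeBatch(names.map { Tuple.of(it) })
        .coAwait()
}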

Have you tried using DISCARD ALL as suggested in the group discussion?

It would help to have some details about what is consuming memory. Can you get that from your DBA?

In the 3.9.1 release notes, there is a link to #577.
If you use prepare, can you make sure the statement is closed properly?

Thanks for looking into this!

I haven't tried DISCARD ALL; it might have a performance impact, since it flushes cached query plans when a connection is returned to the pool. I might try it if other options don't work.
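If we do end up trying it, I'd expect it to look roughly like this (a hedged sketch: withDiscardingConnection is a made-up helper, and it assumes we check connections out of the pool manually; note it would also invalidate client-side prepared statement caching if that's enabled):

import io.vertx.kotlin.coroutines.coAwait
import io.vertx.sqlclient.Pool
import io.vertx.sqlclient.SqlConnection

// Sketch: reset session state before handing the connection back.
// DISCARD ALL drops server-side prepared statements and cached plans,
// which is exactly where the performance impact would come from.
suspend fun <T> withDiscardingConnection(pool: Pool, block: suspend (SqlConnection) -> T): T {
    val conn = pool.connection.coAwait()
    try {
        return block(conn)
    } finally {
        conn.query("DISCARD ALL").execute().coAwait()
        conn.close().coAwait()   // returns the connection to the pool
    }
}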

If you use prepare, can you make sure the statement is closed properly?

Indeed, the statement wasn't closed. We were doing prepare(sql).coAwait().cursor(args) and then reading the cursor until it was exhausted, but we never closed the result of prepare(sql). I changed this to:

flow<Row> {
    // Standalone prepared statement: unlike preparedQuery, we own its
    // lifecycle, so it must be closed explicitly or it leaks on the server.
    val preparedStatement = prepare(sql).coAwait()
    try {
        val cursor = preparedStatement.cursor(args)
        do {
            cursor.read(pageSize).coAwait().forEach { emit(it) }
        } while (cursor.hasMore())
    } finally {
        // Releases the server-side statement when the flow completes or fails.
        preparedStatement.close().coAwait()
    }
}

It looks promising so far; we'll see if we can keep connections open indefinitely. Based on the docs, I'm assuming the cursor doesn't have to be closed since we read it to the end.

Good news, keep us posted please.

Fully removed maxLifetime today - DB memory looks good. Thanks for your help!