jackc/pgx

`fatal error: concurrent map writes` in `EnumCodec.lookupAndCacheString`

jamesroutley opened this issue · 6 comments

Describe the bug

We've been seeing `fatal error: concurrent map writes` crashes when running pgx queries that read from tables with an enum-typed column.

Here's an example stack trace, which should be read bottom to top:

[Image: example stack trace]

I think this is because nothing in EnumCodec prevents concurrent writes to the map that lookupAndCacheString uses as its cache. I've had a stab at a fix in #2088.

To Reproduce

I haven't created a minimal reproduction unfortunately (sorry!).

Expected behavior

I would expect to be able to query tables containing an uncached enum value concurrently.

Actual behavior

The process crashes with `fatal error: concurrent map writes`.

Version

  • Go: $ go version -> go version go1.22.5 darwin/arm64
  • PostgreSQL: $ psql --no-psqlrc --tuples-only -c 'select version()' -> PostgreSQL 14.9 on aarch64-unknown-linux-gnu, compiled by aarch64-unknown-linux-gnu-gcc (GCC) 9.5.0, 64-bit
  • pgx: $ grep 'github.com/jackc/pgx/v[0-9]' go.mod -> v5.5.5

Additional context
n/a

EnumCodec (and any Codec for that matter) is not expected to be concurrency safe. How is this being used?

Hi, sorry for the long delay here.

We're getting this error when reading from a table that has a column whose type is a custom enum (created with create type T as enum ('X', 'Y', 'Z')). Attached is a slightly longer stack trace showing that the error originates from a DB read (ReadPaymentInstrument just does a select * from table where ...), which calls pgxpool.Pool.QueryRow.

From the trace, it looks like reading from a table with a custom enum column calls EnumCodec.DecodeValue, which in turn calls the non-concurrency-safe method lookupAndCacheString. Our service performs multiple reads concurrently. Do you know of anything between QueryRow and lookupAndCacheString that would prevent concurrent calls to the latter?

Thanks

[Image: longer stack trace]

Every connection gets its own type map and set of codecs. EnumCodec is not concurrency safe, but I wouldn't have expected it to be possible for it to be used concurrently. 🤔

Okay, thanks, that's interesting information. I'll dig a bit more into our setup and see if I can put together a minimal reproduction.
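When digging into a setup like this, one possible check is to record which connection first used a given codec pointer and report when a different connection uses the same pointer. The sketch below is a hypothetical debugging aid, not part of pgx. It assumes you can obtain some per-connection identifier and the codec pointer at the point you want to check. Both of those hooks are left as assumptions.

```go
package main

import (
	"fmt"
	"sync"
)

// ownerTracker is a hypothetical debugging aid (not part of pgx). It records
// which owner (for example, a connection ID) first used an object and reports
// when a different owner uses the same object. obj should be a pointer so that
// identity, not contents, is compared.
type ownerTracker struct {
	mu     sync.Mutex
	owners map[any]int
}

func newOwnerTracker() *ownerTracker {
	return &ownerTracker{owners: make(map[any]int)}
}

// Touch records that ownerID used obj. It returns an error if obj was
// previously used by a different owner, which would indicate sharing.
func (t *ownerTracker) Touch(obj any, ownerID int) error {
	t.mu.Lock()
	defer t.mu.Unlock()
	if prev, ok := t.owners[obj]; ok && prev != ownerID {
		return fmt.Errorf("object %p used by owner %d and owner %d", obj, prev, ownerID)
	}
	t.owners[obj] = ownerID
	return nil
}

// demoCodec is a hypothetical placeholder for a codec with internal state.
type demoCodec struct{ members map[string]string }

func main() {
	tracker := newOwnerTracker()
	shared := &demoCodec{}
	// Simulate two connections ending up with the same codec pointer.
	fmt.Println(tracker.Touch(shared, 1))        // <nil>
	fmt.Println(tracker.Touch(shared, 2) != nil) // true
}
```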