atom/teletype-server

Is using a UUID as a primary key in Postgres a performance hazard?

nathansobo opened this issue · 4 comments

I recall using UUIDs as primary keys in MySQL as a younger engineer, only to learn that it was a disaster for write performance: MySQL orders tables by primary key on disk by default, so UUID keys produced a random ordering among all records instead of an increasing one. Should we use an integer primary key on the portals table and add a secondary index for the id column instead? Maybe Postgres is smarter than MySQL was back in the day and this doesn't matter.
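For concreteness, here's a rough sketch of the two schemas being weighed; the table and column names are just illustrative, not the actual teletype-server schema:

```sql
-- Option A: the UUID itself is the primary key.
CREATE TABLE portals_uuid_pk (
  id uuid PRIMARY KEY,
  created_at timestamptz NOT NULL DEFAULT now()
);

-- Option B: integer surrogate primary key, with the UUID kept in a
-- uniquely indexed secondary column that clients still use for lookups.
CREATE TABLE portals_int_pk (
  internal_id bigserial PRIMARY KEY,
  id uuid NOT NULL UNIQUE,
  created_at timestamptz NOT NULL DEFAULT now()
);
```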

Good point. I would assume that using a UUID as a primary key would be problematic if we did range queries (or similar kinds of queries) where on-disk data locality matters for performance. But I see what you're saying about write performance: what if the database can't simply append to the data file when writing, because of the random nature of UUIDs?

I think right now all we do is lookups by portal id on that table, so reads are probably fine. Maybe we should stress test the current Postgres instance and see how it handles lots and lots of writes?
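If we do want numbers, something like this psql session would be a rough local stress test (it assumes gen_random_uuid(), which is built into Postgres 13+ and comes from the pgcrypto extension on older versions):

```sql
-- Compare bulk-insert timing for random UUID keys vs. an increasing key.
\timing on

CREATE TABLE bench_uuid (id uuid PRIMARY KEY);
CREATE TABLE bench_seq  (id bigint PRIMARY KEY);

-- Random, non-sequential keys.
INSERT INTO bench_uuid (id)
SELECT gen_random_uuid() FROM generate_series(1, 1000000);

-- Monotonically increasing keys for comparison.
INSERT INTO bench_seq (id)
SELECT g FROM generate_series(1, 1000000) AS g;
```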

Maybe just searching around the web about non-sequential primary keys in Postgres would be a good idea. I'm not as concerned with reads as I am with writes.

@as-cii You had a link that you thought demonstrated this is a non-issue. You mind posting it and closing this out when you get a chance?

According to https://stackoverflow.com/questions/34230208/uuid-primary-key-in-postgres-what-insert-performance-impact, using a UUID as a primary key doesn't have a meaningful effect on performance, because PostgreSQL doesn't use clustered indices by default (nor is it able to maintain the ordering on future insertions after clustering an index; see https://stackoverflow.com/a/4796677/464250).
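In other words, CLUSTER in Postgres is a one-off rewrite rather than a maintained ordering, so even an explicitly clustered table drifts back toward insertion order (the index name below is illustrative):

```sql
-- One-time physical reorder of the heap by the primary-key index;
-- rows inserted afterwards are NOT kept in this order. You'd have to
-- run CLUSTER again to restore it.
CLUSTER portals USING portals_pkey;
```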

The only extra cost comes from the tuple being larger, but we would pay that anyway if we moved the UUID into a separate column.
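For reference, the per-value difference is 16 bytes for a uuid versus 8 for a bigint, which is easy to confirm (again assuming gen_random_uuid() is available):

```sql
SELECT pg_column_size(gen_random_uuid()) AS uuid_bytes,   -- 16
       pg_column_size(1::bigint)         AS bigint_bytes; -- 8
```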