M1ha-Shvn/django-pg-bulk-update

Race condition in bulk_update_or_create can lead to IntegrityError

M1ha-Shvn opened this issue · 9 comments

Sometimes a race condition in bulk_update_or_create can cause an IntegrityError: duplicate key value violates unique constraint:

  1. Thread1 selects values, doesn't find anything
  2. Thread2 Inserts values with search key
  3. Thread1 fails to insert data
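The interleaving above can be shown with a minimal in-memory simulation (illustrative only, not the library's code; the set stands in for a unique index, and `IntegrityError` is a stand-in for `django.db.utils.IntegrityError`):

```python
# Minimal simulation of the check-then-insert race from the steps above.
class IntegrityError(Exception):
    """Stand-in for django.db.utils.IntegrityError."""

unique_index = set()

def exists(key):
    return key in unique_index

def insert(key):
    # A real unique constraint rejects duplicates at insert time.
    if key in unique_index:
        raise IntegrityError(f"duplicate key value: {key!r}")
    unique_index.add(key)

key = "search-key"
thread1_found = exists(key)   # 1. Thread1 selects, finds nothing
insert(key)                   # 2. Thread2 inserts the same key
caught = None
try:
    if not thread1_found:
        insert(key)           # 3. Thread1 tries to insert and fails
except IntegrityError as e:
    caught = e
    print("Thread1 got:", e)
```

The window between step 1 and step 3 is exactly where another transaction can sneak in.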

What can be done:

  1. Use PostgreSQL 9.5+ INSERT ... ON CONFLICT ... in order to update values in such situations
  2. PostgreSQL 9.4 can't be fixed in this case (the only option is to repeat the whole update).
  3. If there is no unique constraint in the DB, duplicates may still be created. This should be mentioned in the docs.
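For option 1, the statement the library could issue looks roughly like the following. This is a hypothetical SQL-building sketch (table and column names are illustrative, and this is not the package's actual code):

```python
# Hypothetical sketch of the upsert SQL an ON CONFLICT based
# bulk_update_or_create could issue on PostgreSQL 9.5+.
def build_upsert_sql(table, key_cols, update_cols):
    cols = key_cols + update_cols
    placeholders = ", ".join(["%s"] * len(cols))
    updates = ", ".join(f"{c} = EXCLUDED.{c}" for c in update_cols)
    return (
        f"INSERT INTO {table} ({', '.join(cols)}) "
        f"VALUES ({placeholders}) "
        f"ON CONFLICT ({', '.join(key_cols)}) DO UPDATE SET {updates}"
    )

sql = build_upsert_sql("app_item", ["key"], ["value"])
print(sql)
# → INSERT INTO app_item (key, value) VALUES (%s, %s) ON CONFLICT (key) DO UPDATE SET value = EXCLUDED.value
```

Because the conflict check and the update happen in one statement, there is no window for another transaction to insert between them.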

This post may be useful for PostgreSQL 9.4

Django currently doesn't support on_conflict (thread), so I'll need to implement a custom bulk_create instead of the standard one...

This bug would be solved with #5

Release 2.0.0

Has this bug been resolved, or is the package documentation outdated?

Hi. This bug cannot be solved for PostgreSQL 9.4 and earlier. For PostgreSQL 9.5+ it is no longer relevant, as an INSERT ... ON CONFLICT statement is used. It can occur only if you manually set key_is_unique=False.

I am planning on using key_is_unique=False, and I also have Postgres 14. Are there any known workarounds for the race condition?

The only correct and efficient way is to use an INSERT ... ON CONFLICT statement, as it controls duplicate creation on the PostgreSQL side (key_is_unique=True). But it requires a unique constraint on the table columns you search by (that is how existence control works). When you set key_is_unique=False, you downgrade the library to my first approach to implementing bulk_update_or_create: a three-query transaction: search by key -> create not found -> update found. It also works on PostgreSQL 9.4 and earlier, which do not support ON CONFLICT.
This flow works well for lightly loaded tables with little concurrency, but it can throw errors if another thread creates a record after the existence check and before creation (see the example in the issue). The only alternative here is to lock the whole table for exclusive write access, but from my point of view that is not efficient at all, especially for the highly loaded tables where this problem occurs.
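The three-query fallback flow can be sketched like this (an in-memory dict stands in for the table; the real library runs three queries inside a transaction, and this is not its actual implementation):

```python
# Sketch of the key_is_unique=False flow:
# search by key -> create not found -> update found.
def bulk_update_or_create(table, records):
    keys = set(records)
    found = keys & set(table)     # 1. search by key
    missing = keys - found
    # The race window sits here: another thread may insert one of
    # `missing` before step 2 runs, causing a duplicate-key error
    # when a real unique constraint exists.
    for k in missing:             # 2. create not found
        table[k] = records[k]
    for k in found:               # 3. update found
        table[k] = records[k]
    return len(missing), len(found)

table = {"a": 1}
created, updated = bulk_update_or_create(table, {"a": 10, "b": 2})
print(created, updated)  # → 1 1
```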

Understood, and thanks!