formancehq/ledger

Duplicate key violation on id when handling concurrent requests

Closed this issue · 10 comments

Describe the bug
I noticed that when I benchmarked the ledger, I would sometimes get a 200 response with an empty body, while the ledger logs indicated: internal errors executing script: conflict error on reference.

After doing some digging, the true cause turned out to be duplicate key value violates unique constraint "transactions_id_key" when inserting into the transactions table (the same issue could possibly be present with other tables too).

The issue happens because the ID seems to be incremented in code, rather than via a database auto-increment feature. Since there is no locking around this, this kind of conflict is bound to arise sooner or later when handling concurrent requests.

To Reproduce
Steps to reproduce the behavior:
Just have a script or benchmarking tool send a few concurrent requests to the ledger.

For instance, I use Bombardier:

bombardier -c 50 -m POST -f bench.json -d 10s http://localhost:3068/example/script

Where bench.json can be any script payload, for instance:

{"plain":"vars {\n  monetary $money_authorized\n  account $destination\n}\n\nsend $money_authorized (\n  source = @world\n  // @user:c36a3385-d94e-485d-9ccd-3682507d0fc2:authorized:b394c68d-0e4b-4e09-a634-3449d054edd5\n  destination = $destination\n)","vars":{"money_authorized":{"amount":2,"asset":"EUR/2"},"destination":"user:bf11c59beaca47cda9d477cd62181e55:authorized:47aa2ce0db61498bb02d9f0ca2728425"}}

Expected behavior
I expect a bunch of things:

  1. Ledger should be able to handle this case, in the worst case by locking while incrementing the id. A better solution might be to use UUIDs as database IDs, or to use the database's auto-increment feature
  2. For fatal errors like this that do still occur, the HTTP request should not yield a 200 OK response with an empty body, but rather a 500 error
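To illustrate point 1, here is a minimal sketch of an in-process, lock-guarded ID allocator. This is purely illustrative (the class and names are made up, not the ledger's actual code): it shows how serializing the read-and-increment step prevents two concurrent commits from being handed the same id.

```python
import threading

class IdAllocator:
    """Hypothetical allocator: serializes id assignment across threads."""

    def __init__(self, last_id=0):
        self._last_id = last_id
        self._lock = threading.Lock()

    def next_id(self):
        # Without the lock, two threads can both read the same _last_id,
        # both insert a row with the same id, and trip the
        # "transactions_id_key" unique constraint.
        with self._lock:
            self._last_id += 1
            return self._last_id
```

Under concurrent load, every caller gets a distinct id, which is exactly the property the unique constraint is currently enforcing after the fact.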

Environment:

  • Numary Version 1.8.0

@edwardmp Thanks a lot for the bug report!
We will investigate and prepare a fix for this one

@edwardmp are you using redis in your locking strategy?

Check this out @thiagolinsx.

@lucasfcnunes I'm not using Redis currently. Would that prevent this issue?

Indeed, Redis allows the lock to be shared between several instances of the Ledger. By default, we store the lock in memory.

You can see the different configurations here

However, if you have only one instance of the Ledger, you should not need a lock in Redis.

I currently only have 1 instance.

But would this kind of lock really work in this scenario? In my real system (i.e. not the benchmark), the accounts I post to the ledger are unique, so I think this problem would still occur

Today, the lock covers the totality of the transactions in a ledger (we have started thinking about how to improve this).

After some investigation, it seems that there is a regression between v1.7.x and v1.8.0 in the lock management.
We will come back soon with more information and a fix.

@edwardmp Hi! Well, there is a lock on the commit process. We are not using database features like auto-increment because we need to chain transactions ourselves.
Each transaction has a hash, calculated from the transaction and the previous transaction. So, when calculating the hash, we already need an id, which means we cannot delegate this part to the database.
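The chaining described above can be sketched roughly as follows. This is an assumption-laden illustration, not the ledger's actual serialization or hash scheme: field names, the use of JSON, and SHA-256 are all stand-ins. The point is simply that the id is part of the hashed payload, so it must be assigned in application code before the row can be written.

```python
import hashlib
import json

def chain_hash(prev_hash, txn):
    # Hypothetical: hash the previous hash plus a canonical serialization
    # of the transaction. Since txn includes "id", the id has to exist
    # before the hash (and thus the database row) can be produced.
    payload = json.dumps(txn, sort_keys=True).encode()
    return hashlib.sha256(prev_hash.encode() + payload).hexdigest()

genesis = ""
t1 = {"id": 1, "postings": [{"source": "world", "destination": "users:001", "amount": 100}]}
h1 = chain_hash(genesis, t1)
t2 = {"id": 2, "postings": [{"source": "users:001", "destination": "world", "amount": 100}]}
h2 = chain_hash(h1, t2)
```

This also shows why a database sequence is awkward here: the id would only be known after the insert, but the hash stored with the row already depends on it.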

@gfyrag gotcha. But if you already lock on the commit process, I'm not sure why I'm still hitting this error.

Unfortunately, version 1.8 introduced a bug here.
The lock is still there, but SQL transactions are committed after the lock is released...
We are working on fixing this.
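The regression described here is an ordering problem: the commit must become visible before the lock is released, otherwise the next lock holder reads the still-unchanged last id and collides. A minimal sketch, with a plain counter standing in for the database (the lock and the variable names are illustrative, not ledger internals):

```python
import threading

lock = threading.Lock()
db_last_id = 0  # stands in for MAX(id) in the transactions table

def commit_fixed():
    # Correct ordering: the "commit" lands inside the critical section,
    # so the next lock holder observes the updated id.
    global db_last_id
    with lock:
        next_id = db_last_id + 1
        db_last_id = next_id
    return next_id

def commit_buggy():
    # The v1.8 ordering as described above: the id is computed under the
    # lock, but the commit happens after the lock is released, so two
    # requests can both persist the same id (duplicate key violation).
    global db_last_id
    with lock:
        next_id = db_last_id + 1
    db_last_id = next_id  # commit after release: race window
    return next_id
```

With the fixed ordering, concurrent commits always produce distinct ids; with the buggy ordering, duplicates are possible whenever two requests overlap between lock release and commit.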

I can confirm this seems to be solved in v1.8.1-rc.2