Include on/scope_components into AdvisoryLock Name

Question

Include on/scope_components into AdvisoryLock Name

Closed this issue 9 days ago · 15 comments

Hi, recently we started using the Positioning gem, and it works perfectly, except sometimes SELECT pg_advisory_lock(?) fails due to Postgres statement_timeout (5 minutes). I could not reproduce it, but I guess it is due to the same lock_name between requests even when they work with records from different scopes (positioned on: [...])

Is there any reason why the lock_name is the same for a table? Is it possible to include scope_components here to allow process clients requests simultaneously instead of consequentially

Thank you

brendon commented 9 days ago

Nice :)

Answer 1 · 2024-10-07T05:55:27.000Z

Hi @palexvs, I'm currently away on annual leave and can look at this a bit deeper when I return next week. But to answer the question quickly, we just lock down to the column name as the positioning gem can only have one scope declaration per database positioning column and positioning operations can span two scopes as an item is removed from one list and added to another. I guess this could be optimised a bit more but I'd have to think this through.

Would be so good to be able to reproduce it in a test too :) Let me know if that situation changes :)

Answer 2 · 2024-10-07T22:53:37.000Z

I would second this, as right now, given a high enough level of concurrency given the current advisory lock setup, will result in requests that won't even interact at all, end up deadlocking each other.
As an example, if many people alter the same model from a multi tenent perspective, the current advisory lock will lock essentially the entire table waiting for each sequential change to happen in order.

Answer 3 · 2024-10-08T00:21:17.000Z

I'm currently away on annual leave and can look at this a bit deeper when I return next week.

no worries, thank you for letting me know!

Would be so good to be able to reproduce it in a test too :) Let me know if that situation changes :)

Unfortunately, I could not reproduce it, but I had several times on production

But to answer the question quickly, we just lock down to the column name as the positioning gem can only have one scope declaration per database positioning column and positioning operations can span two scopes as an item is removed from one list and added to another. I guess this could be optimised a bit more but I'd have to think this through.

class User < ApplicationRecord
  has_many :tasks
end

class Task < ApplicationRecord
  belongs_to :user
  positioned on: :user
end

From my understanding, when two different users update/create their tasks simultaneously, advisory_lock uses the same key/id for both requests. As a result, one request waits till another one is finished. And I suspect sometimes it can lead to a deadlocks: one query waits for advisory_lock when the second waits something was locked by the first query

Answer 4 · 2024-10-08T02:31:36.000Z

I can say that from attempting to resolve this (production is fun to debug these sorts of things), but I was looking at some sort of tenant_id that could be more specific to the item being positioned. That would at least lessen the scope here, but yeah there needs to be a more granular lock as this one effectively locks the whole table, regardless of if there are unrelated changes to position or even unrelated to the scope relevant to the item being positioned.

Answer 5 · 2024-10-11T03:54:48.000Z

Yea, the more I look at this, the more flaws I can see in what I've come up with so far. To make it more granular will require embedding the advisory lock much deeper in the mechanisms code and then creating a way to globally release any relevant locks on commit or rollback. Will have a stab at it and see where I get to.

Answer 6 · 2024-10-11T20:09:28.000Z

Now when I scope the lock right down to the position scope I'm getting deadlocks again. Quite strange. Will keep plugging away.

Answer 7 · 2024-10-12T09:44:57.000Z

Looking at this further, perhaps we could do away with an advisory lock altogether and just lock all the scopes that are belongs_to relations? In my testing this seems to avoid deadlocks and keeps the locking scoped.

Please let me know your thoughts on this. In the case where there is no scope then we'd need to lock the entire table.

Answer 8 · 2024-10-13T07:30:30.000Z

Please check out this branch: https://github.com/brendon/positioning/tree/advisory-lock

And comment in this PR on anything specifically relating to the change to scope locks. My particular concern is what to do when there is no scope. Currently I'm locking all the rows in the table but apparently this won't prevent new rows being added during this process. I guess it might still work because other processes trying to add new records will need to obtain a lock as they do their positioning adjustments. I wonder if it's enough to just lock the first record in the table?

Answer 9 · 2024-10-18T05:51:37.000Z

thank you @brendon ! Sorry, was busy, will try this weekend

Answer 10 · 2024-10-20T21:22:59.000Z

Thanks @palexvs. Hope it went well :)

Answer 11 · 2024-10-20T22:56:23.000Z

I can say that, under sort of maybe mid level load, I did away with the advisory locks entirely and am relying on a unique index instead. This seems to work well enough for my purposes. Personally, I'm leaning towards letting sleeping dogs lie for my implementation as the locking as it was ended up with production outages for me.

Answer 12 · 2024-10-20T23:14:37.000Z

Thanks @ragingdave. I'll probably look to do away with advisory lock altogether and replace it with what's in that fork. Since it adds row locking of various kinds I'll make it a minor release as it could potentially cause more problems. If you want to have a go running that on your production environment, let me know your results. I'm done with it apart from trying to get the tests to work with sqlite. https://github.com/brendon/positioning/tree/advisory-lock

Answer 13 · 2024-11-13T19:36:24.000Z

Advisory Lock is gone so I'll close this :)

Answer 14 · 2024-11-13T20:09:18.000Z

@brendon thank you!
We have upgraded on v0.4.2 a few days ago and it's working so far