madelson/DistributedLock

Support for semaphore

clement911 opened this issue · 10 comments

This may be a bit out there for a "lightweight" library but have you thought of implementing a distributed counting semaphore?

In my scenario, I have a SQL Azure database with many customers. Customers can have a lot of data.
The .NET code is hosted on several Azure web instances.

I regularly want to process every customer's data. However, I don't want to process all customers at the same time because that would overload the database.
I also don't want to process each customer serially (1 by 1) because it would take too long.
Ideally I would use a distributed semaphore with a count of N to process N customers at a time...

It could be built on top of SqlDistributedLock and I wonder if you've come across this scenario before?
Do you see that fitting within the vision of your library?

Any thoughts on what the implementation would look like? I can imagine using a global (##) temp table to store the counter state. What's less clear is how to implement the wait that happens when you try to take a ticket from the semaphore but none are available. Obviously this could be done by sleeping with WAITFOR, but is there a cleaner mechanism by which anyone returning a ticket would wake up a waiter?

Hmm I wasn't thinking about the overload with a timeout because I usually want to acquire the lock immediately or not at all. But I think we can handle that...

So, here are my thoughts on how to implement it.

For the user, it could look like this:

var semaphore = new SqlDistributedSemaphore("MyLock", connectionString, maxCount: 3);

using (semaphore.Acquire())
{
    // this block of code is protected by the lock!
}

Internally, the library will try to acquire a SqlDistributedLock with a name of "MyLock" + i.
Pseudo code:

for (int i = 1; i <= maxCount; i++)
{
    var exclusiveLock = new SqlDistributedLock(lockName + i).TryAcquire(...);
    if (exclusiveLock != null)
        return exclusiveLock;
}
return null;
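For illustration, the loop above can be modelled in-process with ordinary mutexes — a minimal Python sketch, with threading.Lock standing in for SqlDistributedLock and the names purely hypothetical:

```python
import threading

# Hypothetical stand-in: a pool of named mutexes playing the role of the
# underlying SqlDistributedLock instances ("MyLock1", "MyLock2", "MyLock3").
locks = {f"MyLock{i}": threading.Lock() for i in range(1, 4)}

def try_acquire_semaphore(name, max_count):
    """Try each underlying mutex in turn; holding any one of them
    counts as holding one of the semaphore's max_count tickets."""
    for i in range(1, max_count + 1):
        lock = locks[f"{name}{i}"]
        if lock.acquire(blocking=False):  # non-blocking TryAcquire
            return lock  # caller releases this to return the ticket
    return None  # all tickets taken

handle = try_acquire_semaphore("MyLock", 3)
print(handle is not None)  # True: the first ticket was free
```

Releasing the handle returns the ticket, so the next caller's loop finds that mutex free again.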

If the user provided a timeout, I guess we could try to acquire all the locks in parallel, using the timeout provided. Of course, the code would need to ensure that if multiple locks were acquired, all but one are released immediately... Also, if the connection is owned by the user that could be tricky, because as far as I know SqlConnection is not thread-safe.
Alternatively, we could try each lock serially with a timeout of (timeout / maxCount). Not very elegant...

I realize I'm leaving out a lot of details here but I hope this makes sense from a high level view...

This post describes a similar technique, although the loop happens in the sql code...

Thanks @clement911 I can see that approach working well for many scenarios. Two things that would worry me about using this for a general-purpose API:

  • Were someone to create a semaphore with hundreds or thousands of tickets, the runtime for acquire scales very poorly
  • This works well with TryAcquire() semantics or a timeout of zero. However, for more traditional approaches where you want to wait to acquire the semaphore, we basically end up sleeping in a loop. This means executing a ton of additional queries as we wait.

Given these caveats, I don't think it makes sense to use this algorithm for a general-purpose semaphore API, especially since, as you point out, it only takes a few lines to implement it on top of the existing APIs. I'll keep thinking about whether there's a way to work around these limitations.

Those are fair points. Thank you for your feedback and feel free to close the issue if you don't see it making it into the library.

What about using a global temp table with three columns: LockName, LockHolder and, optionally, an Expiry. Then you could insert into this table atomically to acquire the lock, and delete from it to release. The insert should be blocked once the row count for a lock name reaches the maximum number of holders. Something like the below. It should scale pretty well, I expect.

CREATE TABLE ##locks(Lockname varchar(20), LockHolder varchar(20), Expiry datetime2)

-- Try to acquire the lock
DECLARE @LockName varchar(20) = 'MySharedLock'
DECLARE @LockHolder varchar(20) = 'Me'
DECLARE @MaxLocks int = 3

INSERT INTO ##locks(LockName, LockHolder, Expiry) 
SELECT @LockName, @LockHolder, DATEADD(hour,1,getdate())
WHERE NOT EXISTS (
	SELECT 1
	FROM ##locks
	WHERE LockName = @LockName 
		AND Expiry > GETDATE()
	GROUP BY LockName
	HAVING count(*) >= @MaxLocks
)

IF(@@ROWCOUNT = 1)
	PRINT 'Lock Acquired'
ELSE 
	PRINT 'Lock denied'

-- and later release the lock
DELETE FROM ##locks
WHERE LockName = @LockName and LockHolder = @LockHolder
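The same conditional-insert pattern can be sketched in-process against SQLite — an illustration of the logic only (the real table would be a SQL Server global temp table; the Expiry column is omitted here for brevity, and all names are hypothetical):

```python
import sqlite3

# In-memory table modelling the ##locks global temp table from the SQL above.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE locks (lock_name TEXT, lock_holder TEXT)")

def try_acquire(name, holder, max_locks):
    """Conditional insert: take a slot only while fewer than max_locks
    rows exist for this lock name (mirrors the WHERE NOT EXISTS guard)."""
    cur = db.execute(
        "INSERT INTO locks (lock_name, lock_holder) "
        "SELECT ?, ? WHERE (SELECT COUNT(*) FROM locks WHERE lock_name = ?) < ?",
        (name, holder, name, max_locks),
    )
    db.commit()
    return cur.rowcount == 1  # 1 row inserted => slot acquired

def release(name, holder):
    db.execute("DELETE FROM locks WHERE lock_name = ? AND lock_holder = ?",
               (name, holder))
    db.commit()

print(try_acquire("MySharedLock", "a", 2))  # True: no slots taken yet
```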

Hi @alastairtree thanks for commenting.

I definitely think that a global temporary table would be part of the solution.

One of the big challenges, however, is implementing blocking behavior (Acquire vs. TryAcquire). In your example, failure to acquire the lock returns lock denied immediately, but in many cases you want to have your process block until it gains entry (or a timeout elapses). While this can be done by spinning in a busy wait, it's not ideal since it consumes resources for each waiter and doesn't guarantee any sort of fairness (one spinning thread could repeatedly get unlucky).

My current thinking on this would be something like the following:

if (try acquire one of the N semaphore slots) { return success; }

acquire waiter lock
{
     while (timeout has not elapsed)
     {
          if (try acquire one of the N semaphore slots) { return success; }
     }
}

return failure

The idea here is that the waiter lock (a regular mutex lock) allows most waiters to block while a single waiter goes through the busy-wait cycle. Since waiters queue on the waiter lock, the semaphore will be at least as fair as a normal SQL Server mutex. Still not perfect (we have a busy wait), but that's the best I have so far.
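To make the shape of this concrete, here is a minimal in-process Python sketch, with a threading.Semaphore standing in for the N ticket locks and a plain mutex as the waiter lock (all names hypothetical):

```python
import threading
import time

tickets = threading.Semaphore(2)  # stand-in for the N ticket locks
waiter_lock = threading.Lock()    # the "waiter lock": queues all waiters

def acquire(timeout):
    deadline = time.monotonic() + timeout
    # Fast path: grab a ticket immediately if one is free.
    if tickets.acquire(blocking=False):
        return True
    # Slow path: queue on the waiter lock so only one thread busy-waits;
    # everyone behind it blocks (cheaply) on the mutex instead of spinning.
    if not waiter_lock.acquire(timeout=timeout):
        return False
    try:
        while time.monotonic() < deadline:
            if tickets.acquire(blocking=False):
                return True
            time.sleep(0.01)  # the busy-wait poll interval
        return False
    finally:
        waiter_lock.release()
```

Because waiters park on the mutex in order, only one thread spins at a time and the rest consume essentially no resources while they wait.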

This is available as of version 1.4

Wooo thanks for that! That looks very solid!

Looks great! Can you comment on how you managed the fairness? Was it as above?

@alastairtree the algorithm I ended up with is essentially this (implemented in SQL):

// the idea of the preamble is to make sure we never hit a case where a lock acquisition with a timeout
// (e.g. TryAcquire) returns false due to timing out on the busy wait lock (below) even though there
// were tickets available
preambleLock = name + '_preamble'
lock (preambleLock) // no blocking inside this; so we can wait for it regardless of timeout
{
     waiterCount = SELECT COUNT(*) FROM tempdb.sys.tables WHERE name like '##' + name + 'marker%';
     var markerTable = '##' + name + 'marker' + GUID;
     CREATE TABLE markerTable;

     if (waiterCount < maxCount) { // space available; just take it
         for (i = 0; i < maxCount; ++i) {
             ticketLock = name + i;
             if (tryAcquire(ticketLock)) { return (ticketLock, markerTable); }
         }
         throw "should have acquired";
     }
}

// the idea of the busy wait lock is that threads queue up here to do a spin wait.
// This both ensures fairness (one spinner at a time) and should prevent too much CPU
// load with many waiters (all but one blocked)
busyWaitLock = name + '_busyWait'
if (tryLock(busyWaitLock, timeout)) {
    while (!timeout expired) {
           for (var i = 0; i < maxCount; ++i) {
                 ticketLock = name + i;
                 if (tryAcquire(ticketLock)) { return (ticketLock, markerTable); }
           }
    }
}

// failed acquire
DROP TABLE markerTable;

To clean up, you just drop the marker temp table along with the ticket lock you took. The full code is in https://github.com/madelson/DistributedLock/blob/master/DistributedLock/Sql/SqlSemaphore.cs#L347

There are some additional complexities there dealing with cancellation and lock reentrance.
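For anyone wanting to see the moving parts without a database, here is a rough in-process model of the algorithm above in Python — threading.Lock objects stand in for the ticket locks, a set stands in for the marker temp tables, and everything here is an illustrative sketch rather than the library's actual code:

```python
import threading
import time
import uuid

max_count = 2
tickets = [threading.Lock() for _ in range(max_count)]  # the ticket locks
markers = set()               # stand-in for the ##...marker temp tables
preamble_lock = threading.Lock()   # plays the role of name + '_preamble'
busy_wait_lock = threading.Lock()  # plays the role of name + '_busyWait'

def acquire(timeout):
    deadline = time.monotonic() + timeout
    with preamble_lock:  # short critical section; safe to wait for it
        marker = uuid.uuid4().hex
        waiter_count = len(markers)  # existing holders + waiters
        markers.add(marker)
        if waiter_count < max_count:  # a ticket must be free: take it
            for t in tickets:
                if t.acquire(blocking=False):
                    return (t, marker)
            raise RuntimeError("should have acquired")
    # Queue here so only one waiter busy-waits at a time.
    if busy_wait_lock.acquire(timeout=timeout):
        try:
            while time.monotonic() < deadline:
                for t in tickets:
                    if t.acquire(blocking=False):
                        return (t, marker)
                time.sleep(0.01)
        finally:
            busy_wait_lock.release()
    markers.discard(marker)  # failed acquire: drop our marker
    return None

def release(handle):
    ticket, marker = handle
    markers.discard(marker)  # DROP TABLE markerTable
    ticket.release()
```

The marker count lets the preamble decide, without racing the busy waiter, whether a ticket is guaranteed to be free, which is what keeps a zero-timeout TryAcquire from failing spuriously.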

Give it a try and let me know how it goes.