awslabs/amazon-dynamodb-lock-client

LockCurrentlyNotAvailable with shouldSkipBlockingWait

fejk0 opened this issue ยท 3 comments

fejk0 commented

Greetings gentlemen,

we are facing problem with acquiring lock with 'shouldSkipBlockingWait' set to true;
Find reproducible test bellow.
To reproduce this manually, just kill process while owning lock, start another one and try to acquire it with options as in junit...
Is this bug or just bad usage ?

Version being used is 1.1.0
Thanks

  @Test
  void notExpectedBehaviour() throws Exception{

    final long leaseDuration = 1_000;
    final long heartBeatPeriod = 200;
    final String partition = "super_key_LOCK";

    AcquireLockOptions lockOptions = AcquireLockOptions
        .builder(partition)
        .withAcquireReleasedLocksConsistently(true)
        .withShouldSkipBlockingWait(true)
        .build();

    AmazonDynamoDBLockClient lockClientOne = new AmazonDynamoDBLockClient(
        AmazonDynamoDBLockClientOptions
            .builder(amazonDynamoDB, "table")
            .withPartitionKeyName(Constants.HASH_KEY)
            .withTimeUnit(TimeUnit.MILLISECONDS)
            .withLeaseDuration(leaseDuration)
            .withHeartbeatPeriod(heartBeatPeriod)
            .withCreateHeartbeatBackgroundThread(true)
            .build());

    LockItem lockItem = lockClientOne.acquireLock(lockOptions);
    Assertions.assertNotNull(lockItem, "lock acquired");

    // create new lock client which should not be able to acquire lock
    AmazonDynamoDBLockClient lockClientTwo = new AmazonDynamoDBLockClient(
        AmazonDynamoDBLockClientOptions
            .builder(amazonDynamoDB, "table")
            .withPartitionKeyName(Constants.HASH_KEY)
            .withTimeUnit(TimeUnit.MILLISECONDS)
            .withLeaseDuration(leaseDuration)
            .withHeartbeatPeriod(heartBeatPeriod)
            .withCreateHeartbeatBackgroundThread(true)
            .build());

    Thread.sleep(leaseDuration);
    boolean wasThrown = false;
    try {
      lockClientTwo.acquireLock(lockOptions);
    } catch (LockCurrentlyUnavailableException e) {
      wasThrown = true;
    }
    Assertions.assertTrue(wasThrown, "exception - expected behavior");

    // force shutdown
    Field shuttingDown = lockClientOne.getClass().getDeclaredField("shuttingDown");
    shuttingDown.setAccessible(true);
    shuttingDown.set(lockClientOne, true);

    // wait so item gets old
    Thread.sleep(leaseDuration * 3L);

    //  we would expect that lock can be acquired as nobody is sending heartbeats - but this throws exception LockCurrentlyNotAvailable 
    lockItem = lockClientTwo.acquireLock(lockOptions);

  }

Looks like this problem occurs because when the existing lock is retrieved, the timestamp that is used to evaluate expiry in isExpired() is updated. So the test waits leaseDuration * 3L, which would correctly cause isExpired() to return true because the timestamp is T - (T - leaseDuration * 3) > leaseDuration, but when the second acquireLock is called, that timestamp is updated to T (and T - T is not greater than leaseDuration) [ref]. This causes isExpired() to return false. In the normal case, this won't be an issue because the lock will eventually be released. However, in the corner case where a lock expires due to, say, dying process, then it appears that the lock will never be able to be retrieved again so long as shouldSkipBlockingWait is set to true.

Long story short, seems like a major bug in the library. There appears to be no clear workaround from client-side either.

I've encountered this issue as well. It seems like the "shouldSkipBlockingWait" support is in conflict with how lease expiration works (requires a process to wait for the lock for the lease duration before the lock is considered expired).

It seems we simply can't use "shouldSkipBlockingWait" until this is resolved, unfortunately.

@schen42 I would not agree in calling this a corner case, rather a case which I assumed was supported.

However, in the corner case where a lock expires due to, say, dying process, then it appears that the lock will never be able to be retrieved again so long as shouldSkipBlockingWait is set to true.

I've encountered this issue as well version, 1.2.0.