spring-projects/spring-integration-aws

DynamoDbLock.tryLock() has no chance to acquire an abandoned lock

GarryIV opened this issue · 2 comments

In what version(s) of Spring Integration AWS are you seeing this issue?

>= 2.5.2

Describe the bug

DynamoDbLock.tryLock() has no chance to acquire an abandoned lock no matter how many times you will try. It worked before #205 but with this fix abandoned locks can't re-acquired using DynamoDbLock.tryLock()

To Reproduce

  • create a DynamoDB lock table with a lock (in AWS or in local stack)
  • create a DynamoDbLockRegistry with the params that match the table/lock you created previously and try to lock it using 'tryLock()'
	@Bean
	public DynamoDbLockRegistry dynamoDbLockRegistry(AmazonDynamoDB amazonDynamoDb) {
		var lockClientOptions = dynamoDBLockClientOptionsBuilder(amazonDynamoDb).build();
		return new DynamoDbLockRegistry(new AmazonDynamoDBLockClient(lockClientOptions));
	}

	@Bean
	public CommandLineRunner runner(AmazonDynamoDB amazonDynamoDb, DynamoDbLockRegistry dynamoDbLockRegistry) {
		return args -> {
			createAbandonedLock(amazonDynamoDb);

			logger.info("Trying to acquire the abandoned lock");
			var lock = dynamoDbLockRegistry.obtain(PARTITION_KEY);
			while (!lock.tryLock()) {
				logger.info("Failed to lock");
				Thread.sleep(1000);
			}
			logger.info("Lock is acquired");
			lock.unlock();
			logger.info("Done");
		};
	}

lock.tryLock cycle will never end.

Expected behavior
Lock is acquired if we try more than lock's lease duration time.

Sample
I've created a sample to reproduce it https://github.com/GarryIV/kinesis-lock-bug
The code that reproduces the problem is here

To reproduce it run

docker-compose up -d
./gradlew bootRun

It's expected that "Lock is acquired is printed after 20+ seconds bit you will see endless Failed to lock messages.

NB: We faced this while we were trying to use spring-integration-aws:2.5.2 with the latest spring-cloud-stream-binder-kinesis -org.springframework.cloud:spring-cloud-stream-binder-kinesis:2.2.0.
The binder is trying to acquire a lock with no chance to success. Only way to start it is to delete lock record manually.

I also included a sample use case for it, pls check the readme.

Thank you for sharing this with us!
Unfortunately I cannot start your application:

  1. Even if you marked it with a @Primary in the LocalStackConfiguration, the other bean is still created and it fails with this error:
Caused by: com.amazonaws.SdkClientException: Unable to find a region via the region provider chain. Must provide an explicit region in the builder or setup environment to supply a region.
	at com.amazonaws.client.builder.AwsClientBuilder.setRegion(AwsClientBuilder.java:462) ~[aws-java-sdk-core-1.12.266.jar:na]
	at com.amazonaws.client.builder.AwsClientBuilder.configureMutableProperties(AwsClientBuilder.java:424) ~[aws-java-sdk-core-1.12.266.jar:na]
	at com.amazonaws.client.builder.AwsSyncClientBuilder.build(AwsSyncClientBuilder.java:46) ~[aws-java-sdk-core-1.12.266.jar:na]
	at com.example.lockbug.DynamoDbLockBugConfiguration.amazonDynamoDB(DynamoDbLockBugConfiguration.java:40) ~[main/:na]
  1. When I remove that bean in the DynamoDbLockBugConfiguration, I fail with this error:

org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'amazonS3': Invocation of init method failed; nested exception is com.amazonaws.SdkClientException: Unable to load region information from any provider in the chain
	at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.initializeBean(AbstractAutowireCapableBeanFactory.java:1804) ~[spring-beans-5.3.24.jar:5.3.24]
	at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:620) ~[spring-beans-5.3.24.jar:5.3.24]
	at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:542) ~[spring-beans-5.3.24.jar:5.3.24]
	at org.springframework.beans.factory.support.AbstractBeanFactory.lambda$doGetBean$0(AbstractBeanFactory.java:335) ~[spring-beans-5.3.24.jar:5.3.24]
	at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:234) ~[spring-beans-5.3.24.jar:5.3.24]
	at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:333) ~[spring-beans-5.3.24.jar:5.3.24]
	at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:208) ~[spring-beans-5.3.24.jar:5.3.24]
	at org.springframework.beans.factory.support.DefaultListableBeanFactory.preInstantiateSingletons(DefaultListableBeanFactory.java:936) ~[spring-beans-5.3.24.jar:5.3.24]
	at org.springframework.context.support.AbstractApplicationContext.finishBeanFactoryInitialization(AbstractApplicationContext.java:918) ~[spring-context-5.3.24.jar:5.3.24]
	at org.springframework.context.support.AbstractApplicationContext.refresh(AbstractApplicationContext.java:583) ~[spring-context-5.3.24.jar:5.3.24]
	at org.springframework.boot.SpringApplication.refresh(SpringApplication.java:731) ~[spring-boot-2.7.7.jar:2.7.7]
	at org.springframework.boot.SpringApplication.refreshContext(SpringApplication.java:408) ~[spring-boot-2.7.7.jar:2.7.7]
	at org.springframework.boot.SpringApplication.run(SpringApplication.java:307) ~[spring-boot-2.7.7.jar:2.7.7]
	at org.springframework.boot.SpringApplication.run(SpringApplication.java:1303) ~[spring-boot-2.7.7.jar:2.7.7]
	at org.springframework.boot.SpringApplication.run(SpringApplication.java:1292) ~[spring-boot-2.7.7.jar:2.7.7]
	at com.example.lockbug.LockBugApplication.main(LockBugApplication.java:10) ~[main/:na]
Caused by: com.amazonaws.SdkClientException: Unable to load region information from any provider in the chain
	at com.amazonaws.regions.AwsRegionProviderChain.getRegion(AwsRegionProviderChain.java:59) ~[aws-java-sdk-core-1.12.266.jar:na]
	at io.awspring.cloud.core.region.DefaultAwsRegionProviderChainDelegate.getRegion(DefaultAwsRegionProviderChainDelegate.java:37) ~[spring-cloud-aws-core-2.4.2.jar:2.4.2]
	at io.awspring.cloud.core.config.AmazonWebserviceClientFactoryBean.createInstance(AmazonWebserviceClientFactoryBean.java:130) ~[spring-cloud-aws-core-2.4.2.jar:2.4.2]
	at io.awspring.cloud.core.config.AmazonWebserviceClientFactoryBean.createInstance(AmazonWebserviceClientFactoryBean.java:52) ~[spring-cloud-aws-core-2.4.2.jar:2.4.2]
	at org.springframework.beans.factory.config.AbstractFactoryBean.afterPropertiesSet(AbstractFactoryBean.java:142) ~[spring-beans-5.3.24.jar:5.3.24]

Apparently some other AWS services are trying to be auto-configured and you probably have some AWS SDK configuration locally in the ~/.aws/credentials.

Anyway I have a code from you so I'll try to reproduce it and play with from JUnit test where I use Testcontainers for LocalStack instead.

Will come back with some finding a bit later...

OK. So, I see (or at least guess) what is going on.
This logic in the AmazonDynamoDBLockClient has no sense:

if (options.shouldSkipBlockingWait() && existingLock.isPresent() && !existingLock.get().isExpired()) {

That isExpired() is always true:

return LockClientUtils.INSTANCE.millisecondTime() - this.lookupTime.get() > this.leaseDuration.get();

where that this.lookupTime is set exactly as a LockClientUtils.INSTANCE.millisecondTime() in the getLockFromDynamoDB(getLockOptions) method call from the beginning of the while (true) loop in the acquireLock().
So, no matter how much we try to wait in between lock.tryLock(), that isExpired() is always false when we do shouldSkipBlockingWait().

I will try to find some workaround (at least restore the previous behavior), but it feel like a bug in the DynamoDB Lock Client library by itself.
There is no reasonable lookupTime to make an item to expire in time for our call when we don't want to wait.
And therefore we end up with an impression that it never expires. 🤷