awslabs/amazon-sqs-java-temporary-queues-client

SQSMessageConsumer doesn't handle errors gracefully

paoloc0 opened this issue · 4 comments

Hi

SQSMessageConsumer has a tight loop in poll(), summarised as:

for (;;) {
   try {
      // receive a message
   } catch (Exception e) {
      exceptionHandler.accept(e);
   }
}

When an exception occurs (for example, kill network connectivity for a java.net.UnknownHostException: sqs.eu-west-1.amazonaws.com), this spins away and generates a huge stack trace every few milliseconds, and consumes all CPU resources very quickly.

Please consider sleeping a second upon exception.

Thanks

I see that I can override the default exception handler via SQSMessageConsumerBuilder's withExceptionHandler. But having a sleep state in SQSQueueUtils.DEFAULT_EXCEPTION_HANDLER would still be a safer default. For instance, there seems to be the same problem in AmazonSQSRequesterClient, and I don't see a way to set an exceptionHandler for it to override the default one (I guess that would be a related feature request).

Definitely not a bad idea to handle this better. There's already a sleep in place for QueueNotFoundExceptions for a similar reason. :) https://github.com/awslabs/amazon-sqs-java-temporary-queues-client/blob/master/src/main/java/com/amazonaws/services/sqs/util/SQSMessageConsumer.java#L126

That's a little more straight-forward, though, since that exception means it's very unlikely that immediately retrying will succeed. I wouldn't want to assume that about all exceptions, so I'm nervous about introducing an unconditional sleep. Perhaps we could hook up a configurable RetryPolicy? https://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/retry/RetryPolicy.html

Whatever the mechanism, I think the most important is that the error handling is in the library user's control, which it is for consumers, but not for requesters. If I can override the default, I guess it wouldn't matter to me what the default behaviour was. (But yes, a RetryPolicy does look like a nice way to go.)

Hi, anyone fixing this ?