redis/lettuce

ClusterCommand error completeExceptionally cause OOM

Wuuqy opened this issue · 5 comments

Bug Report

Class ClusterCommand completeExceptionally function cannot notice onComplete function to run, maybe cause many timeout task and finally OOM.

Current Behavior

Stack trace
// your stack trace here;

Input Code

Input Code
// ClusterCommand.java
public boolean completeExceptionally(Throwable ex) {
        // why not super.completeExceptionally(ex)? 
        // now it may cause ClusterCommand.onComplete function not invoke
        // cause CommandExpiryWriter.timer hold many elements and finally OOM if timeout is big
        boolean result = command.completeExceptionally(ex);
        completed = true;
        return result;
}

// CommandExpiryWriter.java
private void potentiallyExpire(RedisCommand<?, ?, ?> command, ScheduledExecutorService executors) {
        long timeout = applyConnectionTimeout ? this.timeout : source.getTimeout(command);
        if (timeout <= 0) {
            return;
        }
        Timeout commandTimeout = timer.newTimeout(t -> {
            if (!command.isDone()) {
                executors.submit(() -> command.completeExceptionally(
                        ExceptionFactory.createTimeoutException(Duration.ofNanos(timeUnit.toNanos(timeout)))));

            }
        }, timeout, timeUnit);

        if (command instanceof CompleteableCommand) {
            // currently can not RUN when ClusterCommand invoke completeExceptionally, so timeout task cannot cancel
            ((CompleteableCommand<?>) command).onComplete((o, o2) -> commandTimeout.cancel());
        }

    }

Expected behavior/code

Environment

  • Lettuce version(s): [e.g. Main]
  • Redis version: [e.g. ALL]

Possible Solution

// ClusterCommand.java
  public boolean completeExceptionally(Throwable ex) {
      boolean result = super.completeExceptionally(ex);
      completed = true;
      return result;
  }

Additional context

Hey @Wuuqy ,

I am having trouble understanding the issue you are describing. Could you provide a way for me to reproduce it?

            // currently can not RUN when ClusterCommand invoke completeExceptionally, so timeout task cannot cancel
            ((CompleteableCommand<?>) command).onComplete((o, o2) -> commandTimeout.cancel());

This got me really confused. TimeoutTask does should not call this, there are two options here:

  1. The timer counts down and if the command is not yet done it runs .completeExceptionally()
    or
  2. The command is executed (and onComplete() is called) which cancels the timer

I do - however - feel we might want to call the super-method, because it seems to me we are not guarded against #1576

I do - however - feel we might want to call the super-method, because it seems to me we are not guarded against #1576

@mp911de, what do you think?

command.completeExceptionally should be replaced with a call to super.completeExceptionally(…). Probably an oversight when migrating to CommandWrapper as already the previous code used command.completeExceptionally and not super.…

command.completeExceptionally should be replaced with a call to super.completeExceptionally(…). Probably an oversight when migrating to CommandWrapper as already the previous code used command.completeExceptionally and not super.…

Thanks for the confirmation @mp911de and thanks for the report @Wuuqy !

Hi @tishun @mp911de I created a pr for this, please help review 🙇