LiveRamp/jack

Upgrade commons-pool2 to 2.4.3 to prevent error when creating multiple transactor instances

tuliren opened this issue · 2 comments

The EvictionTimer in commons-pools version 2.4.2 has the following cancel method:

static synchronized void cancel(TimerTask task) {
  task.cancel();
  _usageCount--;
  if (_usageCount == 0) {
    _timer.cancel();
    _timer = null;
  }
}

It does not wait for the timer#cancel call to finish. This will lead to concurrency issue when users try to create multiple transactors in a very short period of time. What happens is:

  1. The GenericObjectPool constructor calls the startEvictor methods twice when we do need object eviction. The second time it is called, it will first cancel the initialized evictor and create a new one.
final void startEvictor(long delay) {
  synchronized (evictionLock) {
    if (null != evictor) {
      EvictionTimer.cancel(evictor);
      evictor = null;
      evictionIterator = null;
    }
    if (delay > 0) {
      evictor = new Evictor();
      EvictionTimer.schedule(evictor, delay, delay);
    }
  }
}
  1. When multiple transactors are created, the first transactor construction creates the evictor, cancels it, and creates it again.
  2. The second transactor construction will do the same. However, the cancellation of the timer started in the construction of the first transactor may not have completed. When that happens, and the EvicationTimer#schedule method is called, it will throw the following exception:
java.lang.IllegalStateException: Timer already cancelled.
        at java.util.Timer.sched(Timer.java:397)
        at java.util.Timer.schedule(Timer.java:248)
        at org.apache.commons.pool2.impl.EvictionTimer.schedule(EvictionTimer.java:76)
        at org.apache.commons.pool2.impl.BaseGenericObjectPool.startEvictor(BaseGenericObjectPool.java:695)
        at org.apache.commons.pool2.impl.BaseGenericObjectPool.setTimeBetweenEvictionRunsMillis(BaseGenericObjectPool.java:458)
        at org.apache.commons.pool2.impl.GenericObjectPool.setConfig(GenericObjectPool.java:317)
        at org.apache.commons.pool2.impl.GenericObjectPool.<init>(GenericObjectPool.java:117)
        at com.rapleaf.jack.transaction.DbPoolManager.<init>(DbPoolManager.java:73)

A relevant issue is reported in POOL-315 (this issue is based on updated EvictionTimer so the code is not exactly the same).

This is fixed in version 2.4.3 by that the EvictionTimer#cancel uses an executor service to run everything and users are allowed to specify a timeout value for the executor, so that the method will only return after the cancellation has completed. However, 2.4.3 has not been released yet.

For now, just don't create multiple transactors in very short period of time.

Thanks @Darge for reporting this issue.

We should not use commons pool2 2.4.3 because it introduced ScheduledThreadPoolExecutor in the EvictionTimer, and it will cause the workflow to hung indefinitely. Basically don't upgrade to any version that has this class for now, unless any new version explicitly address this issue.

Here is the commit that changed Timer to ScheduledThreadPoolExecutor in EvictionTimer:
apache/commons-pool@4a20cdc#diff-38e254894b87bdf9a1758778c7ffd50f It exists since POOL_2.4.3-RC1.