scalar-labs/btm

Recovery fails to commit dangling transactions if there are no in-flight transactions

maxant opened this issue · 3 comments

Test Case (BTM 2.1.4):

  • kill database before commit command can be sent to it (using debugger).
  • let thread continue to run
  • restart database
  • during recovery database reports transaction ID which is incomplete
  • Recoverer calls TransactionManagerServices.getTransactionManager().getOldestInFlightTransactionTimestamp() because transaction manager is running (line 137)
  • There are no in-flight transactions and so the TM returns Long.MIN_VALUE
  • Because the oldestInFlightTransactionTimestamp is now so low, the code on line 289 skips the dangling transaction which still needs to be committed.

The result is that the resource never has "commit" called again, unless BTM is restarted, or randomly a transaction is in-flight during recovery.

I think a possible patch for this could be to change the value which BitronixTransactionManager#getOldestInFlightTransactionTimestamp() returns when there are no in-flight transactions. Instead of returning Long.MIN_VALUE, it should return Long.MAX_VALUE, because you want to recovery all dangling transactions.

public long getOldestInFlightTransactionTimestamp() {
    if (inFlightTransactions.isEmpty()) {
        if (log.isDebugEnabled()) log.debug("oldest in-flight transaction's timestamp: " + Long.MIN_VALUE);
        return Long.MIN_VALUE; // CHANGE THIS VALUE?
    }

This specific check in BitronixTransactionManager.getOldestInFlightTransactionTimestamp() that makes it return Long.MIN_VALUE was added for a purpose. It took me time to remember why, but I think I found the original reason back.

The Recoverer is the only consumer of this API, and it uses this timestamp to avoid stepping on the toes of the 2PC engine while it's still running on in-flight transactions. Basically, any transaction with a timestamp older than or equal to the oldest in-flight transaction's timestamp is considered in-flight and ignored by the recoverer.

There's of course a special case: when there is no in-flight transaction. If we blindly returned Long.MAX_VALUE then all transaction would become a target for the recoverer, including the ones that started right after getOldestInFlightTransactionTimestamp() is read by the recoverer but before the recoverer started its job. This would create a race condition where the recoverer could conflict with in-flight transactions. This explains why Long.MIN_VALUE is returned in such case: to avoid that race condition.

As you noticed, there is that ill-effect that if there isn't any activity, the recoverer will postpone its job forever. An easy fix would be to return MonotonicClock.currentTimeMillis() instead of Long.MIN_VALUE and explicitly state that in getOldestInFlightTransactionTimestamp()'s javadoc.