/spring-neo4j-tx

Spring Neo4j XA Transaction Manager Test project

Primary LanguageJava

The problems with recovery
	- or -
why this thing doesn't work yet
-------------------------------

While the most common usage of transaction managers is the coordination of transactional resources with a protocol like 2PC,
another critical aspect of their usage comes from the ability to recover transactions that have failed due to system crash,
be that the transaction manager itself, another coordinating resource (like the OS) or one of the participating resources. For
the rest of this text, we will focus on XA compliant transactional systems (transaction manager + transactional resources) as
implemented by JTA.

While the JTA specification is very descriptive of almost all the aspects of distributed transactions, including the recovery process,
it leaves out a critical issue: how the XA resources that participated in a global transaction that was not decided upon because of
a crash will register with and inform the coordinating transaction manager of the xid of dangling local transactions, their vote etc.
Because of that, every implementation has its own way of dealing with the issue (for example, JOTM is simply happy to have instances
of XAResource registered and recovery being asked for explicitly while JBoss asks for serialized objects to be available somewhere 
for retrieval).

This leads to a situation where having true distributed transactions with JTA in a manner transparent over the actual txm implementation
is next to impossible. While the instantiation/retrieval of the TransactionManager object can be abstracted, the recovery process is
too tightly bound to the lifecycle of the applications that implement XAResource and of the way the txm implementation does recovery.

The question here is of course how does Spring fare on this aspect, since it provides a uniform access method irrespective of the actually
plugged in implementation of the txm. If it indeed succeeds in transparently handling recovery after crashes then there are no problems and
everything should work as expected simply by registering resources in the Transaction object. If, however, there is no such facility in place
yet, then recovery cannot work. Let's see an example

If I have configured jotm as the txm implementation, then I can simply retrieve the object (as a TransactionManager) from the
org.springframework.transaction.jta.JtaTransactionManager and have my XAResources registered there. The same goes for every other
part that is ready to participate in 2PC and the user is blissfully unaware of the actual workings. However, if the system crashes, then
jotm expects calls to Current.getTransactionRecovery().registerResourceManager() that will register all previously used resources
and then a call to Current.getTransactionRecovery().startResourceManagerRecovery() to actually recover all pending txs. JBoss does this
differently, as does Atomikos etc. Also, this call must be blocking because there is no requirement that a XA compatible storage system
must be able to serve requests while transactions are pending. Neo4j, for instance, will not serve requests while the store is not restored
to a clean state.

So, the question for the Spring hackers is, how do you provide recovery guarantees for all TransactionManager implementations you support
while hiding the details from the XA enabled components running in your framework? Or, to be more specific, if Neo4j uses the user configured,
external txm implementation via the Spring interfaces, what steps/API should it use to ensure that it will be recovered after a crash?

-- Edited by Chris Gioran