tarantool/tarantool-java

TarantoolClientImpl: refactor handling a connection state of the client

Opened this issue · 0 comments

Now we have an unclear flow of how the client changed its internal state and how inner worker threads influence on the client. It also leads to endless races under different circumstances which are hard to debug (see #142, #145 ).

In order to overcome that we need to reconsider how the client can manage its state in an easier way. @Totktonada has already started to figure it out in #145:

I thought how to better represent a connection state and want to share my current vision in sketchy form. Sorry, it is not much circumspect, more like a stream of consciousness. Consider this as 'a bag of ideas' and don't cycle if some part is unclear (it can be just wrong).

There are five actors: reader, writer, connecter, discovery and service task executor. The last one is to perform several actions: say, disconnect and then connect that is needed for discovery. (Also maybe 'connect' action is more like one of service task executor actions and connecter should not be considered as the separate actor?)

There are generations of a state. Each actor should only access state (say, socket) of its generation: maybe by copying it internally. It is needed because of two reasons. First, ensure that an actor have no ability to somehow change a state of other generation. Say, even if a writer of a previous generation stalls for a long time it cannot wake up and write to a socket from the next generation (that was created after reconnection). Second reason is to handle messages (FSM methods calls) from different actors and, say, don't reconnect twice when reader and writer both report an error.

FSM should accept all messages and queue them. Each message should contain a generation. Queueing allows us to consider actions as atomic and think about 'what to do it X succeeds/fails and then Y arrives'. We need to elaborate all possible combinations.

All state that is shared between actors should be accessed only using messages to FSM (maybe name it coordinator?). Maybe it should be hold in FSM / FSM context class?

Each state change should be atomic with corresponding actions. So we either have 'stopped' state or 'connected', not something in a middle. If a connection process fails we resides in stopped state (who should trigger the next attemp to connect in the case? A service task?).

As I see three states are enough: stopped, connected and closed (the terminal one). Don't sure we need a terminal state: we can have 'stopped, but trying to connect' and 'stopped and don't tried to connect'. Or maybe it does not have much sense if we'll create a new generation with its own state for each connect attempt? Don't sure.

We should decide who owns a state / context: say, who is reposible to close a socket. It seems that an actor cannot be responsible for that, so a coordinator should.

Also there is thought that splitting a state between generations looks as a half of work that is needed to handle several connections within one client. OTOH, don't sure we need it.

An actor should be allowed to subscribe to certain state changes. A reader and a writer should subscribe only to its generation, while discovery exists over generations, so maybe it should subscibe to messages dispite generations. So when a writer send 'write error' message to a coordinator, a coordinator will create a service task that will send a reader 'going to disconnect' message, then close a socket and change a state to 'stopped'. Kinda that. We need to elaborate to which outgoing messages actors should subscribe. Whether it should be only state changes? It seems that it is not so, because we want to notify a reader about an error in writer before close a socket, yep?

We also need to take care to propagation of an error message to a user, see also #30.