vanrein/perpetuum

Find a model for Backend Driving

vanrein opened this issue · 9 comments

We have considered a few models up to here:

Request/Response. This is where Rick's idea for Perpetuum came from, as a model of asynchronous processing, cutting long-lasting (remote) requests into a request/response pair and treating each of these atomically on a transition. Requests would be outgoing actions, responses would be incoming events (and probably extend the logic of a surrounding event loop).

Longer Activities in Places. This addition by Adriaan meant that we did not have to explicitly write down requests and responses. We could make the following transitions dependent on the exit code or another form of return value, perhaps the errno variable, to accommodate more flow control.

Multiple outputs. One longer-lasting process could bring out more than one output. A good application would be an LDAP query that responds with any number of SearchResult messages before it sends SearchDone. It is attractive to be able to process the individual SearchResult messages separately, so their contents can more easily be dissiminated over the rest of the graph. This was not possible with a transition looping back to the place holding the long-lasting activity, but we needed that to retain the Petri net semantics that supports analysis. So we had to explicitly start the long-lasting activity before the place.

We are now dealing with variables that are carried around, namely the SearchResult information that we selected. This is starting to look like colours, and analysis of guarding constraints. We would like to avoid that, at least initially, until we have a better idea of tools to use, and their expressive power. So for now we've considered guards, if they are used at all, to be properly modelled as non-deterministich choices (that Petri nets already have).

We could generate locks as part of the set of Petri nets that is fed into the code generator, as well as into analysis tools. This helps to avoid internal locking inside the backend, which would be difficult to incorporate into Petri net analysis.

Ideally, we would have something like a functional program, where input (from events/responses or activity outcomes or whatever else we come up with) is received but never overwritten. Reception would be like λ-calculus, namely binding (possibly with pattern matching so we can also match against constants such as errno values).

Variables could be scoped until control merges with another branch that does not define the variable. Different branches may merge the same variable, as long as it has the same type. Typing and sizing info can be obtained from the backend's metadata so we don't need to supply anything but a set of Petri nets and a set of backends to the code generator. We probably shouldn't use variables with different types but the same name in one Petri net. Variables may be considered local to a Petri net, I suppose.

We could use CSP-style communication, so c!x to write x to (backend) channel c and c?y to read variable y from channel c; we might use patterns in place of the receiving y; a variable is just the most generic reception possible. We could allow for indexed variables like y.uid if we define such structures in the backend, but pattern matching is still valuable to avoid having to write down guards. Pre-bound variables would simply function as literal values to match. In spite of all this richness, the c!x and c?y variations could still be considered atomic, and might be combined into one large, parallel composition of individual communication activities. It should be quite doable to process these in a transactional fashion, through the previously defined individual return values for Success, Failure and Defer-by-N-seconds. The Success would count as having prepared to commit, and an explicit commit() would need to be sent to the backend to complete the atomicity of the parallel composition.

Backends could run before the first c!x and may already be feeding c?y. They are simply independent modules that may create threads and processes as they seem fit, and there is no strict paradigm of request/response but merely communication with that process. Anything that constitutes synchronisation is captured in the c!x and c?y semantics, which means that the backend fills in the non-determinism that we are assuming from Petri net transitions. Of course this means that the backend has some responsibility in keeping the whole thing going. Among this is that it should send hints to the generated code that it might want to try (again) on a certain transition. This may in fact be modelled as additional incoming transitions, and these may in fact need to be generated as an add-on net for use during analysis and code generation, but only to cover the most complex backends.

Backends may also create channels on the fly, as part of any exchange. Even during a c!x, they might create a secondary channel c.resp over which response is to be retrieved, through c.resp?z. Or multiple, such as a c.errno channel. Such extra channels are scoped and allocated like variables that are read, but they are part of the specification of backend channel operations. The nice thing about them is that they capture a more specific concept, namely that of request identities that would otherwise have to be processed inside the Petri net, but that now become part of the already existing (well, imagined at least...) logic. This also saves us from adding guards to the logic. And it means that the backend has some sane scoping and cleanup to apply to request queues and whatever else it may have to do. And when channels eventually respond, we can rest assured of their function without any additional Petri net analysis. Otherwise, we may have to generate a Petri net model of their behaviour by pairing additional Petri nets to a design's transitions.

Storage may be structured as it is done in Nginx, using a region-based memory pool. Such a pool could be started by the backend at the moment it is first contacted, and it may be cleaned up when the Petri net logic reports that it is getting rid of certain variables. This means we can implement the idea we had about λ-calculus and scoping!

This currently seems to be the best of all worlds:

  • Semantics that are easily understood
  • Reasonably straightforward implementation in backends
  • No need for additional variable locking, even if generated it is difficult
  • No internal locks to be held by backends, AFAIK
  • Events and actions merge into the same notation, not a forced Event [ "/" Action ]
  • There can be many sends and many receives on one transition
  • Ability to match against constants
  • Ability to handle (local sub) colours
  • Retained simplicity of analysis
  • Ability to handle multiple results
  • Efficient memory management model

Interestingly, this brings us closer to π-calculus, whose novel primitives are:

  • Creating a new name (which is a new communications channel)
  • Processes that easily instantiate; this is a like looping Petri net

The practice in π-calculus is of course that no unbounded process creation is to be expected; a new name will be constructed and used for communication to keep it in bounds. In terms of Petri nets, we could prepare a multitude of x.resp channels and supply it as c.resp during c?y to mirror a Petri nets' c!x event:

pi-calculus-backend-example

The same idea, in another notation, does not emphesise the Petri net specifics as much; it is simpler and I also like the notation more:

pi-calculus-backend-nicer-notation

An excellent discussion of structured types and the semantics of pattern matching and efficient compilation of pattern matching can be found in the excellent book on Implementation of Functional Programming Languages by Simon Peyton-Jones.

Other than provindng an elegant encoding mechanism, this also demonstrates how completeness can be verified. This helps to avoid that the backend may wish to communicate something that the front-end isn't picking up. So if we create typing of similar structure (basically sum and product types, so nothing fancy) we should be able to report if the user forgets a branch and thereby risks deadlocks that the Petri net analysis might have missed.

There are good reasons to describe backends as Petri nets themselves, though, and then this line of analysis would end up with the general Petri net analysis. Still, the completeness of structure matching at the various splits in a Petri net's flow could be valuable, even just as a hint that these backend models would add something useful (in case their inclusion during analysis is made optional).

In terms of resource management: One might consider a channel as a unit of management, and of reference counting. Data can be passed at any time and cleanup can be arranged locally, but communication is more difficult.

Imagine use counts on channels, and set that during ν c.new to the number of parties that observe it from then on. When an involved Petri net ends its use of this channel, it lowers the use count, and the last one removes the region via a backend call. This replaces the explicit clean?c.resp labelled transition, as well as the gone process in the Petri nets shown above.

The driver will usually have a few channels statically allocated, to bootstrap communication. In the examples given before, this is the c channel.

This is all getting awfully dynamic. In static cases, we might optimise to avoid this, of course. And it is certainly possible to only implement those static cases for now, and keep in mind that we have room to grow. That will certainly give a very efficient implementation. Supporting more dynamicity to be able to handle multiple SearchResult messages does not seem like a waste of resources, though.

The one limitation that really emphasises the need for more powerful analysis would be if we tried to combine the various forks that we are now describing -- so, what c!x and x?y communicate between the various SearchResult messages of the same query, for instance. That would require much more insight in the restrictions on the math used. With "just" the things above we should be able to get enough out of Petri net analysis because we are merely choosing implementations within the model's non-determinism. Even the forking of instances is not a problem.

Notes about processing events that are originated outside the Petri net:

  • these may trigger transitions, and pass data in/out
  • will events be queued (sounds awkward) or dropped (sounds lossy) when transition cannot fire?
  • note: it is always possible to have a transition somewhere, ready to lap up an event
  • it is quite doable to make such transitions active immediately when the event cause initiates
  • this means that the model of discarding events on mismatch is not as lossy as it seemed
  • likewise, it is always possible to ditch a token for an undesired event -- they seem equivalent

So, an event could easily cause a transition. Their implementation would make the queued model simple too:

  1. Transitions depending on an event have an extra +1 to their countdown initialisation
  2. An event lowers countdown by one and causes try_firing() on the transition
  3. Without queueing semantics, the event must increase countdown after try_firing()

It would be easy to offer both semantixes... perhaps there is an elegant notational reason?

Our first stab at event semantics will be very simple:

  • Follow PNML logic by labelling transitions with a C-style identifier
  • Prefix with trans_action_ as a user function call to be made
  • Declare the user function, call from trans_t, assume it will be linked
  • Events will say "not yet" through trans_retcode_t value TRANS_FAILURE
  • Timeout alternatives to events return trans_retcode_t value TRANS_DELAY(n)
  • When events are routed to the Petri net instance, they try to fire transitions
  • Multilple transitions may be tried one by one, requiring countdown==0
  • Event-triggered transitions may als fail to fire; then try the next option

This simple framework is workable and is likely to teach us how to proceed.

This has been established starting commit 6ac7cd8.

The Erlang code confirmed this model; it is pretty much the kind of feedback that Erlang functions expect to receive. Though in a slightly different form, it is conceptually the same.