oracle-samples/clara-rules

Should LocalSession implement Clojure protocols?

Closed this issue · 6 comments

Just wondering if there's a reason why LocalSession doesn't or shouldn't implement Clojure protocols like IAssociatve.-assoc, IMap.-dissoc, ILookup, ISeqable, and others.

Was thinking it might be nice to have and would allow us to do things like the following:

  (update my-session :rulebase conj (dsl/parse-query params lhs))
  
  (get my-session :listener)

  (keys my-session) 
  ; => (:rulebase :memory :transport :listener :get-alphas-fn :pending-operations)

As a general rule I'd prefer to approach these things by thinking of what kind of functionality should be provided and work backwards from there by considering what the best API to provide that functionality is, rather than implementing a particular type of API, in this case various Clojure collection abstractions, as a goal in itself. Regarding these collections APIs, it isn't clear to me what meaning many of them would have in the context of a rules session. For example, I don't see a clear logical meaning of coercing a rules session to a sequence.

Providing data-oriented APIs is a good thing in my opinion. For example, we've made a point of having a representation of rules as data that is independent of the defrule and defquery macros that can be targeted to create rules/queries directly, inspected by a user or tooling, etc. Another example is the information on the session state returned by the session inspection functionality that is a normal Clojure data structure and that can thus be manipulated by user tooling, in a REPL, etc. To that end, suggestions on how we can make further improvements in this regard are welcome.

Regarding your specific use-cases:

  • (update my-session :rulebase conj (dsl/parse-query params lhs)):
    You're envisioning adding a rule or query to an existing session, correct? This currently isn't supported in Clara. Firstly, this type of functionality has an unavoidable downside in terms of space consumption. Suppose you have a rules session with the following rule:
(defrule cold-and-windy)
    [Temperature (= ?t temperature ) (< temperature 0)]
    [WindSpeed (= ?w windspeed) (> windspeed 30)]
    =>
    (insert! (->ColdAndWindy ?t ?w)))

into which you insert a Temperature fact with a temperature value of 100. Since this Temperature fact does not satisfy the condition on Temperature facts in the rule, namely that the temperature value be less than zero, Clara won't hold a reference to the Temperature fact. It will be garbage-collectable if nothing else holds a reference to it and it won't be serialized with the session. This is a very valuable property for many of Cerner's use-cases. If, on the other hand, one allows rules and queries to be added later, then the session has to retain references to all facts since a rule or query matching them could be added later. If there is a desire to make this user-configurable that could be discussed, but it would be a significant amount of work and complication in, among other places, the Clara compiler.

  • (keys my-session)
    ; => (:rulebase :memory :transport :listener :get-alphas-fn :pending-operations)

    This is already provided by the components function on the rules session, with the exception that the :pending-operations aren't included. These values are something of implementation details to different degrees. It seems to me that making a session be a map of them wouldn't really make logical sense from a user-facing point of view since I wouldn't expect most users to care about their values. Perhaps the :pending-operations should be added to the components, though. To be honest, it isn't something I'd thought about, but at first glance it does seem odd that they aren't in there and all of the other fields you mention are.

If, on the other hand, one allows rules and queries to be added later, then the session has to retain references to all facts since a rule or query matching them could be added later.

This isn't not the only possibility though. You could make the semantics be that the addition or removal of a rule has different affects on the working memory state.

  1. The working memory could be completely wiped out as a "blank slate"
  2. The new rules could have the semantics that they are only in consideration of the point in time after they were added and won't include any facts that were previously possibly inserted, but were not involved in the matches of any other rules.

I've seen rules engines do (2) before to at least some extent, although I'm not a big fan of it.

(1) however, makes sense to me and I could see an API where that would be useful. I may not hide it behind the facade of the clj core update or assoc functions though since I think that would be fairly non-obvious what it is meant to do.

Just a side thought too: Also, I'd have to think about how hard it would be to actually insert a new rule into the pre-compiled rete graph structure internally. I'm thinking it probably wouldn't be too easy. Probably would just have to fully rebuild by being sure to "remember" all the rule structs that it was made of prior to the new additions. However, the caller could remember the the rules externally to and just rebuild it with the new rules added onto that if that was the only thing the feature was helping with. It may be nice to be able to add onto it without a full recompile for perf gains, but like I said, that could be complicated.

You're envisioning adding a rule or query to an existing session, correct?

Yes. 😈

Wow. Thank you both for your incredibly thoughtful responses.

My interest and use case at the moment is actually doing this with a query, or LHS syntax. Not sure if that lowers the complexity whatsoever compared to adding rules. I'm working in about 50% cljs 50% clj, and I'm not fully aware of the perf differences or implementation complexities between the two when it comes to (re)defining a session's queries based upon a previous session.

Any idea whether 100 queries that aren't being used affect performance significantly? Perhaps I could use agenda grouping to turn them off if that ever became a problem? I'm working with animation on the client side, so performance is a concern.

For REPLing and general use I'd much like (query session params lhs) but I'm guessing there are at least several good reasons why we can't do that.

I'm willing to try to implement anything you think might be worth pursuing here and don't mind having PRs rejected.

Thanks for all your work on this library. I use it every day and I love it.

The algorithm behind the Clojure and ClojureScript version is mostly the same. The biggest known difference in performance that comes to my mind is that Clara uses a data structure called the "memory" to store various intermediate states of the rules session. In the Clojure version, this uses a lot of mutable Java types to improve performance, although in a way that is invisible to the user (i.e. from the user's point of view this doesn't impact the immutability of sessions). The ClojureScript version uses normal ClojureScript data structures without any similar interop with mutable JavaScript types. That said, I haven't used the ClojureScript version for any real-world use cases so I don't necessarily have insight into what hidden problems may exist.

Regarding adding queries: To back up a bit, Clara builds a rule network where facts enter in an AlphaNode and propagate through a directed graph of different processing nodes. The leaf nodes of this graph of processing nodes are all QueryNodes or ProductionNodes. A ProductionNode corresponds to a rule and can insert new facts that are then provided to the relevant AlphaNode(s). A QueryNode just stores its results for querying later. As such, the complexity of adding new rules or queries to the session would be pretty similar. One would also run into the same problem for both adding rules and queries in that you'd inherently have to hold onto all previous facts if your objective was to add a query to the session and return any matching fact that was previously inserted (as Mike alludes to above, if your goal were other semantics this would be potentially avoidable). Either way this would be quite complex to implement. Also, adding salience on queries won't actually help since it control the movement of facts from ProductionNodes to AlphaNodes, although adding salience on rules feeding into queries could help.

In general, I've found that most Clara performance problems come from particularly bad interactions of rules/queries in the compiled rules network, not simply large number of rules, queries, or facts. There are exceptions to this of course where the sheer number of inputs causes problems. Truth maintenance normally performs well, but it has some overhead, and reducing its performance impacts has been the focus of many of the optimizations of Clara. In normal cases, even sessions with very large numbers of rules and facts perform quite well. Regarding the number of queries in a session, if there are no facts in the session that match the query's fact type no work will be performed. This documentation on performance could be useful reading, if targeted at a slightly different problem.

My usual approach to performance problems is to use the simple approach first and then see what if any performance problems actually exist. My experience is that what turns out to be a problem in reality can be quite surprising. Obviously I don't have really detailed knowledge of your use-case, but that approach would be my usual suggestion. If problems do come up, any replicating examples you could provide would be very helpful. My best guess would be that tuning the memory implementation for ClojureScript would give the most benefit relative to development time and complexity, but that could be wrong. There are ways I can think of that queries could be optimized that rules can't that we haven't done yet, but that would be more complex to do and probably isn't worth it until we have concrete evidence of a need for it.

Thank you. Extremely helpful. Hope you don't mind that I close the issue.

@alex-dixon I agree with closing the issue; the only reason I hadn't closed it yet was that I was wondering if your questions were answered. You're welcome.