oracle-samples/clara-rules

NegationNode and NegationWithJoinFilterNode can perform extra downstream retractions


Both the NegationNode and the NegationWithJoinFilterNode currently retract matching tokens upon a right activation [1, 2]. However, if other elements already matched the negation condition for those join bindings, the node will not have propagated any tokens in the first place. In that case the only thing we actually need to do is add the new matching elements to the memory. If the previous matching elements are later retracted, we want these new matching elements to continue blocking the production in question (unless they are retracted as well, obviously). Downstream of the negation node, the impact of having one matching element should be identical to having many. The problem with making extra retractions is that in some use cases equal tokens may exist downstream from other ancestor nodes, so the negation node may end up retracting tokens that it did not originally propagate. I noticed this while investigating possible causes for the use case that led me to log #229.
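To make the ancestor-node point concrete, here is a minimal, hypothetical sketch of a rule shape that produces it (these fact types and this rule are illustrative only, not the actual tests added in the PR): the :or of two negations compiles into two paths to the same ProductionNode through two different NegationNodes, and since negations add no new bindings, the tokens arriving on each path are equal to each other.

  (ns example.negation-sketch
    (:require [clara.rules :refer [defrule insert!]]))

  ;; Illustrative fact types.
  (defrecord Temperature [location value])
  (defrecord WindSpeed [location value])
  (defrecord Pleasant [location])

  ;; Either negated condition alone is enough to satisfy the rule, so the same
  ;; token can reach the ProductionNode through two independent NegationNodes.
  (defrule pleasant-when-not-hot-or-not-windy
    [Temperature (= ?loc location)]
    [:or
     [:not [Temperature (= ?loc location) (> value 30)]]
     [:not [WindSpeed (= ?loc location) (> value 50)]]]
    =>
    (insert! (->Pleasant ?loc)))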

  1. https://github.com/rbrush/clara-rules/blob/0.13.0-RC2/src/main/clojure/clara/rules/engine.cljc#L547
  2. https://github.com/rbrush/clara-rules/blob/0.13.0-RC2/src/main/clojure/clara/rules/engine.cljc#L622

I have submitted #232 to fix this for NegationNode. I left NegationWithJoinFilterNode alone for the moment since fixing this in a similar way would result in running the join filter function more, which would hurt performance. The impact is probably minimal but I'd like to look at it a bit more.

I added two tests:

  • test-extra-right-activations-with-disjunction-of-negations tests a scenario where, as described above, an extra downstream retraction removes tokens that logically originate from another node. This happens because we essentially have two paths to the ProductionNode in question through two different NegationNodes, and since a NegationNode doesn't add new bindings, the tokens that propagate through each path are equal to each other.
  • test-negation-of-changing-result-from-accumulator-in-fire-rules tests a potential problem that my first attempt at fixing this created. After the changes in #187, mem/get-elements returns a mutable list. My first attempt was something like:
  (right-activate [node join-bindings elements memory transport listener]
    (let [previous-elements (mem/get-elements memory node join-bindings)]
      (l/right-activate! listener node elements)
      (mem/add-elements! memory node join-bindings elements)
      (when (empty? previous-elements)
        (retract-tokens transport memory listener children (mem/get-tokens memory node join-bindings)))))

The first time we right-activate this it works, since previous-elements is an empty vector [1]. Once we add an element, though, the collection stored in memory becomes a mutable list, even if that element is later retracted [2]. Thus, if we later have an empty mutable list for the elements and right-activate the node, the downstream retractions won't happen: previous-elements is that same mutable list, so it is no longer empty after the mem/add-elements! call. To avoid this problem we need to determine whether there are previous elements before adding anything.

  (right-activate [node join-bindings elements memory transport listener]
    (let [previously-empty? (empty? (mem/get-elements memory node join-bindings))]
      (l/right-activate! listener node elements)
      (mem/add-elements! memory node join-bindings elements)
      (when previously-empty?
        (retract-tokens transport memory listener children (mem/get-tokens memory node join-bindings)))))

Overall, the consequences of returning mutable lists from the memory may merit some thought; @mrrodriguez had some concerns yesterday that he wanted to look into. That is out of scope for this issue, though. Doing the empty? test upfront solves the immediate problem for this fix.

  1. https://github.com/rbrush/clara-rules/blob/0.13.0-RC2/src/main/clojure/clara/rules/memory.cljc#L436
  2. https://github.com/rbrush/clara-rules/blob/0.13.0-RC2/src/main/clojure/clara/rules/memory.cljc#L539

The first time we right-activate this it works, since previous-elements is an empty vector [1]. Once we add an element, though, the collection stored in memory becomes a mutable list, even if that element is later retracted [2]. Thus, if we later have an empty mutable list for the elements and right-activate the node, the downstream retractions won't happen: previous-elements is that same mutable list, so it is no longer empty after the mem/add-elements! call. To avoid this problem we need to determine whether there are previous elements before adding anything.

I'm going to clarify on this some here. In #184 an optimization was made where TransientLocalMemory converts its internal persistent collections into mutable collections when facts are removed from those particular places in memory. The idea is that once a position in memory changes, it is expensive to perform the removal and the update on a persistent collection, so the removal is done on a mutable collection instead, and that collection is left mutable for the rest of the TransientLocalMemory's lifecycle in case it changes again. If it does change again, the update is much faster since the collection is already mutable.

This optimization seemed to give benefits in heavy retraction flows which were naturally triggered by some less ideal situations with truth maintenance.
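As a rough illustration of the trade-off being described (plain Clojure/Java collections, not clara's actual memory implementation), removing from a persistent vector rebuilds the collection on every call, while a java.util.LinkedList can be mutated in place and then reused for any later removals at the same memory position:

  (defn remove-first-persistent
    "Remove the first occurrence of x from persistent vector v; pays a full rebuild each call."
    [v x]
    (let [idx (.indexOf ^java.util.List v x)]
      (if (neg? idx)
        v
        (into (subvec v 0 idx) (subvec v (inc idx))))))

  (defn remove-first-mutable!
    "Remove the first occurrence of x from a mutable list in place; no copying of the remaining items."
    [^java.util.LinkedList l x]
    (.remove l x)
    l)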

An unfortunate result of this optimization is that mem/get-elements, mem/get-tokens, etc may return mutable collections when called on a TransientLocalMemory, which is what the majority of the functions in clara.rules.engine access.

Since these may be mutable, you cannot use them as if they were immutable.

From above:

  (right-activate [node join-bindings elements memory transport listener]
    (let [previous-elements (mem/get-elements memory node join-bindings)]
      (l/right-activate! listener node elements)
      (mem/add-elements! memory node join-bindings elements)
      (when (empty? previous-elements)
        (retract-tokens transport memory listener children (mem/get-tokens memory node join-bindings)))))

This snippet makes changes (via mem/add-elements!) to the same mutable collection it previously bound as previous-elements, so it is just incorrect logic from a mutable-collection standpoint.
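A tiny sketch of the same aliasing pitfall outside of clara (plain Java collections, purely illustrative):

  (let [elements (java.util.LinkedList.)   ;; a memory position that is already mutable (and currently empty)
        previous-elements elements         ;; the "previous" reference is the very same object
        _ (.add elements :new-element)]    ;; analogous to mem/add-elements! mutating in place
    (empty? previous-elements))            ;; => false, even though the list was empty before the add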

Obviously, though, this is subtle and sneaky, and it needs to be something we consider when working within the engine with TransientLocalMemory. Basically, we always need to assume that the collections returned from mem/get-elements and similar are mutable, and we can only "trust" their value within a controlled scope before any subsequent mem/add-elements! (or similar) call on the memory.

For the most part this all works out ok.


I actually did have some concern about this causing further issues, though, and started to investigate. The issue I came up with that was a bit of a red flag for me is that a large majority of the beta nodes in the rulebase wrap these potentially-mutable collections in lazy seqs via for.
An example is https://github.com/rbrush/clara-rules/blob/0.13.0-RC2/src/main/clojure/clara/rules/engine.cljc#L417-L421

Lazy evaluation and in-place mutation do not really work well together. However, in some controlled circumstances it can be ok. Again, I worry about the subtleties here. So far I've convinced myself that our consumption of these lazy sequences always happens against the underlying collection before any mutation has been done to it. If the collection were mutated first, we'd never realize the old lazy sequence anyway; we'd instead instantiate a new one over the new state of the mutable collection.
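A small illustration of that last point (plain Java collections again, not engine code): the lazy seq produced by for only creates its iterator when it is realized, so realizing it after a mutation simply reflects the new state.

  (let [elements (java.util.LinkedList. ["a" "b"])
        tokens   (for [e elements] (str "token-" e))] ;; lazy; no iterator over elements exists yet
    (.add elements "c")                               ;; mutate before the lazy seq is realized
    (doall tokens))
  ;; => ("token-a" "token-b" "token-c") -- realization sees the post-mutation state.
  ;; Mutating *during* realization could instead throw ConcurrentModificationException,
  ;; which is the kind of subtlety to be careful about.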

However, fully proving this is the case and ensuring it stays the case can be tricky.


Do we have alternatives? The obvious answer is to just convert mutables back to immutable, persistent collections when mem/get-elements or mem/get-tokens is called on TransientLocalMemory. However, this requires a full and eager copy of the collection. This is very similar behavior to how mem/remove-first-of-each performs removals from persistent collections and we know that is slow from empirical evidence.

If this same memory location subsequently has items removed from it, it would again flip back to mutable. Then if it was accessed again, it'd be copied back to immutable. This would most likely be unacceptable for performance if that flip-flopping happened rapidly in any sort of frequent case. It could be worth experimenting with, though.

There is a chance that the changes to a position in memory happen heavily, but in isolation during the propagation across nodes. Perhaps mem/get-elements isn't called in some sort of 1:1 way with these element/token removals.
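The copy-on-read alternative mentioned above would amount to something like this hypothetical wrapper (not part of clara's memory API), paying an eager copy on every read of a possibly-mutable position (assuming clara.rules.memory is required as mem, as in the snippets above):

  (defn get-elements-persistent
    "Hypothetical copy-on-read wrapper: always hand callers an immutable snapshot."
    [memory node join-bindings]
    (into [] (mem/get-elements memory node join-bindings)))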


In summary, I don't see any immediate issues and so far have convinced myself that everything is ok. However, I do think this is a subtle point and something to be very careful with for the time being when accessing the transient version of memory within the engine.

I left NegationWithJoinFilterNode alone for the moment since fixing this in a similar way would result in running the join filter function more, which would hurt performance. The impact is probably minimal but I'd like to look at it a bit more.

I'll just note here that I'd rather see a performance effect than have logic in a beta node that is incorrect. It is easy to make things highly performant when they don't do what they are supposed to. :P

However, it could still be done in another issue/PR if you think that separation of concerns is appropriate.

After some thought I agree with @mrrodriguez that we should just use the simple fix for NegationWithJoinFilterNode for now and improve the performance later if necessary. The main route I see to making it more performant would be to add a "negation memory" where we'd store a map of tokens to whether they were propagated and how many of those tokens there should be for the node. Upon a right-retract we'd still need to evaluate the remaining elements if a matching element were retracted for a token, but on a right-activate we'd be able to avoid evaluating previous elements. This seems like a lot of work for a problem we don't know we have though.
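Purely as a sketch of that idea (names and shape invented; nothing like this is implemented), the "negation memory" could be a map from token to its propagation state and a count of the elements currently blocking it, so a right-activate could bump the count without re-running the join filter over previously seen elements:

  (defn add-blocking-elements
    "Record that n newly matching elements block token; the token should not be propagated."
    [negation-memory token n]
    (-> negation-memory
        (update-in [token :blocking-count] (fnil + 0) n)
        (assoc-in  [token :propagated?] false)))

  (defn remove-blocking-elements
    "Record that n matching elements for token were retracted; re-propagate when none remain."
    [negation-memory token n]
    (let [m (update-in negation-memory [token :blocking-count] (fnil - 0) n)]
      (cond-> m
        (<= (get-in m [token :blocking-count]) 0) (assoc-in [token :propagated?] true))))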

I've submitted a new pull request at #233 that fixes both NegationNode and NegationWithJoinFilterNode.

I've updated #233 in response to @mrrodriguez's comment.

+1
The changes look good to me.

The PR has been merged and I don't see anything left to do on this issue. Closing.