oracle-samples/clara-rules

Equals implementation on AlphaRootsWrapper gives false negatives, causing unnecessary cache misses

Closed this issue · 2 comments

In #242 we did work to improve performance of rulesets where there are productions that match on ancestors of fact types in the session. The original problem is discussed at #236. There is some design discussion later in that thread, but the parent post fully describes the original problem.

When we tried to use this performance improvement we found that we still had the performance problems characteristic of a lack of batching. The problem appears to be that I incorrectly assumed that we would only have one AlphaRootsWrapper object per distinct fact type, and thus used reference equality in its equals implementation. The source of restrictions on the distinct instances of the AlphaRootsWrapper is the fact that fact-type->roots [1] is memoized on the fact type, and the AlphaRootsWrapper objects are created inside this function. However, the ancestors-fn is called inside this function, not outside it. As a result, there can an AlphaRootsWrapper not just per primary type, but per primary type per descendant fact type. We thus can have AlphaRootsWrapper objects that have the same roots, but are not equal, and thus the facts for each are not combined in [2]. Since fact-type->roots is memoized on the type of the fact involved, for any given concrete fact type, where the concrete primary fact type is the return value of the :fact-type-fn on a fact in the session. We will thus have the caching that we did before #242, but facts that have different primary fact types but common ancestors will not be batched for activating the alpha roots corresponding to the common ancestors.

We originally wanted to limit the number of times that the fact type was used, since this can be a data structure of arbitrary complexity. However, object identity can't be used here as originally implemented for the reasons discussed above. In practice, we expect most fact types to be of minimal complexity and testing equality on them should be a very fast operation. My plan is to add the fact type to the AlphaRootsWrapper and use this in the equals implementation. If we run into cases where this sort of equality testing is expensive we can look into other solutions. One possibility would be to add node IDs to the AlphaNode, in which case we could test the equality of a set of numbers. However, we can make the equals check on the fact type have shortcuts that return true when the fact types are identical objects and false when their hashes are not equal, which allows users to have most of the previous performance benefits of checking by identity if user code is structured such that the fact type returned is identical. Of course, the equals implementation will need to fall back to equality checking if neither of these conditions is met.

@EthanEChristian is working on implementing this. Once we have a fix for this merged I plan to create a new RC release.

To be clear, I do not see any cases where this problem will result in incorrect behavior. This is a performance issue, not a functional issue.

  1. https://github.com/cerner/clara-rules/blob/0.13.0-RC6/src/main/clojure/clara/rules/compiler.clj#L1538
  2. https://github.com/cerner/clara-rules/blob/0.13.0-RC6/src/main/clojure/clara/rules/compiler.clj#L1556

Regarding testing, I see a couple of possibilities now (I didn't see good testing options when I did #236 but I don't think that original assessment was accurate). We can set up a test where there are multiple concrete fact types that have a common ancestor and there is rule that looks for that ancestors. The RHS of that rule can then perform actions that will cause test failure if it is fired too many times. Options there would include a counter or, if the rule uses an accumulator, validation that it is only fired with all the relevant facts and not a subset of them.

The pull request has been merged and I don't see anything left to do here. Closing.