oracle-samples/clara-rules

Reduce transitions between transient and persistent memory and facilitate performance enhancements to fire-rules

Closed this issue · 4 comments

Currently when a user wishes to retract some facts, insert other facts, and then fire the rules the user must do something like

(-> session
    (retract-facts facts1)
    (insert-facts facts2)
    fire-rules)

As a result there are 3 transitions between transient and persistent memory. #184 improved performance for our use cases with large numbers of facts and Rete operations overall, but it did increase the cost of these transitions. On average, I've seen us spend 1-3% of runtime in these transitions, which while not huge is not insignificant.

We can cut this down to a single transition by caching the insertions and retractions and only actually executing them when fire-rules is called. By using a Clojure persistent data structure for this cache we can efficiently maintain the existing semantics around insert and retract returning new sessions. This also has the benefit of giving fire-rules visibility to and control of the insert and retract operations, which facilitates adding options to control the performance of the Rete operations. For example, it would facilitate the enhancements discussed at #249. There was some discussion on this in relation to that enhancement at WilliamParker@af97744#commitcomment-20782287 The API I'm envisioning there would then look something like

(fire-rules session {:cancelling true})

on top of the changes proposed in this issue.

The main downsides I see are

  • We're changing behavior in that querying on a session before firing the rules will consistently not work. However, there was previously no guarantee on this anyway and we have broken scenarios for such query usage with extracted negations as discussed at #149. In the long run it would likely be better to consistently return nothing from queries than to inconsistently successfully query.
  • I can envision cases where one could benefit from propagating to the LHS before firing the rules, say if a single session with some facts served as a "parent" to many other sessions that would have additional facts and then be fired. This seems like a fairly small subset of cases though and could be addressed by adding an option to flush the cache without firing the rules if necessary, or even allowing an option to only fire through a particular activation group, which would be a much more general solution for such use cases.
  • Arguably the best solution for this would be to have the fire-rules function take facts to insert and retract as an argument. That is, something like (fire-rules session {:insertions facts1 :retractions fact2}) that would line up more closely with the actual behavior if we make this change. However, at this point I think the passivity concerns around removing insert and retract would be significant, and having two ways to insert and retract facts would be pretty confusing to users in my opinion.

On average, I've seen us spend 1-3% of runtime in these transitions, which while not huge is not insignificant.

I'll say that this is understating the performance impact in certain types of workflows. For one, if durability is being used to save large working memory state to recover later and insert/retract small sets of new changes, the transition from persistent to transient can become a much more significant percentage of runtime performance.

I believe the 1-3% number you are discussing here is in a batch workflow where the number of inserts, retracts, and rules to fire is large and somewhat starts to dwarf the time it takes to transition the working memory from persistent to transient and back.

I concur that workflows that retract and insert facts from sessions with lots of facts already in them would likely be harder hit. Unfortunately I don't have a good way to gather numbers on those particular use-cases though at the moment.

I have created a PR for this at #269

@rbrush do you have any thoughts on this type of change?

The PR has been merged. I don't see anything else to do here. Note that this unblocks #249 and I will submit a new PR for that with the changes from #263 rebased on these changes.

Closing the issue.