oracle-samples/clara-rules

Requesting clarification re: multiple equal facts (I found this counterintuitive)

blak3mill3r opened this issue · 4 comments

I am just beginning to try to familiarize myself with this abstraction. I read all of the documentation, and I've just discovered that inserting a fact is not idempotent. My intuition was that it would be.

After re-reading the documentation and some of the issue history, I didn't find any mention of this specifically. #84 is related, but my question is simpler still. I'd appreciate some clarification on the reasoning behind storing duplicate facts (and allowing them to transitively produce multiple identical inferred facts).

I am not sure what it might mean to store and reason about two or more exactly identical "facts". They are all the same fact, no? If they are not the same fact, in spite of being equal, then how can I model that two equal facts do not carry any more information than just one of them?

My expectation was that I could model, for example, two separate observations of the same information as two records that differ only in an observation-timestamp field. I also expected that if several different combinations of rules happened to imply exactly the same fact, the downstream implications would be the same as for any single piece of evidence implying that fact. Instead, it seems that there is a difference between the same fact being inserted once, twice, or n times. Why is that?

(ns confused
  (:require [clara.rules :refer :all]))

(defrecord BottlesOfBeer [count time])

(defquery get-bottle-count [:?time]
  [?count <- BottlesOfBeer (= ?time time)])

(def ts (.getTime #inst "2020-01-01T00:00:00.000-00:00"))

(-> (mk-session)
    (insert (->BottlesOfBeer 99 ts))
    (insert (->BottlesOfBeer 99 ts))
    (fire-rules)
    (query get-bottle-count :?time ts))
;; =>
;; ({:?count #confused.BottlesOfBeer{:count 99, :time 1577836800000}, :?time 1577836800000},
;;  {:?count #confused.BottlesOfBeer{:count 99, :time 1577836800000}, :?time 1577836800000})


(let [[f1 f2] (-> (mk-session)
                  (insert (->BottlesOfBeer 99 ts))
                  (insert (->BottlesOfBeer 99 ts))
                  (fire-rules)
                  (query get-bottle-count :?time ts))]
  (= f1 f2))
;; => true

Should I query the session first and only insert if the fact is not already present? Or, do I need to use an accumulator to distinct them before inferring anything further? The example above is simpler than the code I was actually working with (I was initially confused by the presence of duplicate inferred facts) but I would like first to understand what it means to have f1 and f2 returned by the query above.

The short answer is that yes, Clara will allow multiple equal facts to be inserted. If you want to reason over at most one fact you can do that with a distinct accumulator in your rule and query conditions. Do keep in mind that distinct is an accumulator subject to the rules on bindings described at http://www.clara-rules.org/docs/accumulators/ ; that is, it will accumulate per distinct set of bindings used.
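
As a rough illustration of the distinct accumulator suggestion, here is a minimal sketch that reuses the BottlesOfBeer record from the question. The namespace name, the query name `get-distinct-bottles`, and the `?bottles` binding are made up for this example; `acc/distinct` comes from `clara.rules.accumulators`.

```clojure
(ns confused.distinct
  (:require [clara.rules :refer [defquery mk-session insert fire-rules query]]
            [clara.rules.accumulators :as acc]))

(defrecord BottlesOfBeer [count time])

;; Accumulate the distinct BottlesOfBeer facts for a given ?time.
;; The accumulation happens per distinct set of bindings, here per ?time.
(defquery get-distinct-bottles [:?time]
  [?bottles <- (acc/distinct) :from [BottlesOfBeer (= ?time time)]])

(def ts (.getTime #inst "2020-01-01T00:00:00.000-00:00"))

;; Two equal facts are inserted, as in the original question.
(def result
  (-> (mk-session)
      (insert (->BottlesOfBeer 99 ts))
      (insert (->BottlesOfBeer 99 ts))
      (fire-rules)
      (query get-distinct-bottles :?time ts)))
```

With this query, `result` should contain a single match whose `:?bottles` value is a set holding one BottlesOfBeer record, instead of the two separate matches returned by the original query.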

In terms of the meaning of multiple facts, it really depends on the use case. For example, two instances of a MedicationAdministered fact could mean that the medicine was administered twice. This was a natural way to model some of the motivating use cases behind Clara's creation. Duplicate removal would also generally involve hashcode calculation, which in some cases (that we had) can be quite expensive.
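
To make the MedicationAdministered case concrete, here is a small sketch in which equal facts are meaningful because they are counted. The record fields (`patient`, `medication`), the namespace, and the query name are assumptions for illustration only, not taken from any real model.

```clojure
(ns confused.medication
  (:require [clara.rules :refer [defquery mk-session insert fire-rules query]]
            [clara.rules.accumulators :as acc]))

;; Hypothetical fact type; the field names are assumptions.
(defrecord MedicationAdministered [patient medication])

;; Count how many administrations were recorded for a patient.
(defquery administration-count [:?patient]
  [?n <- (acc/count) :from [MedicationAdministered (= ?patient patient)]])

;; Two equal facts are inserted; each one represents a separate administration.
(def result
  (-> (mk-session)
      (insert (->MedicationAdministered "p1" "aspirin"))
      (insert (->MedicationAdministered "p1" "aspirin"))
      (fire-rules)
      (query administration-count :?patient "p1")))
```

Because Clara keeps both equal facts, `?n` here is 2. If insertion were idempotent, the second administration would be lost unless the fact carried some distinguishing field.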

That said, I do think that an option to not allow duplicate facts would be quite useful, and I think the reason it doesn't currently exist is just due to nobody working on it yet (bearing in mind that I'm only speaking for myself, other maintainers may have different views).

I agree with @WilliamParker's answer here. Thanks for explaining it. I think we've explained this several times before. It'd be good to have a Clara FAQ page, perhaps, with answers like this.

Thank you @WilliamParker for the explanation.

I thought I would update this with a little more of my own experience, in case it helps someone. I wound up choosing to do this in the application layer, modeling it separately from the clara session, rather than trying to make an "idempotent" option for a clara session. I'm not convinced that such an option isn't worthwhile, but I did decide it wasn't what I needed in this case.

What I needed was just that certain updates (from user actions for example) would be idempotent as far as the clara session was concerned. My rules did not themselves create duplicate downstream facts that needed to be treated as one fact, but I had "facts from the outside" that I wanted to be idempotent. I chose to model that separately: I keep a "user model" value, have user actions update that instead of updating the session directly, and have a fn of user-model -> facts. For every change to the user model, I diff the facts from that fn, and insert/retract them. This seems sensible to me, particularly given that making clara keep track of this would have performance implications, and the user model is tiny compared to the set of facts.
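
For anyone wanting to try the same approach, the diffing step described above might look roughly like this. It is a sketch in plain Clojure, not my actual code: the `WindowFocused` record, the shape of the user model, and the function names are all hypothetical.

```clojure
(ns confused.sync
  (:require [clojure.set :as set]))

;; Hypothetical fact type and user-model shape, for illustration only.
(defrecord WindowFocused [window-id])

(defn user-model->facts
  "Derives the set of externally sourced facts from the user model.
  Returning a set makes repeated entries in the model collapse to one fact."
  [user-model]
  (set (map ->WindowFocused (:focused-windows user-model))))

(defn diff-facts
  "Given the old and new fact sets, returns the facts to retract and to insert."
  [old-facts new-facts]
  {:to-retract (set/difference old-facts new-facts)
   :to-insert  (set/difference new-facts old-facts)})

(def old-model {:focused-windows [1 2]})
(def new-model {:focused-windows [2 2 3]}) ; the repeated 2 is deliberate

(def changes
  (diff-facts (user-model->facts old-model)
              (user-model->facts new-model)))
```

Here `changes` asks to retract the fact for window 1 and insert the fact for window 3; the fact for window 2 is left alone even though it appears twice in the new model. The two sets can then be applied to the session with `(apply retract session (:to-retract changes))` and `(insert-all session (:to-insert changes))` from `clara.rules`, followed by `fire-rules`.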

I want to mention that once I did that, I have found the rest of my clara.rules experience very intuitive and easy to understand. I think this is a powerful abstraction which I'm delighted to have been introduced to by this library. I went through several iterations of design trying to discover the best abstraction for the logic of this X window-manager thing I'm working on. Having settled on clara, I'm very pleased with the way it's working, especially how easy it is to extend with new rules.

Thank you, and props to everyone who has worked on this. Good stuff!