oracle-samples/clara-rules

Serialization should not duplicate Tokens and Elements

Closed this issue · 3 comments

It seems during serialization of the session in durability duplicates Elements and Tokens due to modifications made to the Element/Token while indexing facts contained within.[1] The updates to these objects means that they are no longer identical and when serializing them[2], specifically [3]
the IdentityHashMap used sees these Elements/Tokens as unique.

A "simple" test:

(require '[clara.rules :as r])
(require '[clara.rules.durability.fressian :as df])
(require '[clara.rules.durability :as d])

(defrecord SimpleRecord [x y])
(defrecord Output [value])

(defrecord LocalMemorySerializer [holder]
  d/IWorkingMemorySerializer
  (serialize-facts [_ fact-seq]
    (reset! holder fact-seq))
  (deserialize-facts [_]
    @holder))

(r/defrule sum-rule
  [SimpleRecord (= ?x x) (= ?y y)]
  =>
  (r/insert! (->Output (+ ?x ?y))))

(r/defrule sub-rule
  [SimpleRecord (= ?x x) (= ?y y)]
  =>
  (r/insert! (->Output (- ?x ?y))))

(r/defrule multi-rule
  [SimpleRecord (= ?x x) (= ?y y)]
  =>
  (r/insert! (->Output (* ?x ?y))))

(r/defquery get-outcomes
  []
  [Output (= ?val value)])

(def session 
  (r/mk-session [sum-rule
                 sub-rule
                 multi-rule
                 get-outcomes]))

(defn serde-session
  [fired]
  (let [session-baos (java.io.ByteArrayOutputStream.)
        session-serializer (df/create-session-serializer session-baos)
        rulebase-baos (java.io.ByteArrayOutputStream.)
        rulebase-serializer (df/create-session-serializer rulebase-baos)

        holder (atom [])
        mem-serializer (->LocalMemorySerializer holder)

        _ (d/serialize-rulebase fired rulebase-serializer)
        
        _ (d/serialize-session-state fired
                                     session-serializer
                                     mem-serializer)

        rulebase-data (.toByteArray rulebase-baos)
        session-data (.toByteArray session-baos)

        rulebase-bais (java.io.ByteArrayInputStream. rulebase-data)
        session-bais (java.io.ByteArrayInputStream. session-data)
        rulebase-serializer (df/create-session-serializer rulebase-bais)
        session-serializer (df/create-session-serializer session-bais)

        restored-rulebase (d/deserialize-rulebase rulebase-serializer)
        restored (d/deserialize-session-state session-serializer
                                              mem-serializer
                                              {:base-rulebase restored-rulebase})]
  restored))

(def rec-holder (atom []))

(alter-var-root 
  (var clara.rules.durability/clj-record-holder-add-fact-idx!)
  (fn [f] #(do (swap! rec-holder conj %)
             (f %))))

(as-> session ses
              (r/insert ses (->SimpleRecord 1 2))
              (r/fire-rules ses)
              (let [out (r/query ses get-outcomes)] (println out) ses)
              
              (serde-session ses)
              (let [out (r/query ses get-outcomes)] (println out) ses))

(def grouped-rec-calls (group-by identity @rec-holder))

(def dupes (filter #(< 1 (count (val %))) grouped-rec-calls))

(-> dupes first val count) 

4 "unique" Tokens with current implementation

With the rules/insert above I would expect exactly one Token to be propagated.

I am currently working on a fix to eliminate these duplicates.

@mrrodriguez @WilliamParker

[1] (https://github.com/cerner/clara-rules/blob/master/src/main/clojure/clara/rules/durability.clj#L228-L250)
[2] (https://github.com/cerner/clara-rules/blob/master/src/main/clojure/clara/rules/durability/fressian.clj#L348-L375)
[3] (https://github.com/cerner/clara-rules/blob/master/src/main/clojure/clara/rules/durability/fressian.clj#L355)

@EthanEChristian regarding your check above, I don't think that the function "identity" is actually what you wanted. It doesn't actually have anything to with object identity, but rather is just a function that returns its argument that is useful as a argument to functions that takes functions as arguments. [1] To recap, the core issue here is that the duplication of elements and tokens unnecessarily increases the size of the serialized sessions.

This issue has been resolved at #248

I don't see anything else to address here. @EthanEChristian do you agree that this issue can be closed?

  1. https://github.com/clojure/clojure/blob/clojure-1.8.0/src/clj/clojure/core.clj#L1402

@WilliamParker, I used identity because I wanted to group-by equivalent facts. So it gave me buckets of facts the are equivalent via =, this was the simplest way i could think to group facts that contain the same item. This method doesn't take into account nodes that have actually unique copies of the Token but it can give me a count of how many times the Serde thought it saw a unique Token and from there we can see that the number that the Serde saw was higher than what i would expect.

As the fix has been merged, I think it should be fine to close this one.

Closing.