Serialization should not duplicate Tokens and Elements
Closed this issue · 3 comments
It seems during serialization of the session in durability duplicates Elements and Tokens due to modifications made to the Element/Token while indexing facts contained within.[1] The updates to these objects means that they are no longer identical and when serializing them[2], specifically [3]
the IdentityHashMap used sees these Elements/Tokens as unique.
A "simple" test:
(require '[clara.rules :as r])
(require '[clara.rules.durability.fressian :as df])
(require '[clara.rules.durability :as d])
(defrecord SimpleRecord [x y])
(defrecord Output [value])
(defrecord LocalMemorySerializer [holder]
d/IWorkingMemorySerializer
(serialize-facts [_ fact-seq]
(reset! holder fact-seq))
(deserialize-facts [_]
@holder))
(r/defrule sum-rule
[SimpleRecord (= ?x x) (= ?y y)]
=>
(r/insert! (->Output (+ ?x ?y))))
(r/defrule sub-rule
[SimpleRecord (= ?x x) (= ?y y)]
=>
(r/insert! (->Output (- ?x ?y))))
(r/defrule multi-rule
[SimpleRecord (= ?x x) (= ?y y)]
=>
(r/insert! (->Output (* ?x ?y))))
(r/defquery get-outcomes
[]
[Output (= ?val value)])
(def session
(r/mk-session [sum-rule
sub-rule
multi-rule
get-outcomes]))
(defn serde-session
[fired]
(let [session-baos (java.io.ByteArrayOutputStream.)
session-serializer (df/create-session-serializer session-baos)
rulebase-baos (java.io.ByteArrayOutputStream.)
rulebase-serializer (df/create-session-serializer rulebase-baos)
holder (atom [])
mem-serializer (->LocalMemorySerializer holder)
_ (d/serialize-rulebase fired rulebase-serializer)
_ (d/serialize-session-state fired
session-serializer
mem-serializer)
rulebase-data (.toByteArray rulebase-baos)
session-data (.toByteArray session-baos)
rulebase-bais (java.io.ByteArrayInputStream. rulebase-data)
session-bais (java.io.ByteArrayInputStream. session-data)
rulebase-serializer (df/create-session-serializer rulebase-bais)
session-serializer (df/create-session-serializer session-bais)
restored-rulebase (d/deserialize-rulebase rulebase-serializer)
restored (d/deserialize-session-state session-serializer
mem-serializer
{:base-rulebase restored-rulebase})]
restored))
(def rec-holder (atom []))
(alter-var-root
(var clara.rules.durability/clj-record-holder-add-fact-idx!)
(fn [f] #(do (swap! rec-holder conj %)
(f %))))
(as-> session ses
(r/insert ses (->SimpleRecord 1 2))
(r/fire-rules ses)
(let [out (r/query ses get-outcomes)] (println out) ses)
(serde-session ses)
(let [out (r/query ses get-outcomes)] (println out) ses))
(def grouped-rec-calls (group-by identity @rec-holder))
(def dupes (filter #(< 1 (count (val %))) grouped-rec-calls))
(-> dupes first val count)
4 "unique" Tokens with current implementation
With the rules/insert above I would expect exactly one Token to be propagated.
I am currently working on a fix to eliminate these duplicates.
[1] (https://github.com/cerner/clara-rules/blob/master/src/main/clojure/clara/rules/durability.clj#L228-L250)
[2] (https://github.com/cerner/clara-rules/blob/master/src/main/clojure/clara/rules/durability/fressian.clj#L348-L375)
[3] (https://github.com/cerner/clara-rules/blob/master/src/main/clojure/clara/rules/durability/fressian.clj#L355)
@EthanEChristian regarding your check above, I don't think that the function "identity" is actually what you wanted. It doesn't actually have anything to with object identity, but rather is just a function that returns its argument that is useful as a argument to functions that takes functions as arguments. [1] To recap, the core issue here is that the duplication of elements and tokens unnecessarily increases the size of the serialized sessions.
This issue has been resolved at #248
I don't see anything else to address here. @EthanEChristian do you agree that this issue can be closed?
@WilliamParker, I used identity because I wanted to group-by equivalent facts. So it gave me buckets of facts the are equivalent via =
, this was the simplest way i could think to group facts that contain the same item. This method doesn't take into account nodes that have actually unique copies of the Token but it can give me a count of how many times the Serde thought it saw a unique Token and from there we can see that the number that the Serde saw was higher than what i would expect.
As the fix has been merged, I think it should be fine to close this one.
Closing.