oracle-samples/clara-rules

[Durability] Clojure equality behavior not maintained through serde

EthanEChristian opened this issue · 2 comments

Given the following example:

(ns durability-issue
  (:require [clara.rules :as r]
            [clara.rules.compiler :as com]
            [clara.rules.durability :as dur]
            [clara.rules.durability.fressian :as fres]
            [clojure.set :as set]
            [clara.rules.engine :as eng])
  (:import [java.io ByteArrayOutputStream ByteArrayInputStream]))


(defprotocol CustomFactType
  (fact-type [this]))

(extend-protocol CustomFactType
  Object
  (fact-type [this] (type this)))

(defrecord AFactType [x])
(defrecord AFact [x]
  CustomFactType
  (fact-type [x]
    (->AFactType (:x x))))

(defrecord BFactType [x])
(defrecord BFact [x]
  CustomFactType
  (fact-type [x] (->BFactType (:x x))))

(defrecord AOutcome [x])
(defrecord BOutcome [x])

(r/defquery a-query
  []
  [?out <- AOutcome])

(r/defquery b-query
  []
  [?out <- BOutcome])

(defn gen-rules
  []
  (let [a-rule {:ns-name 'durability-issue
                :lhs [{:type (->AFactType :a), :constraints []}]
                :rhs `(do (r/insert! (->AOutcome :a)))
                :name "durability-issue/some-rule"}
        b-rule {:ns-name 'durability-issue
                :lhs [{:type (->BFactType :a), :constraints []}]
                :rhs `(do (r/insert! (->BOutcome :a)))
                :name "durability-issue/another-rule"}]
    [a-rule b-rule]))

(defn without-serde
  []
  (let [fired-session (-> (com/mk-session [(gen-rules) [a-query b-query] :fact-type-fn fact-type])
                          (r/insert (->AFact :a) (->BFact :a))
                          (r/fire-rules))]
    fired-session))

(defn with-serde
  []
  (let [session (com/mk-session [(gen-rules) [a-query b-query] :fact-type-fn fact-type])

        restored-session (with-open [baos (ByteArrayOutputStream.)]
                           (let [serde-impl (fres/create-session-serializer baos)]
                             (dur/serialize-rulebase session serde-impl)
                             (with-open [bais (ByteArrayInputStream. (.toByteArray baos))]
                               (let [serde-impl (fres/create-session-serializer bais)
                                     rulebase (dur/deserialize-rulebase serde-impl {:fact-type-fn fact-type})]
                                 (dur/assemble-restored-session rulebase {})))
                             ))
        fired-session (-> restored-session
                          (r/insert (->AFact :a) (->BFact :a))
                          (r/fire-rules))]
    fired-session))

(defn whats-going-on?
  []
  (let [session-without-serde (without-serde)
        session-with-serde (with-serde)

        query-a #(r/query % a-query)
        query-b #(r/query % b-query)

        a-difference? (not= (query-a session-with-serde) (query-a session-without-serde))
        b-difference? (not= (query-b session-with-serde) (query-b session-without-serde))]
    (when a-difference?
      (println "AQuery difference found"))

    (when b-difference?
      (println "BQuery difference found"))

    (when (or a-difference? b-difference?)
      (println (set/difference (-> session-without-serde
                                   eng/components
                                   :rulebase
                                   :alpha-roots
                                   keys
                                   set)
                               (-> session-with-serde
                                   eng/components
                                   :rulebase
                                   :alpha-roots
                                   keys
                                   set))))))

; AQuery difference found
; #{#durability_issue.BFactType{:x :a}}

The example above demonstrates that in certain instances when serde occurs entire alpha-roots can be lost. This scenario occurs due to clojure hashing vs. java hash, while

durability_issue.BFactType{:x :a}

and

durability_issue.AFactType{:x :a}

are different the java hash code is the same. This becomes an issue when we go through serialization, specifically when we leverage Fressian's cache on write mechanic. The cache relies on using java's hash code, meaning the key of key value pair would not be distinct in the example above. Therefore when deserializing a map, the second key would overlay the first.

The solution seems to be that we cannot leverage the caching mechanic unless we are 100% sure that there will never be clojure objects written.

I don't think the issue here is exactly the hashcode, or at least not entirely. The Fressian caching mechanism here uses a custom Fressian map implementation that appears to use the Java equality semantics rather than Clojure ones. These are different for records:

user=> (defrecord TestRecord1 [x])
user.TestRecord1
user=> (defrecord TestRecord2 [x])
user.TestRecord2
user=> (= (->TestRecord1 :hello) (->TestRecord2 :hello))
false
user=> (.equals (->TestRecord1 :hello) (->TestRecord2 :hello))

This is then reflected in the hashcodes, for which the contract is that if two objects are equal, then their hashcodes must be equal. This is discussed in the Java hashCode doc, noting that the expectations are the same for any hashcode/equality implementation combination since these expectations are the result of the needs of hash-based data structures rather than something Java-specific.

I agree that we need to at least limit the usage of the caching mechanism to cases where it is safe. I see multiple places where it is used, although it isn't used universally. I think a first step would be to assess what the impact of just disabling it in all relevant handlers is. If the performance impact of this is problematic we could look at adding caching in more specific cases where we know it is safe, but keeping in mind that there may be a speed/memory tradeoff here where further compressing the output requires more computation when serializing. I wouldn't be surprised though if the impact is minimal. The Fressian caching is distinct from the identity-based caching implemented in Clara and the latter would be unaffected.

@WilliamParker @EthanEChristian

I'll mentioned something here that is the underlying reason why this was overlooked original (by me).

The Fressian object caching happens earlier than expected when it comes to object type dispatch. Perhaps this is intuitive, but it wasn't for me. I'm still not sure I think it makes sense.

Take the writing of a clojure.lang.MapEntry for example.
In particular this line:

(.writeObject wtr (key o) true)

The Fressian cache is searched for (key o) and if it is found, no further recursive, type-specific write handlers are called.

Somewhat interestingly, if you look at these lines https://github.com/Datomic/fressian/blob/fressian-0.6.6/src/org/fressian/FressianWriter.java#L458

        WriteHandler w = writeHandlerLookup.requireWriteHandler(tag, o);
        doWrite(tag, o, w, cache);

the WriteHandler is fetched, but it is never used if the cache finds a hit.


This is quite unfortunate that Fressian won't use clj semantics for hash/eq with it's built-in caching. I think in this Fressian handler impl of Clara, we should just make a function that wraps any object about to be cached with a Object.hashCode and Object.equals impl that forwards through to clojure.core/= and clojure.core/hash.

Being defensive on a case-by-case basis seems error-prone to me since we have expectation of our types following the clj hash/equiv contracts, not the Java interop ones.

Perhaps some type like:

(defprotocol Unwrappable
  (upwrap [this]))

(deftype WriteWrapper [wrapped hashcode]
  Unwrappable
  (unwrap [_] wrapped)
  Object
  (hashCode [this]
    (hash this))
  (equals [this other]
    (= this other)))

This wrapped type may never need to be written itself, but instead have a write handler forward it's writes to it's underlying type, ie don't enable cache on that FressianWriter.writeObject call.
eg.

"clj/writewrapper"
{:class WriteWrapper
 :writer (reify WriteHandler
           (write [_ w o]
             (.writeObject w (unwrap o))))
 ;; NB: Will never be read.
 }

I'm not sure the reader has to be aware of this at all, but if there is a similar Java hash/eq issue there, it still may be needed. That'll have to be explored.