oracle-samples/clara-rules

clara.rules.platform/group-by-seq used in engine may cause incorrect semantics

Opened this issue · 26 comments

clara.rules.platform/group-by-seq can cause a subtle defect when Clojure records are involved. More generally, the problem is that c.r.p/group-by-seq uses a java.util.LinkedHashMap to perform the grouping operation. This is a Java-based collection type and is only aware of Java-based equality semantics, i.e. Object.equals() and Object.hashCode().

Clojure has its own equality semantics, reflected in = and hash. There are occasions where the difference between the Clojure and Java equality semantics can cause functionally different behavior.

c.r.p/group-by-seq is used internally within the Clara engine for various performance-sensitive operations. The j.u.LinkedHashMap was chosen for c.r.p/group-by-seq to streamline this hot path: it is mutable, which allows fast building of the ephemeral data structure needed, and it provides insertion-order determinism. This is useful to guarantee consistent execution through the Clara engine from one JVM instance to the next, given the same input.

A concrete example of a problem with j.u.LinkedHashMap's use of Java-based equality semantics has come up with regard to Clojure record types.
Clojure record types use their type information in clojure.core/= comparisons and clojure.core/hash. However, they are also meant to interop with Java, which is unaware of their record types; from the Java interop perspective they pose as plain java.util.Map implementations. This means that Clojure records' Object.equals() and Object.hashCode() implementations do not take the type of the records into account.

e.g.

(defrecord Foo [])
(defrecord Bar [])

(= (->Foo) (->Bar)) ;;= false
(.equals (->Foo) (->Bar)) ;;= true

This is a problem in the Clara engine. One example is that the engine uses c.r.p/group-by-seq on the :bindings of a rule. The :bindings may contain Clojure records. Those records should be compared with Clojure equality semantics, but instead they are compared with Java semantics.
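
To see the mechanism in isolation from the engine, here is a simplified, hypothetical stand-in for c.r.p/group-by-seq (reusing Foo and Bar from the snippet above); the two Clojure-distinct records collapse into a single group:

(defn group-by-seq-lhm
  "Illustrative only: groups via a java.util.LinkedHashMap, i.e. Java equality."
  [f coll]
  (let [^java.util.Map m (java.util.LinkedHashMap.)]
    (doseq [x coll]
      (let [k (f x)
            ^java.util.List xs (or (.get m k) (java.util.ArrayList.))]
        (.add xs x)
        (.put m k xs)))
    (mapv (fn [^java.util.Map$Entry e] [(.getKey e) (vec (.getValue e))])
          (.entrySet m))))

(group-by-seq-lhm identity [(->Foo) (->Bar)])
;;= [[#user.Foo{} [#user.Foo{} #user.Bar{}]]] ;; one group; clojure.core/group-by would give two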

@thegeez in the #clara Slack channel gave a reproducing case that can show this defect manifest itself. I'll copy it here:

(defrecord ClauseOne [b])
(defrecord ClauseTwo [b])

(defrule rule-one
  [ClauseOne (= ?b b)]
  [ClauseTwo (= ?b b)]
  =>
  (println "Match found:" ?b))

(defrecord WrapAlice [num])
(defrecord WrapBob [num])

(comment
  (clear-ns-productions!)
  (let [alice (->WrapAlice 1)
        bob (->WrapBob 1)]
    (println "(= alice bob)" (= alice bob))
    (println "(.equals alice bob)" (.equals alice bob))
    (-> (mk-session)
        (insert
         (->ClauseOne alice)
         ;; The line below is important; without it the wrong behavior does not surface
         (->ClauseTwo alice)
         (->ClauseTwo bob))
        (fire-rules)))
  ;;(= alice bob) false
  ;;(.equals alice bob) true
  ;; Match found: #clara_bug.core.WrapAlice{:num 1}
  ;; Match found: #clara_bug.core.WrapBob{:num 1} ;; <- should not happen
  )

To fix this, it seems likely that the j.u.LinkedHashMap should not be used. There aren't necessarily any Clojure equivalents that capture the performance characteristics and the ordering determinism desired here. And even if there were, Clara will (and should) be reluctant to add more dependencies, due to the burden that places on consumers of the library who may use conflicting versions of those same dependencies (aka "jar hell").

Edit: The comments below mention a wrapper around the items in the j.u.LinkedHashMap, and I think that is likely a better approach than what I originally said below.

My first thought would be to consider a custom implementation of this function that has the correct behavior with respect to Clojure equality semantics and our ordering guarantees, yet retains comparable performance characteristics. This could probably be done with a combination of a Clojure transient collection and a separate structure for tracking the order of the incoming input. More thought would be needed.
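
For illustration, an untested sketch of that transient-based idea (all names hypothetical):

(defn group-by-seq*
  [f coll]
  (let [none (Object.)
        ;; groups: transient map doing the grouping (Clojure equality);
        ;; order: transient vector remembering the first-seen order of keys
        [order groups] (reduce (fn [[order groups] x]
                                 (let [k (f x)
                                       existing (get groups k none)]
                                   (if (identical? existing none)
                                     [(conj! order k) (assoc! groups k [x])]
                                     [order (assoc! groups k (conj existing x))])))
                               [(transient []) (transient {})]
                               coll)
        groups (persistent! groups)]
    (mapv (fn [k] [k (get groups k)]) (persistent! order))))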


Related issues to the background of this:

#141
#207

This bug was found and reported by dominicm (@SevereOverfl0w) in the #clara Slack channel

@WilliamParker pointed out to me that the clara.rules.compiler/AlphaRootsWrapper is meant to solve a similar problem. Perhaps that idea can be used here as well - i.e. wrapping the results of the group-by fn in a type that implements Java-based equality semantics in terms of Clojure equality.

This would just require testing to see whether the extra object allocation/GC from wrapping is acceptable for performance.
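
For example, a minimal sketch of such a wrapper (the name and details are illustrative only, not the actual fix):

(deftype CljEqualityWrapper [wrapped]
  Object
  ;; Java equality delegated to Clojure equality semantics
  (equals [this other]
    (and (instance? CljEqualityWrapper other)
         (= wrapped (.wrapped ^CljEqualityWrapper other))))
  ;; Java hashing delegated to Clojure hashing semantics
  (hashCode [this]
    (hash wrapped)))

The keys put into the j.u.LinkedHashMap would then be (CljEqualityWrapper. k) instead of k, and would be unwrapped again when building the result seq.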

+1 to @mrrodriguez 's diagnosis, and I agree that this looks like a bug. We've used Java wrappers around Clojure types to avoid this issue in the past; see AlphaRootsWrapper in the compiler. Not addressing it in group-by-seq looks like a miss. It seems like a similar strategy would work here, provided that the performance impacts are acceptable. This is a hotspot, so it is possible they would be problematic, but allocation and GC are very heavily optimized by the JVM, so it would likely be OK. An implementation might need some tweaking to take maximum advantage of such optimizations.

Some options to explore for improving the performance of a wrapper would include:

  • Adjusting the code to take best advantage of JVM optimizations, possibly by writing it in pure Java if needed to hit specific cases.
  • Using an object pool for the wrappers to avoid the repeated allocation and GC cost.
  • Exploring off-heap memory options for the wrapper. Not something I'd be thrilled about, but it might still be better than writing a collection implementation.

However, if the simple approach of just adding a wrapper doesn't degrade performance too much I'd suggest just doing that.

Agreed that simply wrapping this with something like AlphaRootsWrapper is our best bet here. Note that the underlying LinkedHashMap implementation already allocates at least one object per map entry, so we're not adding a significant allocation source where there previously was none. And as Will points out, the JVM is very optimized for this.

Anyone working on this one?

Sorry for being slow here. This is on my to-do list, but unfortunately between traveling and other demands I haven't gotten back to it. I'll see if I can pick it up within the next week...I don't expect it to be a significant change.

The immediate issue is fixed in PR #397.

I am wondering if this goes far enough, though. For instance, the tuned-group-by function would be prone to a similar issue, although that is in a hot code path and the current issue would only be a problem if the fact type itself is a Clojure record or map. I don't mind merging this fix and we can consider whether to go after these other cases as well, since the current PR is a step forward if nothing else.

@rbrush

I am wondering if this goes far enough, though. For instance, the tuned-group-by function would be prone to a similar issue, although that is in a hot code path and the current issue would only be a problem if the fact type itself is a Clojure record or map. I don't mind merging this fix and we can consider whether to go after these other cases as well, since the current PR is a step forward if nothing else.

It may make sense to do them as 2 separate issues. The tuned-group-by one seems to be more of a stretch case, though it is valuable to consider fixing. Having them as 2 separate changes would make it easier to understand the effect of each one in isolation (for example, if perf changes unexpectedly somewhere, or rule results change since a defect is now fixed).

Alright, I went ahead and merged the immediate fix and logged #398 to consider the tuned-group-by -- which I agree isn't an immediate need.

I'm also happy to push a fix release for this if that would be helpful.

@rbrush
Just curious or maybe looking to help start a discussion:

  1. What's the release plan for #397? Major version bump?
  2. Any value in releasing an alpha or something to allow feedback? Seems like it could affect performance and/or rule semantics significantly?

I'm fine with anything -- only wondering how you would approach releasing this change :)

@alex-dixon I agree that an alpha release is probably warranted here to make it easy to try out. There doesn't seem to be a measurable impact on complex rules (since this isn't the dominant time sink), but a trivial (and probably non-representative) microbenchmark test I have that inserts and queries lots of facts repeatedly does show about a 20% hit.

These microbenchmarks can be misleading since this 20% penalty may only apply to 1% of the execution time of a realistic rule engine...but it raises enough concern that an alpha preview seems justified.

@rbrush is this microbenchmark something you can post? Maybe it is something we should add to the performance benchmarking build @EthanEChristian is adding with #396. Did you find any details on what is taking up the extra time?

I've thought of some possible optimizations that would be relatively low-hanging fruit.

  • Perhaps the instance check in the equals method isn't quite as cheap as expected and we should remove it? It isn't strictly necessary, as discussed at #397 (comment).
  • If the only divergence between Clojure and Java equality semantics is record equality, we could avoid wrapping non-record types. Records will only ever be equal (in a Clojure sense) to an object of the same type; see their equiv implementation. So you could have a wrapper along the lines of the following:
(deftype ClojureRecordWrapper [record]
  Object
  (equals [this other]
    (and (instance? ClojureRecordWrapper other)
         (.equiv ^IPersistentCollection record
                 ^IPersistentCollection (.record ^ClojureRecordWrapper other)))))

I've seen the type-based dispatch and conditionals of clojure.core/= cause performance problems before, probably because they confuse the JIT compiler. If we know the type in question we can link directly to the equiv method instead. The tradeoff would be the need for instance checks when wrapping and unwrapping, but those might be less significant, and in cases where the values are never actually records they might be optimized away by the JVM anyway: to my understanding, it will sometimes assume that a path won't be taken when that is what it observes at runtime, while keeping the ability to "rewind" if the path ever is taken.

@WilliamParker The one I found that shows the most change is the old clara-benchmark project: https://github.com/rbrush/clara-benchmark. The most pronounced benchmark is the clara.simple-query-large, when run like this (and modifying the project.clj in clara-benchmark to try different versions):

lein run clara.simple-query-large

Your idea of additional type checks here is interesting, and I might give it a shot under this workload.

@WilliamParker @rbrush

Perhaps the instance check in the equals method isn't quite as cheap as expected and we should remove it? It isn't strictly necessary as discussed at #397 (comment)

I doubt that instance? is causing any noticeable time. Modern JVMs have heavy optimizations for getting the class of an object and doing instance checks. However, we can remove it when it isn't necessary, just to eliminate operations, since the direction this is going is a more specialized wrapper for speed.

If the only divergence between Clojure and Java equality semantics is record equality we could avoid wrapping non-record types. Records will only ever be equal (in a Clojure sense) to an object of the same type; see their equiv implementation. So you could have a wrapper along the lines of the following:

I don't know that it is safe to "only wrap records". We want deep Clojure equality checking on the data structures; it isn't only the top-level object that we care about.
For example, suppose we are wrapping something like

{:x user.MyRec{:y 10}}

I agree that clojure.core/= can be a time sink. This is for the same reasons as other clojure.lang.RT-driven functions - they have a lot of instance-checking branches, and perhaps JVM inlining suffers at times.

If we were able to directly target the equiv method of the appropriate interface, I think the results would likely be an improvement. However, the difficulty is that the callsite isn't going to be as straightforward as clojure.lang.IRecord, as I showed in the snippet above. IPersistentCollection may be a reasonable catch-all for many cases though.

I believe that the objects in this hot loop will all be the same type of object during the loop, so the interface call will be a monomorphic callsite. However, repeated calls to group-by-seq may invalidate the site optimizations after each call (if the incoming type changes, which may not actually be that common).

so perhaps:

(if (instance? IPersistentCollection record)
  (.equiv ^IPersistentCollection record ^IPersistentCollection (.record ^ClojureRecordWrapper other))
  (= record (.record ^ClojureRecordWrapper other)))

This is assuming that the equals() comparison is always on the same type of objects (as we have discussed with removing the instance? check).

p.s. the hash caching still seems like a good idea too. hash has similar interfaces that could be targeted if that plan works out.
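
For example (sketch), the clojure.lang.IHashEq interface could be targeted directly when computing the hash to cache, with clojure.core/hash as the fallback:

(if (instance? clojure.lang.IHashEq k)
  (.hasheq ^clojure.lang.IHashEq k)
  (hash k))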

@mrrodriguez I hadn't considered the case of records inside collections as in your example

{:x user.MyRec{:y 10}}

I agree that is a problem that restricts our optimization opportunities.

On the remaining optimization opportunities, I think we only need to know the type wrapped by the object on which equals is called in order to potentially direct-link. Something like

(deftype SingleTypeOptimizedWrapper [value]
  Object
  (equals [this other]
    (.equiv ^IPersistentCollection value (get-wrapped other)))
  Wrapper
  (get-wrapped [this] value))

We'd need to make completely sure we were hitting the correct method though and consider if it is something likely to change in Clojure if applicable. This is getting fairly involved - I don't know that we should block releasing the alpha preview on it.

Unfortunately I don't see a clear way to eliminate the overhead here. As a test I just had the wrapper call the Java .equals method to separate the cost of creating the wrapper from that of Clojure equality semantics. It turns out just having a wrapper class accounts for about 60% of the overhead in my tests -- presumably because of the indirection and GC overhead that is being exposed in the microbenchmark. Using (= ...) instead of Java's (.equals ...) accounts for the rest.

In any case, it's not clear this will significantly impact any real workloads, so I'm okay with doing this as an alpha/preview release to let people easily try it out. If this is a critical path, we might look at creating a more specialized implementation of group-by-seq using a bespoke data structure....although that would be more involved.

@rbrush I agree that this shouldn't be holding back this alpha/preview release.

I had one idea to maybe cut down on the new object allocations, but I think it'd only be useful if, in the common case, the key already exists in the map (i.e. (.get m wrapper) hits) on iterations of the reduce. The idea is the (dreaded) object reuse pattern. Also, to make it work right, I am using a mutable collection as the map value so I don't have to know the "real key object" used in the map. If that makes sense.

I haven't tested this at all, just showing it in case it looks interesting to explore to anyone else.

Here is the outline:

(defprotocol SetWrapper
  (set-wrapped! [this wrapped wrapped-hash])
  (get-wrapped [this]))

(deftype JavaEqualityWrapper [^:unsynchronized-mutable wrapped
                              ^int ^:unsynchronized-mutable hash-code]
  SetWrapper
  (set-wrapped! [this new-wrapped wrapped-hash]
    (set! wrapped new-wrapped)
    (set! hash-code (int wrapped-hash))
    ;; Return this so the wrapper can be used as the lookup key below.
    this)
  (get-wrapped [this] wrapped)

  Object
  (equals [this other]
    (and (instance? JavaEqualityWrapper other)
         (= wrapped (get-wrapped other))))

  (hashCode [this]
    hash-code))


(defn group-by-seq
  [f coll]
  (let [;; Single reusable wrapper for lookups, to avoid allocating on every item.
        test-wrapper (JavaEqualityWrapper. nil (int 0))
        ^java.util.Map m (reduce (fn [^java.util.Map m x]
                                   (let [k (f x)
                                         k-hash (hash k)
                                         tester (set-wrapped! test-wrapper k k-hash)
                                         existing (.get m tester)
                                         ^java.util.ArrayList xs (or existing
                                                                     (java.util.ArrayList.))]
                                     (.add xs x)
                                     ;; Only allocate a fresh wrapper when a new entry is needed.
                                     (when-not existing
                                       (.put m (JavaEqualityWrapper. k (int k-hash)) xs)))
                                   m)
                                 (java.util.LinkedHashMap.)
                                 coll)
        it (.iterator (.entrySet m))]
    ;; Explicitly iterate over a Java iterator in order to avoid running into issues as
    ;; discussed in http://dev.clojure.org/jira/browse/CLJ-1738
    (loop [coll (transient [])]
      (if (.hasNext it)
        (let [^java.util.Map$Entry e (.next it)]
          (recur (conj! coll [(get-wrapped (.getKey e)) (vec (.getValue e))])))
        (persistent! coll)))))

Note:

  1. The test-wrapper is not safe to use as an actual key in the map since it is reused on each iteration. So when a new map entry needs to be made, a new wrapper object must be allocated.

  2. The map entry value is a j.u.ArrayList instead of a transient because Java maps do not have efficient enough ways to get the whole map entry out at once in order to reuse the same key object. If there is an existing collection, it can just be mutated without needing to know the actual key object.

  3. Due to (2), there is additional cost in doing (vec (.getValue e)) to turn the mutable collection into a persistent one. transient + persistent! may have been faster here.

  4. I use a j.u.ArrayList since the Clojure persistent-vector factory (used via vec) is better optimized for it when making a new vector. More of this overhead could be removed by going straight to the underlying Clojure Java interop, but that is more brittle across clj versions.
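
For reference, the intended shape of the result would be (untested, like the outline itself):

(group-by-seq :k [{:k 1 :v :a} {:k 2 :v :b} {:k 1 :v :c}])
;;= [[1 [{:k 1, :v :a} {:k 1, :v :c}]] [2 [{:k 2, :v :b}]]]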

I tried a few variations of this against the clara.simple-query-large benchmark. Interestingly enough, the most successful was calling .hashCode instead of clojure.core/hash when building the wrapper.

(JavaEqualityWrapper. k (.hashCode ^Object k))

I realize that this isn't null-safe and would need to be adjusted for real use. EDIT: I also haven't investigated how using Java hashcode semantics with Clojure equality semantics would work or if it would be safe.

This reduced the overhead to no more than 5% (and maybe just random noise, I'd need to test more) on a couple runs I did. I would want to validate this more extensively, and I haven't tracked down what exactly about the hashing is problematic. I also haven't tried any of the other benchmarks yet.
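
A null-safe construction (hypothetical) could be:

(JavaEqualityWrapper. k (if (nil? k) (int 0) (.hashCode ^Object k)))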

@WilliamParker I don't think that'd be safe in a general case.

Clojure = and hash need to be used together to guarantee the expected "hash/equals" contract; equals and hashCode similarly go together. You can't mix the two. There are cases where it won't be consistent.

I believe that one of the reasons Object.hashCode is often faster than Clojure hash is that Clojure uses murmur3 for strings. I don't remember all the details, but I believe it is meant to help prevent certain attacks on hash maps with string keys, by making it harder to produce many different strings that have the same hash. Several languages made a similar change around the time Clojure changed this.

I saw the same results @WilliamParker saw when switching to .hashCode. Nice find! I hadn't thought that would be the most significant item.

@mrrodriguez I agree with your comments in the general case, but I think this is safe in this specific case. The only requirement for using this in the Java HashMap is that if two objects are Clojure =, they will produce the same Java .hashCode. I think this is the case. We will see more hash collisions using Java .hashCode, but the HashMap itself will still behave correctly.

user=> (defrecord Foo [a b])
user.Foo
user=> (defrecord Bar [a b])
user.Bar
user=> (= (->Foo 20 30) (->Bar 20 30))
false
user=> (= (.hashCode (->Foo 20 30)) (.hashCode (->Bar 20 30)))
true ;; a false collision, but the following equality check will resolve this in the hash map.
user=> (= (hash (->Foo 20 30)) (hash (->Bar 20 30)))
false

Note this is not true if using Clojure == and Java .hashCode, since == handles numeric conversions and items with different hash values can be ==, but that doesn't apply here.
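
e.g.

user=> (== 1 1.0)
true
user=> (= (.hashCode 1) (.hashCode 1.0))
false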

All unit tests pass with the Java .hashCode, if nothing else. This seems promising to me, but please let me know if I'm missing something.

@rbrush
Numeric handling was my biggest concern. You are right that == would definitely be problematic.

I can't think of any specific examples. Numerics with = do have a "category" check that seems to assert the numbers are in the same category first. So only numbers of the same category are equiv, and those will also hash equal via hashCode.

I suppose it is ok. I feel on edge about it thinking that there are corner cases missing, but I don't know of any. Your assessment that

The only requirement for using this in the Java HashMap is that if two objects are Clojure =, they will produce the same Java .hashCode. I think this is the case. We will see more hash collisions using Java .hashCode, but the HashMap itself will still behave correctly.

seems accurate.

Alright, I took a pass at this and created PR #399 for comment. This brings the penalty at least into the mid single digits for this microbenchmark, and likely negligible for most real workloads.

As an experiment I also tried a version where JavaEqualityWrapper was written in Java, but found no measurable difference there.

I think this change will be OK as long as the following holds: "If two objects are Clojure =, then they are Java .equals". In theory one could create a type of one's own for which this didn't hold, although I don't know why one would do so. IMO this is an OK thing to do for now - better than either the existing bug or introducing a significant performance hit.

If it is a problem, it might be possible to avoid it in many cases by having something other than a Clojure persistent data structure as the key. The usages of this function use bindings maps as the key to group by, so we'd have to use a Java map there (written in Clojure, of course) or define our own type, e.g.

(deftype Bindings [...]
  MapInterfaces ......
  Object
  (hashCode [this] optimized-impl-here...))

You could still use the Clojure hash for the values while avoiding the extra cost for the binding structure, and avoid the cost when the values are primitives etc.

Regarding the reason for the different Clojure hash, my (incomplete) understanding is that the motivation was to reduce hash collisions. Since this is a very temporary map, as a practical matter I expect we could reduce them by just allocating a larger map size upfront since it would be garbage collected shortly anyway.

@WilliamParker

You could still use the Clojure hash for the values while avoiding the extra cost for the binding structure, and avoid the cost when the values are primitives etc.

I don't know that I see how this would help. I'm not sure the outer-most object is the cause of the hash performance cost; my assumption has been that the larger data structures in the binding maps are in the values of the map. Anyway, it's a future consideration.

Regarding the reason for the different Clojure hash, my (incomplete) understanding is that the motivation was to reduce hash collisions. Since this is a very temporary map, as a practical matter I expect we could reduce them by just allocating a larger map size upfront since it would be garbage collected shortly anyway.

I believe that the primary reason Clojure used murmur3 for strings was to prevent "denial of service attacks through forced hash collisions". This was a hot topic that came up a few years ago and affected many mainstream programming languages. Java has not changed String.hashCode to account for it, I believe due to backwards-compatibility concerns (and probably performance considerations too).

The reason that Clojure has a separate "hasheq" set of interfaces and contracts is, in general, not about collisions. It is about having different interpretations of equivalence across types (for example, collections) than what Java has. There are other nuances there as well, such as clj records having their type considered in clj equality but not in Java equality (which is meant to serve a generic interop purpose).

@mrrodriguez There were issues with hashcode collisions on Clojure's highly nested data structures; see this design doc. Also see the map hasheq implementation; the differences aren't just in the primitives. So I actually do think it was in large part about collisions. As for whether the perf cost is coming from the primitive hashing or the larger data structures, I haven't done any testing to tease out where the cost exactly is. If the cost is in the primitives it would indeed be harder to avoid using the strategy I described of using a type other than a Clojure data structure to group by; that's a good point. The benchmark referred to here is using primitive types for the fields in its generator. In practice, I suspect that running into the issues described in the Clojure hash design doc would be abnormal although possible for idiomatic Clara rules.

@WilliamParker
Thanks for the link.

I was referring to string hashing though. String isn’t mentioned in that design from what I can see. The doc has good info about other stuff relevant here though.

String is where I’ve seen the biggest difference in perf before, since string keys are common in things like JSON and Java, and strings also back keywords and symbols, I believe. However, some of those cache it.