oracle-samples/clara-rules

Variable Binding Scope

Closed this issue · 6 comments

I'm attempting to understand how variable binding works within rules, and experiencing some unexpected behavior. For example in the code below ?b disappears when printed with ?a

(require '[clara.rules :as rule])

(rule/defrule wat
  [1 (= ?a "a")]
  [1 (= ?b "b")
   (nil? (println ?a))    ;; => "a"
   (nil? (println ?b))    ;; => "b"
   (nil? (println ?a ?b)) ;; => "a nil"
   ] => nil)

(-> (rule/mk-session [wat] :fact-type-fn (constantly 1))
    (rule/insert 1)
    (rule/fire-rules))

Thanks for reporting this.

This looks like a bug for a the specific case where if a variable is bound in a constraint, and then later used in that constraint in such a way that the compiler thinks the bound variable is used in a join that cannot be hashed joined, it's not set in the resulting code that gets compiled.

This is something of an unusual case, since usually we don't reuse a variable bound in a given constraint later in that constraint (since the content to which that variable was bound is visible there). However, we should still support it because this behavior is definitely surprising.

Here's the quick debugging I did to see what's going on here. The pertinent area is the :join-filter-expression in node 2. Notice the (println ?a ?b) was moved to the join-filter-expression (since it may be attempting a join between ?a and ?b), whereas the simple (printlin ?b) remains in the original expression (since there is no other variable in there to make the compiler think it's joining to something else.)

Anyway, it seems like we'll have to dig into the logic for that non-hash join node to make sure previously bound variables are visible to the join operation as well.

(-> [wat] clara.rules.compiler/to-beta-graph clojure.pprint/pprint)

{:forward-edges {0 #{1}, 1 #{2}, 2 #{3}},
 :backward-edges {1 #{0}, 2 #{1}, 3 #{2}},
 :id-to-condition-node
 {0 :clara.rules.compiler/root-condition,
  1
  {:node-type :join,
   :condition {:type 1, :constraints [(= ?a "a")]},
   :new-bindings #{:?a},
   :used-bindings #{:?a},
   :join-bindings #{}},
  2
  {:node-type :join,
   :condition
   {:type 1,
    :constraints [(= ?b "b") (nil? (println ?b))],
    :original-constraints
    [(= ?b "b")
     (nil? (println ?a))
     (nil? (println ?b))
     (nil? (println ?a ?b))]},
   :new-bindings #{:?b},
   :used-bindings #{:?b :?a},
   :join-bindings #{},
   :join-filter-expressions
   {:type 1,
    :constraints [(nil? (println ?a)) (nil? (println ?a ?b))]},
   :join-filter-join-bindings #{:?a}}},
 :id-to-production-node
 {3
  {:node-type :production,
   :production
   {:ns-name user,
    :lhs
    [{:type 1, :constraints [(= ?a "a")]}
     {:type 1,
      :constraints
      [(= ?b "b")
       (nil? (println ?a))
       (nil? (println ?b))
       (nil? (println ?a ?b))]}],
    :rhs (do nil),
    :name "user/wat"},
   :bindings #{:?b :?a}}},
 :id-to-new-bindings {1 #{:?a}, 2 #{:?b}, 3 #{}}}

I believe I see the origin of this issue. The code for the function that joins the facts matching the second condition with facts from the first condition is created by compile-join-filter. When building that join function, variables-as-keywords will get all the binding keys used in the constraints. It is then assumed here that all of these bindings are populated in the token, which holds facts and bindings from previous conditions. In other words, it assumes that in the rule

(rule/defrule wat
  [1 (= ?a "a")]
  [1 (= ?b "b")
   (nil? (println ?a))    ;; => "a"
   (nil? (println ?b))    ;; => "b"
   (nil? (println ?a ?b)) ;; => "a nil"
   ] => nil)

the ?b binding is populated in the first condition. Since this is not in fact the case, the token bindings map does not contain a value for ?b and when Clara looks up the value of ?b in that map nil is returned. The constraints on the second conditions that are not join constraints, on the other hand, will be used to create a filter function that will be only called on the facts for the second condition before doing a join using compile-condition. Since the binding is created for the first time there, a let binding will created here when the binding is created that is then picked up by additional constraints in that filter function.

When a binding comes from this filtering on non-join constraints rather than a previous condition, compile-join-filter needs to create a let binding for it accordingly. I believe this will involve changing the join filter to take both a token and an element, rather than a token and a fact as currently. Everywhere that calls the join filter should have the element, not just the fact; see here for an example of usage in the engine. The element will contain the bindings from that first filtering pass. The join filter will then need to use the token to let-bind bindings from previous conditions and the element to let-bind bindings created in non-join constraints from the same condition. This is a bit trickier, but I think the :id-to-new-bindings field in the BetaGraph that I added for #102 provides the necessary information. I'd need to think on that more to be sure though; obviously we don't want to make a mistake that causes us to start trying to look up bindings that are actually on the token on the element instead. We could also resolve the issue at runtime by trying to find the bindings on the element and then looking them up on the token if they aren't on the element, but the performance cost of this could add up and since the necessary information is present at session compilation time I'd rather take care of it then than incur a runtime cost every time we use the session created.

@WilliamParker Nice analysis. I agree that using the element (rather than just the fact), and that we should do this at compile time by changing how we generate the let expression (rather than trying to sort it out at runtime).

It looks like we can get what we need from :id-to-new-bindings, or perhaps we can use the :new-bindings field in the join node to access this?

I think it will be safe to simply add these bindings to our let statement from the element. There shouldn't be a collision between new bindings and the contents of the token. (If there was, they wouldn't be new bindings, unless I'm missing something here.)

Regarding:

"I think it will be safe to simply add these bindings to our let statement from the element. There shouldn't be a collision between new bindings and the contents of the token. (If there was, they wouldn't be new bindings, unless I'm missing something here.)"

I agree, we'll just want to be sure we aren't both missing something before merging a change implementing that. :)

Finally got back to this, with pull request #273 addressing this bug. It's basically inline with the discussion @WilliamParker and I had on this, where we're passing in the new, local bindings to this expression and making sure their visible. The added test shows the simplest form of this issue, and failed prior to applying these changes.

This has been merged. I labelled it with 0.14.0, since it will be included in that release. (I don't think we'll have a 0.13.x fix release, but can change the label if we do.)

Closing this now since the issue is addressed, but we can reopen if I'm missing something.