cerner/clara-rules

Allow qualified keyword names for rules and queries

sparkofreason opened this issue · 17 comments

Third-part DSLs can benefit from having production names specified as qualified keywords rather than symbols, e.g. allows query names to be easily correlated with specs defining the query results. The current code assumes that a production name will take the namespace where it is defined, which won't generally be the case using qualified keywords. The changes are pretty minor, see example from my working fork: Provisdom@6dd773b

If possible, I'd like to just include this change as part of PR #365.

This sounds reasonable to me.

Necessary changes

  • The RHS compilation uses the :ns-name, not the namespace of the :name, in both Clojure and ClojureScript. Note that as discussed in this comment on Issue 178 similar logic was not applied to the LHS. Therefore this qualification logic shouldn't be impacted by the proposed change.
  • This logic for naming extracted rules that was added to solve issue 149 would need to be updated to accept a keyword name and create some reasonably new name.
  • Schemas for :name will need to be updated to be either a keyword or string instead of just a string.
  • If a user defines rules using defrule and defquery the fact that Clojure replaces duplicate var definitions makes it less likely to have multiple productions with the same name, although still possible since the rules are just data. Having qualified keywords would probably make it easier to have such duplicates. I'd suggest adding enforcement of name uniqueness on a per-session level in the compiler if we make this proposed change. An exception would be thrown if duplicate names were used.

User experience

  • End users who just use Clara's defrule shouldn't be impacted by this. Clara's defrule and defmacro will continue to produce the same outputs, along with everything that consumes them. We should make sure to document appropriately that the :name will only be a keyword if the user is doing something beyond Clara's out-of-the-box functionality so that users don't think they have to update any data processing code they have to accept both strings and keywords if their use-case won't create keyword names.
  • The idea of using a qualified keyword as an identifier is very idiomatic in Clojure so the idea should be familiar to users.

Thoughts anyone?

On second thought after reading @WilliamParker’s analysis there’s a bit of weirdness. In short, what’s in a name?

@sparkofreason could you show maybe with examples the particular features you’re looking to support?

@WilliamParker

  1. Internally would this entail no change to the current ns-name key in the rule definition map? It’d still be the ns in which the rule is defined?
  2. I can’t rember if the name key is fully qualified in some cases.
  3. If the ns keyword uses an alias do we want to / are we able to resolve it fully?
  4. What do the rule vars end up being called?

@alex-dixon Here's an example from the DSL I'm working with:

(defrules rules
  [::active-count!
   [?count <- (acc/count) :from [::specs/Todo (= done false)]]
   =>
   (view/update-view :active-count ?count)
   (view/update-view :all-completed (= 0 ?count))]

  [::completed-count!
   [?count <- (acc/count) :from [::specs/Todo (= done true)]]
   =>
   (view/update-view :completed-count ?count)
   (view/update-view :show-clear (not= 0 ?count))]

  [::visible-todos!
   [::specs/Visibility (= ?visibility visibility)]
   [?todos <- (acc/all) :from [::specs/Todo (condp = ?visibility
                                              :active (= done false)
                                              :completed (= done true)
                                              :all true)]]
   =>
   (view/update-view :todo-list (sort-by ::specs/id ?todos))]

  [::visibility!
   [::specs/Visibility (= ?visibility visibility)]
   =>
   (view/update-view :visibility ?visibility)])

;;; Queries
(defqueries queries
  [::completed-todos [] [?todo <- ::specs/Todo (= done true)]]

  [::todo-by-id [:?id] [?todo <- ::specs/Todo (= ?id id)]]

  [::visibility [] [?visibility <- ::specs/Visibility]])

The main feature here is defining rules in a named group that isn't a namespace, and once you're not defining rules with the same idiom as a ns-level def, keywords seem like the idiomatic choice for names. It doesn't matter as much for rules as queries, since the common use-case for referring productions by name is query, a la (rules/query session ::completed-todos). Keyword names for queries also allow for definition of specs for query results which have the same name as the query. And you potentially get some help with query names from the IDE.

@WilliamParker Regarding logic for naming extracted rules: does anything need to change? If (:name production) is :keyword, then applying str to that just results in ":keyword", which I think is fine with everything downstream. I suppose you could the whole generated name back to a keyword, just for consistency when viewing trace outputs etc.

@sparkofreason yeah, you're right that it doesn't necessarily have to change. It just seemed odd to me to have a name that was a string representation of a keyword based off a keyword-named rule. IMO it would be a little less jarring to have an appropriately named keyword generated, but others might disagree. It isn't really a significant point either way.

@alex-dixon : I'd envision the :ns-name being the source of truth for what namespace things in a rule or query are evaluated in. IMO the :name can just be a unique identifier, that to date happens to be a string corresponding to a Clojure symbol most of the time because Clara's native DSL creates vars. However, there don't necessarily need to be vars for rules or queries. Even currently it is absolutely possible to create rule data structures directly and pass them into mk-session in Clojure; Cerner has done exactly that extensively with various layers on top of Clara for different purposes. Does that address your questions?

Any objections to making this change?

@rbrush @mrrodriguez

No objections here!

Seems fine to me.

Even currently it is absolutely possible to create rule data structures directly and pass them into mk-session in Clojure; Cerner has done exactly that extensively with various layers on top of Clara for different purposes. Does that address your questions?
I think I like the varless, data only approach better overall. To date in CLJS though I don’t think this was an option.

I think if @sparkofreason has this all working then we should be good. I just got dizzy thinking about ramifications in CLJS. Until recently I think productions required vars. Also that namespaced keywords might give weird rule names that might not resolve for tooling like rule reloading etc.

If I can come up with something more concrete this weekend I’ll post it here. Overall I trust everyone else on there being no major ramifications.

Basic functionality is working, but the changes depend on #365, so that will need to be merged first.

Currently looking at checking the uniqueness of production names, and also would like to include some tests. This would be easiest just directly using the data form of productions, rather than going through a DSL. Are there existing tests which just build a session from the map-form of productions?

Also, clara.rules.compiler/mk-session contains logic which appears ensure that only one rule with a given name exists, taking the one defined "latest". So testing for name uniqueness here doesn't appear to make sense. It may be that the place a custom DSL would hook should be clara.rules.compiler/mk-session*. My DSL, for example, currently calls mk-session, but only because I wanted to pick up the processing around ::rule-load-rder (without this, I get a NPE in Clojure). The behavior of taking the latest version of a production is arguably a DSL-specific behavior, while ::rule-load-order is definitely required for all DSLs. I would suggest moving ::rule-load-rder to mk-session* or having a separate function which applies that transform so external DSL's can get this functionality. The name uniqueness test should perhaps also be moved to mk-session*.

Coming up with a test for Clojure is straightforward, but more difficult for CLJS. Suggestions welcome.

I think I'll punt on the CLJS test for this. The affected code for CLJ and CLJS is the same, and the CLJS test almost requires its own DSL to be devised in order to get a keyword name into the process. For CLJ, I can just use a var with the data structure form of the production, but that won't work with CLJS.

@sparkofreason I'm just starting to look over this and the PR for this, and I may have more questions as I have time to go through them. For the moment though - is there a particular reason you want to avoid calling clara.rules.compiler/mk-session in favor of clara.rules.compiler/mk-session*? I've always thought of the latter as semi-private, although admittedly it isn't marked as such.

Regarding "The behavior of taking the latest version of a production" that is implemented here - note that it actually looks to see if the entire production is in the set, not whether anything with the name is. The decision to eliminate duplicates was made on issue 157; you can see the reasoning there. In terms of the ::rule-loader-order, this was actually done as a performance optimization and way of increasing determinism; see issue 192. Basically we use the order in which rules are provided to the compiler as a heuristic for which rules in a batch of equal salience should be fired first. My experience is that people tend to write things in a logical "top-to-bottom" manner so this seemed as good a heuristic as any. Establishing some deterministic order was also very helpful for dealing with performance problems. At one point runtimes for the same benchmark could be wildly inconsistent between different JVM instances, seemingly randomly taking anywhere from a few seconds to a few minutes for particularly problematic cases. IMO these kinds of esoteric performance details about the engine don't seem like something a DSL writer would necessarily want to be concerned with. Note that duplicate removal and the rule-load-order pieces are separate concerns.

@WilliamParker Looking at my DSL in its current state, I can't come up with a great reason to call mk-session* instead of mk-session. I don't remember why I thought that was important before, and whatever it was seems to have evolved away or something. It's slightly more convenient, I suppose, since I already have the arguments in the right form for mk-session*. But de-duping productions and adding ::rule-load-order seem like things all DSLs would take advantage of, not to mention the sesssion caching. So have mk-session be the entry point perhaps is more sensible.

I've merged #389 to fix this. I think we've settled all the outstanding issues around this now. Unfortunately GitHub appears to have added me as the commit author on that when I squash merged it - I'm not sure why that happened, maybe some sort of logic around "merge by the creator of a PR" that just ignores the author on the actual commit in the PR, which I set to @sparkofreason . Sorry about that.

Is there anything else that anyone wanted to cover here before closing the issue?

Closing, we can reopen if anyone disagrees with doing so.