oracle-samples/clara-rules

User-intelligible class names for anonymous rulebase functions

Closed this issue · 7 comments

Clojure anonymous functions are compiled into classes with names based on the namespace and the name in the fn body if one is given, but if there is no name in the fn body then the only user-intelligible part of the name is the namespace. Normally in a Clojure development workflow the class name of a function isn't important, but this class name is what shows up in a profiler snapshot when the function is invoked. When Clara's eval'ed rule network functions such as alpha node activations are performance hotspots it can thus be difficult to determine in a profiler what rules those function calls correspond to.

user=> (class (fn myfunction [a] (+ a 1)))
user$eval10502$myfunction__10503
user=> (class (fn [a] (+ a 1)))
user$eval10506$fn__10507

I propose that we give these functions names using node ID(s) as part of the name. After #201 I believe node-ids should be deterministic. If this is the case, a user could compare the node ID(s) derived from a function class name in a profiler snapshot whose VM is no longer running with a new instance of the rulebase built with the same commands in order to determine what conditions are causing performance problems.

This could be done for all the evaluated functions, although my particular need is mostly for such information on AlphaNode activations functions at this point. Note that this is blocked on giving node IDs to AlphaNode for which I have logged work at #237.

Just a thought, but could we encode information about the expression in the class name itself, so this could be useful in scenarios where we can't easily track down node ids? So if there's an alpha expression of (< temperature 32), we create a class name alpha__lt_temperature_32.

We'd have to encode everything to valid class names and might have to truncate long expressions, but this could give a good idea of performance simply by looking at the profiles or call stack samples.

It isn't clear to me how there could be encoded a meaningful name like "lt" for less than unless we are special casing certain known functions? Maybe I was misunderstanding part of that.

Overall on this issue, though, I will say that I've added function names to these inner-compiled functions in Clara's compiler before so that I could troubleshoot better. I believe I took an approach similar to what Will has said about using a node id.

@mrrodriguez We wouldn't need to special case those functions since Clojure does it for us, although I got the case wrong in my example. We just need to create a valid symbol that represents the expression. For instance:

user=> (class (fn > [a] (> a 1)))
user$eval15273$_GT___15274

Or in the above example, if we generate the symbol 'alpha__<_temperature_32', it shows up in the class name:

(class (fn alpha__<_temperature_32 [a] (> a 1)))
user$eval15317$alpha___LT__temperature_32__15318

I wouldn't object to the node ID if that is more helpful, but the above text in a stack trace or JVM profiler would be meaningful on its own.

Personally, I'd still like to have a node ID as that is more precise; in our use cases a truncated expression would provide some information but would probably still be ambiguous. We could add a helper function to retrieve the conditions and rules given a node ID in order to avoid forcing users to dig into the internals of Clara to understand the information a node ID provides. The functionality being added at #256 could be helpful for this. However, these options are not mutually exclusive. We could do something like "alpha_node_id_1_expression_truncated_expression_here".

I just think that some expressions may be hard to embed into one string.

Perhaps I'm wrong there. So I'd favor the node ID approach and a lookup helper function for
node ID -> expression as Will suggested.

With AOT compilation on some platforms, I recall that there were actual limitations to making class names too long too. However, I can't remember many of the details on that.

I'm fine with just using the node ID here, so we can go that way if you guys feel it's more useful.

I think this is complete after the changes in #451 so I'm going to close it for now but feel free to reopen it if there's anything else you want to do here @EthanEChristian .