oracle-samples/clara-rules

Add median accumulator to default accumulators

Opened this issue · 1 comments

Clara already defines mathematical accumulators, such as min and max. Per an offline conversation with a fellow member of the community, they mentioned that it might be nice to also support a median accumulator.

This seems doable, there are a few behavioral details that deserve some thought:

  1. The accum should allow facts to be returned, similar to min/max
    1.a. In the event that there are even amounts of facts what should be returned? Both facts?
    1.b. Should we allow a comparator function to be provided?
  2. In the event that we have an even amount of items, what value should be returned?
    2.a. I think that a simple average would be fine here

@WilliamParker I know you were recently working in this area of code, what are your thoughts?

This seems OK to add to Clara to me. New accumulators don't add additional complexity to the core API; they really are just "plugins" of a sort.

Regarding 1.1.a

I'd suggest that it would be easiest to understand if the accumulator always returns the same type signature. For example, given the condition

[?median-temp <- (acc/median :temperature :returns-fact true) :from [Temperature]]

Given cases

(A)

(->Temperature 10 "LHR")
(->Temperature 14 "LHR")
(->Temperature 16 "LHR")
(->Temperature 20 "LHR")

and (B)

(->Temperature 10 "LHR")
(->Temperature 15 "LHR")
(->Temperature 20 "LHR")

I'd suggest that a :returning-facts true option return one of

[(->Temperature 14 "LHR") (->Temperature 16 "LHR")] and [(->Temperature 15 "LHR")]
(->Temperature 14 "LHR") and (->Temperature 15 "LHR")
(->Temperature 16 "LHR") and (->Temperature 15 "LHR")

Inconsistency is likely to lead to bugs in client code when any tests written don't cover all the type signature possibilities IMO.

Regarding 1.1.b

Do you mean something likely

(defrecord LetterFact [letter])
[?median-letter (acc/median :letter :comparator alphabetical-sort-fn) :from [LetterFact]]

That would return "B" from

[(->LetterFact "A") (->LetterFact "B") (->LetterFact "C")]

This wouldn't be unique to a median accumulator, but would potentially be applicable to min/max as well. I could see uses, but I'd suggest deferring such an addition until/unless there's a need for it. If we do want it it might be best to add it to all of min, max, and median accumulators to maintain consistency.

Regarding 2.2.a

In the same vein as above, I think it would be best to maintain consistency on the returned type signature. I'd prefer to choose either the lower/upper valued fact of the "median pair" to be returned, potentially with the ability to specify which to do. Something likely

[?median (acc/median :returns-fact true :even-numbered-fact :return-upper)]
;; or 
[?median (acc/median :returns-fact true :even-numbered-fact :return-lower)]

It might even make sense to fail if the user passes :returns-fact true and didn't specify that option (with an easily-understood error message), or perhaps just have an arbitrary default.

However, this raises the question - is a :returns-fact true option needed? It might make sense to create this first and then make these decisions with the benefit of additional experience using the accumulator if a :returns-fact true option isn't needed at the moment.