ekzhang/crepe

Implement aggregates

hydrolarus opened this issue · 2 comments

It would be nice to have aggregates in crepe. I am trying to do Advent of Code this year with as much crepe as I can and I ran into some situation where having them would have been nice.

The small set of aggregates that Soufflé has seems already really nice and powerful.

I am still learning to use Datalog, so I am very green when it comes to the implementation that crepe uses, but in general I would still be up to contribute this with some guidance. 🙃

That sounds interesting. What do you think would be a good syntax for aggregates?

crepe! {
    @input
    struct A(i32, i32);

    @output
    struct B(i32, i32);

    B(x, $syntax_for_max$ A(x, ?)) <- A(x, _);
}

Maybe something like the above? I think the only limitation is that it cannot be a valid expression syntax.

My initial idea was to allow them in the same place as relations as clauses but with lowercase identifiers.

  • A(x, y) would be a relation
  • (...) would be a guard expression
  • let ... would be a binding
  • sum(x) { <clause>,* } would be an aggregate

Special binding syntax

One issue with that would be though that it's hard to bind the result of that to a variable. However, since all those aggregates return numeric values (ignoring the range aggregate from Soufflé) there is no need for more complex pattern matching, so maybe n = sum(x) { <clause>,* } could work nicely?

Your example then could look like this:

crepe! {
    @input
    struct A(i32, i32);

    @output
    struct B(i32, i32);

    B(x, n) <- n = max(x) { A(x, _) };
}

Pro:

  • easy to fit into current syntax

Con:

  • Not immediately clear when to use n = .. and let n = ..

Out parameters

I am not super convinced about starting with an assignment because that might be confusing when to use let n = and when to use n = , so maybe a Prolog style "out parameter" could work too. The n = max(x) { A(x, _) } would become max(x, n) { A(x, _) }. That would be clear to parse but maybe not as clear to read, because max(x, n) almost looks like the max of x and n will be taken.

crepe! {
    @input
    struct A(i32, i32);

    @output
    struct B(i32, i32);

    B(x, n) <- max(x, n) { A(x, _) };
}

Pro:

  • easy to fit into current syntax

Con:

  • it might be unintuitive to have to use out parameters, especially when the name of the aggregate is the same for known mathematical functions

Special casing let rhs expressions

Maybe to keep the syntax "clean" the right hand side expression for let bindings could be special cased for aggregates.
If the { } are required then this could be used to still allow user functions called like aggregates.

// aggregate
let n = max(x) { A(x, _) },
// user defined functions `max`
let n = max(x),

Pro:

  • Same binding syntax for aggregate results and regular variable/pattern binding

Con:

  • more magic ✨
  • harder to parse, will introduce a special case to otherwise well-known syntax