Make [Z] and friends DWIM

Motivation

To make reduce DWIM when used with a +-slurpy operator on a list-of-lists.

That is, to make operations like the following work without bad surprises:

```
my \transpose = [Z] matrix;     # (\matrix and \transpose are each a list-of-lists)
my \sum-vector = [Z+] vectors;  # (\vectors is a list-of-lists, \sum-vector is a list)
my \divisors-of-n = [X*] prime-factors-of-n.map(&powers-below-n-of);
```

In fact:

- People keep assuming that this works.
- It does work for most inputs.
- The edge case where it doesn't work is easy to miss when testing, which makes it a trap.

The workaround that makes it work in all cases is comparatively cumbersome:

```
my \result = list-of-lists.elems == 1 ?? ... !! [op] list-of-lists;
# (where the simplest way to write the `...` part depends on the op in question.)
```
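For [Z+] specifically, the `...` part can simply be the single vector, since the element-wise sum of one vector is that vector. A minimal sketch of the workaround under that assumption (`sum-vectors` is a hypothetical helper name, not a built-in):

```raku
# Hypothetical helper: element-wise sum of a list of equal-length vectors,
# guarding against the one-vector edge case of [Z+].
sub sum-vectors(@vectors) {
    @vectors.elems == 1
        ?? @vectors[0].List    # the sum of a single vector is that vector
        !! [Z+] @vectors;      # two or more vectors: the plain reduction is fine
}

say sum-vectors(((100, 200), (10, 20), (1, 2)));   # (111 222)
say sum-vectors(((100, 200),));                    # (100 200)
```

Note how the guard has to be repeated, with an op-specific fallback, at every call site.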

(This all applies both to the subroutine form of reduce and to the [ ] reduction metaoperator. In this RFC, "reduce" always refers to both.)

How it currently breaks

The edge case of being passed a list-of-lists with .elems == 1 breaks DWIM expectations:

```
my @v;
@v := (<100 200>, <10 20>, <1 2>,); say [Z+] @v;  # (111 222) -- All good.
@v := (<100 200>, <10 20>,);        say [Z+] @v;  # (110 220) -- All good.
@v := (<100 200>,);                 say [Z+] @v;  # (300)     -- WAT, expected (100 200)
```

Why it currently breaks

A double application of the single-argument rule:

- Once when list-of-lists is passed to reduce, which removes the outer list.
- Once more if list-of-lists has exactly one child list: reduce then calls its operator with that child list as the only argument, and if the operator itself has a +-slurpy signature, that child list gets unpacked as well.
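The two applications can be observed with any routine that has a +-slurpy. In this sketch, `show-args` is a hypothetical stand-in for Z or zip that just reports how many arguments its slurpy ended up with:

```raku
# Hypothetical stand-in for Z / zip: a +-slurpy routine that reports
# how many arguments its slurpy received after the single-argument rule.
sub show-args(+@args) { @args.elems }

# A list-of-lists with exactly one child list.
my @lol := ((100, 200),);

# Once reduce's own +-slurpy has removed the outer list, only the child
# list is left. Passing it as the only argument lets the operator's
# +-slurpy unpack it a second time: it sees two numbers, not one row.
say show-args(@lol[0]);       # 2

# Wrapping the child list in a one-element list keeps it as one argument.
say show-args((@lol[0],));    # 1
```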

Prior discussion

Change proposal

1. Make reduce check the operator it's given: if it's a Callable with a single-argument-rule (+) slurpy, then in the one-element case, where the operator needs to be called with a single argument, call it as op (element,).
2. Make sure that all built-ins that follow the single-argument rule, like Z, X, zip and roundrobin, actually report a + slurpy signature when introspected.¹
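To make change no. 1 more concrete, here is a minimal user-space sketch. It is not meant as the actual Rakudo implementation (which would live inside reduce itself). `reduce-dwim` and `vsum` are hypothetical names, with `vsum` standing in for Z+ via a +@ signature, and the sketch assumes that `Parameter.prefix` reports `+` for such a slurpy:

```raku
# Hypothetical stand-in for Z+: element-wise sum of any number of
# equal-length vectors. Its +@ slurpy follows the single-argument rule.
sub vsum(+@vectors) {
    (^@vectors[0].elems).map(-> $i { [+] @vectors.map(*[$i]) }).List
}

# Hypothetical sketch of proposed change no. 1 (not Rakudo's actual code).
sub reduce-dwim(&op, +@items) {
    # Does the operator take a single-argument-rule (+) slurpy?
    # (Assumes Parameter.prefix returns '+' for +@ / +name parameters.)
    my $plus-slurpy = &op.signature.params.first({ .slurpy && .prefix eq '+' }).defined;

    # One-element case: pass the element as *one* argument, i.e. op((element,)).
    return op((@items[0],)) if $plus-slurpy && @items.elems == 1;

    # Zero or several elements: defer to the built-in reduce unchanged.
    reduce &op, @items
}

my @one-row := ((100, 200),);
say reduce &vsum, @one-row;                        # expected to hit the trap (sums 100 and 200)
say reduce-dwim(&vsum, @one-row);                  # (100 200)
say reduce-dwim(&vsum, ((100, 200), (10, 20)));    # (110 220)
```

Only the one-element case of a +-slurpy operator is rerouted; every other input goes through the built-in reduce, so its existing semantics stay as they are.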

Why change 1 needn't be considered a dirty 'special case'

On a technical level, proposed change no. 1 would probably be implemented as additional if/else branching in the implementation of reduce, based on introspection of the operator that was passed to it. Adding special cases to built-in routines is usually undesirable.

However, this particular branching is not arbitrary and can be seen as part of a consistent rule.

In this view, a +-slurpy in a routine signature is simply a kind of "calling convention" that specifies how you need to pass arguments to the routine:

If there's a routine with signature foo (+@arrays) and you say to yourself "I want to pass the two arrays @a and @b to it", you'd write:
```
foo @a, @b;
```
If you say "I want to pass the single array @a to it", you'd write:
```
foo (@a,);
```
Even though on the syntax level you're now actually passing a new list that has @a as its element, it's perfectly reasonable to think of it as "passing the single argument @a", and the reason you're writing it with the extra comma is just because that's how this calling convention works in this case.
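A quick way to see this calling convention in action, using a toy `foo` (hypothetical, as in the text) that just returns how many arguments its +-slurpy received:

```raku
# Toy routine with a +-slurpy: reports how many arguments it ended up with.
sub foo(+@arrays) { @arrays.elems }

my @a = 1, 2, 3;
my @b = 4, 5;

say foo @a, @b;   # 2 -- the two arrays @a and @b
say foo (@a,);    # 1 -- the single array @a
say foo @a;       # 3 -- without the comma, @a itself is unpacked into its elements
```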

Now, when reduce calls the operator it was given, it too has specific reasons to want a certain number of arguments to arrive at the operator:

- It normally calls the op with 2 arguments to say: "Here are two elements, please give me the result of combining them into one."
- It calls the op with 1 argument to say: "Here is one(!) element, please give me the result of extrapolating your typical binary operation to this single-operand case in a consistent/useful manner. (Usually: the element itself.)"
- It calls the op with 0 arguments to say: "Please give me the result of extrapolating your typical binary operation to the zero-operand case in a consistent/useful manner. (Usually: the identity element.)"
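The built-in arithmetic operators show all three cases with the [ ] reduction metaoperator:

```raku
say [+] 2, 3, 4;   # 9 -- two at a time: (2 + 3) + 4
say [+] 5;         # 5 -- one operand: the element itself
say [+] ();        # 0 -- no operands: the identity element of +
say [*] ();        # 1 -- no operands: the identity element of *
```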

So arguably, whatever calling convention the operator given to reduce uses, reduce should, in the single-element case, call the operator in a way that means "I want to pass this single element to you". For a +-slurpy operator, that happens to be op (element,).

Furthermore, reduce already introspects its operator in various ways (associativity, arity, etc.) in order to DWIM as much as possible, so this would probably fit right in.

Risks

Risk of breaking user code where people actually wanted the current behavior:
I'd say very low. An operation like "return the transpose of this matrix, except if it has exactly 1 row, then return a non-transposed copy of the matrix instead" is rather whimsical. While it's not impossible to need that somewhere in an algorithm, the chance that someone needed it, realized that [Z] just so happens to give them that in current Rakudo, and decided to actually write it in that concise but obfuscating way is hopefully close to zero. :P
Risk of breaking user code where people wanted the DWIM behavior proposed by this RFC, but already implemented their own workarounds to get it:
Hopefully low, because the most obvious current workaround (lol.elems == 1 ?? ... !! [op] lol) is immune to [op] lol changing its behavior for the one-element case.
Still, I wouldn't dismiss this risk as easily as the previous one - it's entirely possible that some workaround, somewhere, will break. Please comment if you can think of one or know one used out in the wild.
Risk of a language-design slippery slope:
If reduce learns to adapt to the special calling convention of +-signatured routines like zip, will people suddenly want lots of other built-in routines to adapt to the special calling conventions of lots of other routines?
I hope not. The special combination of circumstances that causes the problem for reduce is:
- Two user-facing routines with a + slurpy;
- one calling the other;
- usually calling it with multiple elements from its own slurpy, but sometimes(!) only with one(!);
- and when calling it with only one, having a semantic reason to really want it to mean "one element".
Is there any other case like that (or similar) among the higher-order functions of the setting?
E.g., even though giving a +-signatured callback to map technically also causes a double application of the single-argument rule, it's arguably not a problem there because:
- The callback is always called with the same number of arguments, regardless of the size of the input list, so users are not hit by surprises.
- map is "dumb" (i.e., low-level) – it doesn't have any understanding of why it must call its callback with a certain number of arguments, so it's okay for it to just mindlessly pass on whatever it's given and let the user be responsible for what happens (by choosing an appropriate callback signature).
But maybe that distinction isn't convincing to everyone...

Anti-risks

Risk of removing dormant bugs in user code (where people used [Z] and friends on lists-of-lists assuming they would reliably DWIM):
Quite probable. :P

¹ Maybe they all already do, but I know that in the past some showed up as **@ or similar and did the single-argument-rule handling manually inside the routine, so an audit would be needed to make sure no such cases remain in the setting.

Answer 1 · 2024-10-04T15:46:41.000Z

Originally suggested by @smls in rakudo/rakudo#2025

Answer 2 · 2024-10-07T18:58:25.000Z

I have been bitten by this before; the current behavior always felt like a footgun to me.