jashkenas/coffeescript

Independent splats in array pattern matching

TrevorBurnham opened this issue · 35 comments

Let's say that I want to get the first and last values of an array of arbitrary length, in a nice one-line statement. I could write

[first, middle..., last] = arr

But this is less than ideal from both a code efficiency standpoint and a readability standpoint, since I never use middle. What I'd prefer to write is simply

[first, ..., last] = arr

Do others agree that this should be allowed?

Not as nice, but you can write:

{0: first, (arr.length - 1): last} = arr

I don't see why not. Technically, it doesn't actually make sense to introduce a middle variable if it is never intended to be used. It hurts the readability of the code. I'd be in favor of this as long as nobody can spot any glaring issues.

You'd want a placeholder syntax that takes one space as well.

$ coffee -bpe '[first, _, third, _, fifth] = a'
var fifth, first, third, _;
first = a[0], _ = a[1], third = a[2], _ = a[3], fifth = a[4];

In SpiderMonkey, empty entries serve well.

js> [first, , third] = [1, 2, 3]
1,2,3
js> [first, third]
1,3
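For readers who want to try the empty-entry form outside the SpiderMonkey shell, here is a minimal sketch in plain JavaScript. It assumes an engine with ES2015 destructuring, where a hole between commas skips one element; the sample array is only illustrative.

```javascript
// Assumption: an ES2015-capable engine (e.g. a current Node.js).
// The empty slot between the two commas skips index 1 without binding it.
const [first, , third] = [1, 2, 3];

console.log(first, third);
```

No variable is created for the skipped position, which is the behaviour being asked for in CoffeeScript.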

@satyr: good idea. I'd be in favor of the empty entries. The first syntax you mentioned would work fine right now if you didn't care about the value of the _ variable, but definitely shouldn't be introduced as a syntactic construct.

Of course though,

{0: first, 2: third, 4: fifth} = a

works just as well.

Ah this comes back to life. I need to dig up the old discussion we had on the topic.

EDIT: Start with #86, move on to #277.

EDIT 2: Thought I'd share: http://githubissues.heroku.com/#jashkenas/coffee-script/ is great for searching.

Yes, I like weepy's proposal at issue 277 of using ? as a single-value placeholder. It looks very clear to me:

[first, ?, third, ?, fifth, others...] = arr

In fact, it'd be cool to see ? (or some other character) as a no-I-don't-need-this-value placeholder in other contexts in CoffeeScript as well; for instance, sometimes I just want the values of a hash, not the keys:

foo(val) for ?, val of hash

I understand the "named values contribute to more self-documenting code" argument Jeremy gave in issue 277, but I feel like that benefit is usually outweighed by the clarity that comes with avoiding the declaration of a variable you never use. For instance,

[first, ..., last] = arr

makes it much more clear that you only care about the first and last values of the array than any possible name for the middle value would. And, of course, there are the side benefits of brevity and slightly more efficient JavaScript output.

Would someone like to submit a patch?

What would arr = [first, ..., last] produce?

A syntax error. It's not necessary for the pattern-matching syntax to be completely parallel to the array creation syntax.

I don't think we need this for 1.0 ... Even if you're not going to use a variable that's serving as a placeholder, at least you can name it something descriptive. The only place that placeholder syntax would be used is in pattern matches... So, closing as a wontfix.

[first, ..., last] = arr

[first, ?, third, ?, fifth, others...] = arr

Coco now supports both of those without needing this proposal, in the form of

{0: first, (*-1): last} = arr
[first, [], third, [], fifth, ...others] = arr

satyr: want to link to your patch?

Could we reopen this issue? It was closed with the comment "I don't think we need this for 1.0"; now that 1.0 has successfully been released, I believe it merits further discussion.

Specifically, I'd like to propose allowing

[first, ..., last] = arr

and

[first, ?, third, ?, fifth] = arr

Here's a common use case for the latter: I run a regex, and I only want the group matches (or perhaps a subset of group matches), not the full match. For instance, let's say that I have a coordinates string in the format x,y, where x and y are integers. So I'd like to be able to write

[?, x, y] = coordinates.match /(\d+),(\d+)/

The closest I can come with the current syntax is to either 1) put in an unnecessary variable name instead of ?, or 2) write

[x, y] = coordinates.match(/(\d+),(\d+)/)[1..]

The ? syntax is, I think, both more readable and more writable, and would generate more efficient code.
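To make the cost of the two existing workarounds concrete, here is a rough JavaScript sketch of what each one amounts to at runtime. The sample string and the variable names (`unusedFullMatch`, `x1`, `x2`, and so on) are hypothetical, and this is not actual compiler output.

```javascript
// Hypothetical sample input in the "x,y" format described above.
const coordinates = "12,34";
const pattern = /(\d+),(\d+)/;

// Workaround 1: bind the full match to a name that is never read again.
const [unusedFullMatch, x1, y1] = coordinates.match(pattern);

// Workaround 2: slice off the full match, which allocates a second array.
const [x2, y2] = coordinates.match(pattern).slice(1);

console.log(x1, y1, x2, y2);
```

Both forms yield the same captured groups (as strings); the difference is one extra binding in the first case and one extra array in the second.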

Similarly, if I have a function call foo(bar) that returns a list of values, of which I only want the last three, I think

[..., x, y, z] = foo bar

is clearer than any existing syntax, and would generate more efficient code than any existing one-liner.

I am in support of the ... syntax, not so much the ?. Though I don't really have a better suggestion, so I'd be okay with it. With regard to your last example, though, I think [x, y, z] = (foo bar)[-3..] is pretty clean. Though I guess the new proposed syntax is more readable and more naturally understandable to people unfamiliar with coffee.

What don't you like about the single-value skipping syntax? Is it just that the ? feels like an arbitrary choice of symbol? I agree that it's arbitrary, but surely folks will get used to it? The only viable alternative I see is allowing [ , x, y], which is clearly less readable. I'd be happy to hear other suggestions, though.

The regex use case is a strong one. I frequently run matches just for the groups; adding [1..] to the end of the match, or sticking in an unused variable, feel like awfully kludgy ways of skipping the first array item. Plus, I sometimes do such regex matches in a performance-intensive loop, where superfluous assignments/slices potentially matter. And I may want to skip a group or two from the middle, not just the beginning.

@TrevorBurnham: It's not the syntax. I think a single value can always just be given a name. Skipping more than one value without naming it is more useful, though, because those skipped values may have nothing in common and thus no valid identifier. I like the hanging commas, but I remember jashkenas thought that it wasn't very readable, which I can pretty much agree with.

I suggest null as a placeholder symbol instead of ?. It's pretty intuitive for assignment to null to be a no-op. null... is also an option for a discarded splat. Upon compilation, a null in assignment position could simply become a local variable __null, which would be reusable within a function since it would never be read.

I'd suggest using some other symbol than ? at least--its semantics are consistent and the syntax rules around it are quite complex as is.

Note that [] is half-working already:

$ coffee -bpe '[[], x, y] = match'
var x, y;
match[0], x = match[1], y = match[2];

Just remove the extra match[0] and we get the desired behavior.

@sethaurus: I like that idea a lot.

I'd be OK with the syntax

[null, second, third] = arr

Note, however, that the ideal compilation would be second = arr[1]; third = arr[2];. I don't see any reason for the __null variable, except that it may be easier to implement. (See @satyr's post above—he's halfway there in his implementation already.)

@TrevorBurnham: That's coffee. Go ahead and try it out.

Cool. So it's just a matter of getting rid of the extra symbol and using null instead of (or in addition to) []. I just find [null, second] = arr much more readable/writeable than [[], second] = arr.

Oh, and as to splats, is []... supposed to be working? It should be possible to implement non-assignment splats by doing a calculation on arr.length rather than making a slice call, e.g.

    [a, null..., b, c] = arr

would compile to something like

    __len = arr.length;
    a = arr[0], b = arr[__len - 2], c = arr[__len - 1];
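The difference between the two strategies can be sketched as a pair of plain JavaScript functions. The names `pickWithSlice` and `pickWithLength` are made up for illustration, and both assume the array has at least three elements; neither is meant to reproduce what the compiler actually emits.

```javascript
// Roughly what a named middle splat costs: an intermediate array is built
// even though nothing reads it afterwards.
function pickWithSlice(arr) {
  const a = arr[0];
  const middle = arr.slice(1, arr.length - 2);
  const b = arr[arr.length - 2];
  const c = arr[arr.length - 1];
  return { a, b, c, middleLength: middle.length };
}

// The length-based alternative: only index arithmetic, no extra array.
function pickWithLength(arr) {
  const len = arr.length;
  return { a: arr[0], b: arr[len - 2], c: arr[len - 1] };
}

console.log(pickWithLength([1, 2, 3, 4, 5]));
```

For shorter arrays the trailing indices overlap with the leading ones, so a real implementation would need to decide how to handle that case.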

It's more a case of allowing null and special-casing it so that it skips an index and doesn't get assigned. The [] syntax will probably always work as long as we have destructuring assignment. Though we can optimize it so that the reference to the value it's skipping is not output.

odf commented

I think I'd probably prefer using the [] over the null, since it stands out more. But assignment to null as a no-op makes sense as well.

Yeah, they both work. It kinda makes me uneasy because it does look like we are attempting to assign a value to null, which should cause an error, but once one understands that it's supposed to be a no-op, it seems alright. [] is perfectly okay, though. And it already fits with current semantics. I'd be okay with either one.

I still think ? is a bit clearer than null or []. As a newcomer to CoffeeScript, if I see

[null, second] = arr

then I'm wondering why there's an attempted assignment to null; and if I see

[[], second] = arr

then I'm wondering if there's some kind of fancy nested pattern-matching going on. Of course, ? is hardly self-explanatory, and perhaps it is overused...

Maybe void would be better than null, since it's an invalid keyword everywhere else in CoffeeScript:

[void, second] = arr

After all, it seems a little odd to allow [null, second] = arr but not [undefined, second] = arr or [false, second] = arr or [0, second] = arr; what makes null so special?

Consider my one vote for using a period instead of a ? or null. A splat ... symbol does a great job indicating a bunch of "unknown" things and a single period indicates one "unknown" thing. The period is visually very small which makes it somewhat like the use of nothing. So I recommend

[a, ., b, c] = array

We already use periods for this function so why not stay with the period for the particular case of one item.

The destructuring assignment syntax is already one of the more punctuation-heavy parts of the language ([, ,, ], =). If we use special characters to indicate a placeholder, we run the risk of making the whole construct visually confusing. I think a keyword is clearer, and I like TrevorBurnham's suggestion of using void.

void as a placeholder makes sense. I'll probably make it work.

I don't think that having a value-skipping syntax is a good idea, unless implemented consistently across the language ... i.e., in function signatures, and arrays as well as destructuring assignment, and if it can be used in destructuring assignment, it should be usable in regular assignment as well.

So, let's stick to the destructuring syntax we already have (as of 4ce374b):

[first, [], third] = list

which compiles into:

var first, third;
first = list[0], third = list[2];

... an empty array or object will do. Personally, I'm going to continue to name the variables.

I have reverted jashkenas' commit 4ce374b above, as its maloptimization was the cause of issues #1103 and #1274. I am opening up this issue again so someone can make a proper optimization (or maybe even implement the [a, ..., b] = c or [null, a] = b syntaxes).

Thanks for the revert.

This all seems to be sorted out now on master.

list = [1..100]

[first, [], third] = list

console.log first, third

[first, []..., last] = list

console.log first, last

Produces...

1 3
1 100