tc39/proposal-optional-chaining

Defining precisely the scope of short-circuiting

claudepache opened this issue · 24 comments

From #3 (comment):

The explainer talks about the "right hand side", but where does the right hand side end exactly?

The rule is more precisely:

If the part at the left of ?. evaluates to null/undefined, the rest of the current subexpression is not evaluated, and the whole subexpression evaluates immediately to undefined,

where the meaning of “the current subexpression” should be clearly defined, which is the goal of this Issue. “The current subexpression” may be:

  1. “the current property access, method or function call”, e.g. a?.b — which is less useful if it is immediately followed by another property access, method or function call as discussed elsewhere;
  2. The minimum I want is: “the current chain of property accesses, method and function calls”, e.g., a?.b.c().d[x];
  3. For the sake of uniformity of treatment, I think one should include what the spec calls “Left-Hand-Side Expression”, which includes additionally new and tagged templates (which are sort of method/function invocations), e.g. new a?b.c().d[x] `{y}` ;
  4. Additionally, one may want that parentheses used for grouping should not stop short-circuit, e.g., (new a?b.c).d[x] `{y}` .

I chose option 4 ; although, as a user, I’ll be quite indifferent to anything between 2 and 4.

Some additional remarks:

  • If the inclusion of new and/or tagged template is judged confusing, one could statically forbid (i.e., “syntax error”) construction of “left-hand-side expressions” that contains new or tagged template somewhere after a ?.. I expect that the need of such a construction should be rare in practice anyway.

  • About grouping parentheses as in (a?.b).c. One must take a decision, but any option will have very limited practical impact, because:

    • probably nobody would write such an expression while expecting short-circuiting be stopped by parentheses, as it would be useless: the net result would be identical to (a.b).c;
    • the only people writing such an expression while expecting parentheses be transparent to short-circuiting are probably those participating to a competition of code obfuscation, as it would be equivalent to a?.b.cexcept in some constructs incorporating the new operator, such as (new a?.b).c... that should be rare (and could be statically forbidden, see previous bullet).

Seems like these are in order from what I'd prefer the most to the least. I would expect at most 2, which provides a simple syntactic rule.

While I struggle to think of cases where (3) would be useful, I am curious if there is something that seems semantically harmful about supporting it?

In case it aids the discussion, here are examples of bad code I could imagine someone trying to write:

const fancyAlerting = (analyticsApi, loggedInUser = null) => {
  const fancyUserProfile = new loggedInUser?.Profile();
  analyticsApi.track(fancyUserProfile, 'someAction');
  // where `.track()` gracefully handles nullish first args
};

const fancyErrorPopup = (fancyError = null) => {
  const errMsg = fancyError?.format`
    You pushed the button wrong!
  `;
  maybeAlert(errMsg);
  // where `maybeAlert` gracefully handles nullish first args
};

There are obviously other, probably better ways of expressing these concepts; I don't mean to put this forward as something that I personally want to do.

I think it's better if we don't add obscure and unintuitive instances of syntactic sugar.

While that would certainly be obscure, I'm not sure it's unintuitive.

For example, I would find it quite unintuitive that a?.b.c('hello') works while a?.b.c`hello` does not, given that they are (by my limited understanding of tagged template expressions) semantically equivalent.

It's been my rough impression that in the past TC39 has erred on the side of consistency, even in obscure cases like these. Has the committee (or perhaps, some members of it such as yourself) come to regret those decisions? Or perhaps I have the wrong impression.

Separately, and I hope this does not come across the wrong way – given your comments in other threads, I am curious if you have used ?. in other languages?

Consistency is one thing, and then defining obscure edge cases to have a particular obscure behavior is sometimes another one. I can definitely say I disagree with some particular past decisions on certain edge cases (as I've proposed changes to them). In my opinion, TC39's job should be to work out the details for the simplest consistent feature design that solves programmers' problems.

No, I haven't used ?. in other languages; maybe that leads to naivite here on my part. Sorry for that; I'll go do my homework.

I would be surprised if either option worked - iow, having ?. extend to invocations does not seem intuitive to me, whether () or tagged template.

@ljharb yes, you've made that clear elsewhere 😄

I'm curious what environments you've used ?. in, if any?

I've searched far and wide for signs of confusion in the coffeescript community about that behavior (stackoverflow, github issues, etc). CoffeeScript users show plenty of signs of confusion/surprise, but I've come up dry on this topic.

Have you used a language with this feature and found the behavior surprising?

(Sidenote: if anyone is looking to quickly+easily experiment with ?., you can pretty much copy/paste JS into the editor on http://lightscript.org and use this feature there.)

I have not used this feature in other languages, including CoffeeScript, but that perspective is still potentially more valuable than that of somebody who has used it. Lots of things are idiomatic in other languages, that are not idiomatic in JavaScript - it's important not to forget that.

I certainly agree that the perspective of a newcomer to the feature is very valuable.

It so happens that my personal hunch would be that if you showed a novice JS person the scrap obj?.prop, they would expect the feature to behave as in option 3 or 4. Indeed, it's also my hypothesis that even if a language expert such as yourself tried the feature for a little while, you'd also expect it to work at least as in option 2. But that's just my guess; actual user-testing would be required. And of course as you say, it's the novice's perspective we seek.

Lots of things are idiomatic in other languages, that are not idiomatic in JavaScript - it's important not to forget that.

I'm curious what you have in mind?

The only relevant JS idiom I can think of would be if (foo && foo.bar) { foo.bar.baz() }, to which the coffeescript idiom of foo?.bar?.baz() is relevant. But it sounded like you might have some examples you were thinking of. If it's just the general point, I certainly agree!

If this is the wrong place for these conversations, feel free to react with a 😄 rather than a comment – I don't mean to badger if this isn't helpful.

Nothing specific.

However, a?.b having effects that extend beyond b is very "spooky action at a distance", and it makes me very uncomfortable.

OK, this isn't evidence of me using the feature, but from the C# standard, it looks like C# implements option 2/3 from the above, rather than 4. I don't know enough about C# to know whether it has any more call/property-access-like operations; I don't see any.

I apologize for bringing up short-circuiting in this thread. Let's stick to #3 for that.

For me, I feel more comfortable with the short-circuiting concept if it can be explained syntactically, as C# does, rather than by the propagation of a Nil value. Although both of these are spec devices, it seems like they leak out into the mental model that programmers will use when understanding the scope.

I have just updated my spec text, which now implements Option 3 which still implements Option 4 (although I am not strongly inclined to one way or to the other re the parentheses issue).

I have no strong opinion on whether short-circuiting should allow to bypass new and/or tagged templates; but in case it is decided that it should not, I think that there should be a static semantics that forbid those cases, as they would be completely useless and potentially confusing.

FWIW, I've just reverted my spec text to Option 4, as it was just as simple to spec, contrarily to what I thought (I feared that I had to add complication to static semantics rules, but finally: no).

My preference would be for something in between options 1 and 2 (I'll call it option 1.5): special-case the syntax a?.b(), but don't do any further short-circuiting. As mentioned in #17, this covers 93% of real-world usages of ?. in CoffeeScript code, and IMO makes the feature less surprising and makes the language much more predictable and easier to reason about, since arguably the a?.b() case isn't short-circuiting at all.

I think special-casing the a?.b() syntax makes sense because people already think of it as "one unit" in JS (it's a method call, not a property access followed by a function call), and it's already the case that const f = a.b; f(); behaves differently from a.b();. To my knowledge, that isn't true anywhere else in the language (except assignments, I suppose). Outside of assignments and method calls, I believe JS has the basic principle that an expression evaluates to a value (maybe with side-effects) and that value is all that's needed by the parent expression to continue evaluation. Short-circuiting would add a new significant case where that assumption isn't true.

Option 1.5 means you need to use ?. explicitly to propagate a null check, rather than having the language do it implicitly. IMO, having to do a?.b()?.c?.d isn't much worse than a?.b().c.d, and is arguably more clear. (I understand it swallows errors where b or c are unexpectedly not there, but my impression is that that issue is relatively minor.) As mentioned in #17, explicit chaining is more common in CoffeeScript than implicit chaining (although there are plenty of possible caveats to that analysis).

Another thing to keep in mind is that any short-circuiting makes tools harder to write. Operations like "extract method" and "extract variable" suddenly need to be very careful that they're not changing the behavior. With tools like jscodeshift and AST Explorer, it's becoming more common for regular developers (not just "tools people") to write code transformations, and I worry that adding more complexity to the evaluation rules would make it harder for people to safely write things like automated refactors.

It's a fair point that const f = a.b; f() is different than a.b() - but const o = a.b; o.c() is the same as a.b.c(), and I think it's important to retain that.

@ljharb Right, to be clear, my proposed option 1.5 behavior is that optional chaining only works on the forms a?.b, a?.b(), a?.[b], a?.[b](), and a?.(), and never extends further out than that. I think that matches what you want.

Examples:

a?.b?.c()  // `a?.b` evaluates to a value, let's say `x`, then `x?.c()` is evaluated.
           // Does not crash if `a` null.
           // Does not crash if `a.b` is null.
           // Crashes if `a.b` is non-null and `a.b.c` is null.

a.b?.c()  // Crashes if `a` is null.
          // Does not crash if `a.b` is null.
          // Crashes if `a.b` is non-null and `a.b.c` is null.

a?.b.c()  // Crashes if `a` is null. This is buggy code, just like any code
          // that uses a possibly-null value directly without a null check.
          // Static analysis tools should complain about this code.

@alangpierce it sounds like short-circuiting extends beyond that roughly 7% of the time. Is there an easy way to see a few examples of that and see how reasonable they are?

Sure, I hacked my tool to print out all examples and ran it on the Atom codebase. Full gist here: https://gist.github.com/alangpierce/08611d4d1cf5572fc5a272d7468b96d6

Here are all 364 soak operations in the Atom codebase:

https://gist.githubusercontent.com/alangpierce/08611d4d1cf5572fc5a272d7468b96d6/raw/f8cf588b983e4d2a7ed45e953367a10abbbde379/all-soak-examples.txt

Here are the 159 (44%) soak operations that use short circuiting:

https://gist.githubusercontent.com/alangpierce/08611d4d1cf5572fc5a272d7468b96d6/raw/f8cf588b983e4d2a7ed45e953367a10abbbde379/short-circuit-soak-examples.txt

Here are the 17 (5%) soak operations that use short circuiting that isn't just a method invocation (i.e. cases that wouldn't be handled by my option 1.5):

https://gist.githubusercontent.com/alangpierce/08611d4d1cf5572fc5a272d7468b96d6/raw/f8cf588b983e4d2a7ed45e953367a10abbbde379/short-circuit-soak-examples-ignoring-methods.txt

(Each file is a subset of the previous file.)

Thanks very much for sharing, @alangpierce ! Seems like an instructive set of examples, especially those towards the end of the last gist.

Seems there are many cases of "if a is not null, then we expect it to always be of some shape, and use that shape directly" – whether that is a string that will match a certain regex, or a nested object with methods.

Some examples that seem representative to me, in no particular order:

while selection.end.row is selections[0]?.start.row
@nextSibling?.model.setFlexScale(1)
args?[3].match /Desktop/i
line.match(indentRegex)?[0].length

@rattrayalex Yep, agreed that "optional object with a known structure" is a real use case, and that it's solved well by short-circuiting.

I think it's a matter of tradeoffs; I would argue that any kind of short circuiting behavior adds complexity to the language, and that the right way to frame the discussion is to weigh the cost of that complexity with the benefit it provides.

My gut feeling on the cost is that it's unsettling to change the language in this way, but I also wouldn't be super-opposed to it. My gut feeling on the benefit is that it's sometimes useful, but doesn't really come up that much, and usually the alternative isn't so much worse. It certainly doesn't feel like short-circuiting is critical for ?. to be useful. But hopefully the examples and stats help others make their own decision.

With your specific examples, I think it's also worth noting that the first three of your examples can be written using && for only a minor inconvenience (some duplication, but no need to pull out variables or anything like that). But here's another example from Atom that uses short-circuiting and would be especially annoying to rewrite using &&:

atom.workspace?.getActivePane()?.getItems().length

For that case, my best idea in the "option 1.5 world" is to just do ?.length at the end.

@alangpierce (#10 (comment))

IMO, having to do a?.b()?.c?.d isn't much worse than a?.b().c.d, and is arguably more clear.

IMO, it is is arguably more brittle and less clear:

  • More brittle: If, by accident, a is not null, and a.b() is null, then, when evaluating the expression a?.b().c.d, an error is thrown. When evaluating a?.b()?.c?.d, the bug is silently forgiven and becomes harder to track.

  • Less clear: When someone reads a?.b()?.c?.d, they wonder (in case a is not null) whether you expect that a.b() may be null. When they read a?.b().c.d, it is clear that you expect a.b() be never null. — And it is also clear that, if a is null, short-circuiting must apply to the whole chain, because stopping earlier makes no sense.

Those are the two reasons I want short-circuit (at least Option 2 in order to encompass the most plausible cases), as I've already expressed in #3 (comment): debuggability and expressivity.

The ongoing refactoring of Issue #20 :

  • will implement Option 2;
  • will make Option 3 void, because new operators and tagged templates won’t be allowed by the grammar after an optional chain (unless you have a strong use case for new, see #22);
  • will make parentheses stop short-circuiting. This is a natural consequence of expressing short-circuiting the same way as || or && instead of the propagation of a reference.

I’m closing this. If you have still problems about short-circuiting, first read the FAQ, item “Why do you want long short-circuiting?”, then comment in Issue #3 or other appropriate place.