eclipse-archived/ceylon

object destructuring


Using the techniques from #4127 and #4406, it's now possible to implement destructuring for user-written classes as syntax sugar over existing constructs in the language. (That is to say, with zero impact upon the backend.)

For example, we could make:

case (Person(name, age)) expr

desugar to:

case (is Person) let ([name, age] = p.destructured) expr

Given a new language module interface Struct shown here.
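
For concreteness, here is a minimal sketch of what this first desugaring could expand to in today's Ceylon. The `Struct` interface below is a hypothetical stand-in written for illustration (the actual proposed interface is only linked above and may differ), and `describe` is an invented name.

```ceylon
// Hypothetical interface: an assumption for this sketch,
// not part of ceylon.language.
shared interface Struct<out Fields>
        given Fields satisfies Anything[] {
    "The field values, packaged as a tuple."
    shared formal Fields destructured;
}

class Person(shared String name, shared Integer age)
        satisfies Struct<[String, Integer]> {
    // Shortcut refinement of the formal attribute.
    destructured => [name, age];
}

// Hand-written equivalent of `case (Person(name, age)) expr`:
// a type case followed by ordinary tuple destructuring in a `let`.
String describe(Anything p)
    => switch (p)
       case (is Person)
           let ([name, age] = p.destructured)
           "``name`` is ``age``"
       else "not a person";
```

Note that every match goes through `destructured`, so each one instantiates a tuple.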

Or, alternatively, we could make:

case (Person { name, age }) expr

desugar to:

case (is Person) let (name = p.name, age = p.age) expr

Which is a solution that requires no Struct interface or any other kind of "extractor" function, but does not allow easy renaming. (You would have to write Person { fullName=name, age } or whatever if you were fussy about local aliases.)
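
As a rough illustration of the second desugaring, the sketch below writes out by hand what the compiler would generate. Only existing Ceylon syntax is used; the `Person` class and the function names `describe` and `greet` are assumptions made for the example.

```ceylon
class Person(shared String name, shared Integer age) {}

// Hand-written equivalent of `case (Person { name, age }) expr`:
// no extractor is involved, just direct attribute access.
String describe(Anything p)
    => switch (p)
       case (is Person)
           let (name = p.name, age = p.age)
           "``name`` is ``age``"
       else "not a person";

// The aliasing form `Person { fullName=name, age }` would only change
// the local name introduced by the `let`.
String greet(Person p)
    => let (fullName = p.name, age = p.age)
       "Hello ``fullName`` (``age``)";
```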

I have two questions about this:

  1. Which of the above approaches do we prefer? With or without the Struct interface? I lean towards the solution without it.

  2. What does destructuring look like in syntactic locations other than switch? Is it, for example:

    for (Person {name, age} in people) { ... }
    

    Or would it just be:

    for ({name, age} in people) { ... }
    

Thoughts?

To be clear, what I'm proposing to introduce here is syntactically a new kind of pattern, but one that would be handled by the front end and never seen by the backend.

I'd go for "without" as well. Even though the Struct is definitely nice (and perhaps merits inclusion in the language module regardless) I think it limits things too much to classes that were specifically made with this use-case in mind and that would be a pity IMO.

I like the "without" option too.

That would still leave open the possibility of pattern matching records, right?

I think Person(name, age) should destructure using shared parameters. Person{name, age} should destructure using named shared properties like you suggested.

Regardless, I think this is a nice syntax sugar, but I think I ultimately dislike it.

An issue with the Person { name, age } syntax is:

  1. representing nested patterns, and
  2. representing constants in patterns.

For example, I guess you would wind up with something like this:

Person { name, address = { street, city, state, zip } }

Or:

Person { name, address = Address { street, city, state, zip } }

And:

Person { name, age = 18 }

Which, TBH, I don't find very natural. (It seems to me that = is the wrong symbol here.)

The Person(name, age) syntax, which would work with the Struct interface, or with @Zambonifofex's solution to consider shared parameters, does naturally handle these cases:

Person(name, Address(street, city, state, zip))

And:

Person(name, 18)

representing constants in patterns

Is that possible/good for Object destructuring/pattern matching, where the attribute may be mutable or computed?

For example, I guess you would wind up with something like this:

Person { name, address = { street, city, state, zip } }

Ah no, sorry, my bad. That was silly. You would write it like this:

Person { name, { street, city, state, zip } = address }

or like this:

Person { name, Address { street, city, state, zip } = address }

which is perfectly natural, and totally consistent with the syntax for aliases.

So that's fine. There's no problem there for nested patterns.

For literals in patterns I guess you would come up with something like:

Person { name, age == 18 }

which is also OK, I suppose.

So I guess I'm warming to this approach. Now, in traditional FP languages, "pattern matching" has been quite focussed on being able to write patterns like [0,0], [1, _], etc, i.e. patterns with literal values and wildcards in them, and at least arguably, the syntax Person { name, age == 18 } is less adapted to that problem than the syntax Person(name, 18) like what you see in Scala.

However:

  • I have personally never found these examples that compelling. It's always seemed to me that the beauty of pattern matching was the type switch combined with the destructuring.
  • As evidence for the above assertion, I note that languages with full pattern matching typically wind up having to add support for "guard" expressions because the literal-pattern matching isn't really expressive enough. A "guard" is, of course, just a weird if, and can be just as or more cleanly handled, IMO, with a nested if inside the case.
  • Branches of a switch in Ceylon have traditionally been checked for disjointness. That doesn't really work so well with full pattern matching. I like the disjointness checking, and find it useful. Now, sure, we could find some way to annotate non-disjoint cases, or whatever, but ultimately I'm not sure that's worth it.
  • Furthermore, one can argue that the syntax Person { name, age == 18 } actually enables more powerful and expressive patterns, since one could write Person { name, age >= 18 } or Person { name, address exists }, or even Person { name, exists { street, city, state } = address }, etc, etc.
  • Furthermore, one can argue that it's easier to read the pattern, because you see the name of the attribute that is being asserted. Oh, and you can write the attributes in any order. Oh, and you don't need wildcards.
  • Finally, it turns out that this is actually very close to the syntax used for pattern matching record types in ML, so there's actually very solid precedent for it.
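
To make the point about guards concrete, here is a small sketch in existing Ceylon of a nested `if` inside a `case`, which covers what a pattern such as Person { name, age >= 18 } would express. The `Person` class and the `classify` function are illustrative assumptions.

```ceylon
class Person(shared String name, shared Integer age) {}

String classify(Anything p) {
    switch (p)
    case (is Person) {
        // The "guard" is an ordinary `if` inside the case, so the
        // switch itself still has a single, disjoint `is Person` case.
        if (p.age >= 18) {
            return "``p.name`` is an adult";
        }
        return "``p.name`` is a minor";
    }
    else {
        return "not a person";
    }
}
```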

So it seems to me that supporting just destructuring, for now, with the syntax:

  • Person { name, age, {street, city, state} = address } in switch, combining a type assertion with destructuring, and
  • { name, age, {street, city, state} = address } in other syntactic locations,

would be a really reasonable extension to the language, and leave us with plenty of room to grow in terms of supporting "full" pattern matching if that's a direction we ultimately decide we want to take. (Which is still not clear to me at all.)

Oh and I suppose I would also support aliases using the syntax:

Person { personName=name, address=mailingAddress }

I think we probably shouldn't allow wildcarded imports like Person { personName=name, ... }.

(FTR: yeah, I'm modeling this after our import syntax.)

I think Person(name, age) should destructure using shared parameters.

@Zambonifofex FTR, I'm not very keen on the idea of tying any of this stuff to the signature of the class initializer because:

  • For classes with constructors, there are no shared parameters.
  • Some initializer parameters are un-shared. Such parameters typically initialize additional shared state, which would not be accessible to the pattern.
  • Interfaces don't have initializers. And some classes are sealed.
  • A pattern can also match subclasses, which have a totally different signature for construction.

These considerations point me toward thinking that, as nice as it seems in simple cases, for example, Point(x,y), it's not really a very robust way to handle this. Point{x,y} is more powerful, it seems to me.
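
A short sketch of two of the cases listed above, using made-up classes (`Direction`, `Point`, `Origin`) purely for illustration: an un-shared initializer parameter that a positional pattern could not see, and a subclass whose construction signature differs from that of the type being matched.

```ceylon
// `radians` is un-shared: it only initializes the shared attribute
// `degrees`, so a pattern tied to the initializer signature would
// have nothing to bind.
class Direction(Float radians) {
    shared Float degrees = radians * 180.0 / 3.141592653589793;
}

class Point(shared Float x, shared Float y) {}

// Constructed with no arguments, yet it is still a Point.
class Origin() extends Point(0.0, 0.0) {}

// Attribute-based destructuring, written out by hand, works for any
// Point regardless of how the instance was constructed.
Float sum(Point p) => let (x = p.x, y = p.y) x + y;
```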

(FTR: yeah, I'm modeling this after our import syntax.)

Hrm. Except that's not quite true. Perhaps the syntax should be:

  • Person { name, age, address {street, city, state} }, and
  • { name, age, address {street, city, state} } instead?

Hrrmph.

So I think it makes sense for object destructuring and imports to look the same. They do very similar things (an import statement destructures a package).

And Person { name, age, address {street, city, state} } looks more like a "pattern", and is less verbose than the version with an =.

The only potential objection is that if/when we decide we need conditions embedded in the pattern, then we wind up with stuff that's perfectly reasonable like:

  • Person { name == "Gavin", age>18 }
  • Person { name, exists address, is Employment activity }

But we also wind up with the most natural way to combine a condition with nested destructuring being:

Person { name, 
         exists address { street, city, state }, 
         is Employment activity { employer, since } }

which looks perfectly reasonable but isn't the same as the existing syntax for conditions:

Person { name, 
         exists { street, city, state } = address, 
         is Employment { employer, since } = activity }

Well, I suppose that's OK. We would just have to pick one of these two options if and when the time comes.
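
For reference, this is roughly how the `exists` case is written today with the existing condition syntax, which the second form mirrors. The classes and the `location` function are assumptions made for this sketch.

```ceylon
class Address(shared String street, shared String city, shared String state) {}
class Person(shared String name, shared Address? address) {}

String location(Anything p) {
    switch (p)
    case (is Person) {
        // Existing condition syntax: `exists localName = expression`.
        if (exists address = p.address) {
            value street = address.street;
            value city = address.city;
            return "``p.name`` lives at ``street``, ``city``";
        }
        return "``p.name`` has no address";
    }
    else {
        return "not a person";
    }
}
```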

I worked on this today, and came to the conclusion that it can't reasonably be implemented using desugaring. So it will have to wait until @tombentley has time to work on it.

But we also wind up with the most natural way to combine a condition with nested destructuring being:

Person { name, 
         exists address { street, city, state }, 
         is Employment activity { employer, since } }

which looks perfectly reasonable but isn't the same as the existing syntax for conditions:

Person { name, 
         exists { street, city, state } = address, 
         is Employment { employer, since } = activity }

@gavinking - Agree the first one feels very natural.

I think the following syntax for combining all of destructuring, conditions and aliasing would feel as natural, without being inconsistent with existing syntaxes:

Person { name, 
         exists addressAlias = address { street, city, state }, 
         is Employment activityAlias = activity { employer, since } }

My head naturally parses the { ... } part of address { street, city, state } as simply a destructuring operator/operation on address, more than as a local declaration of street, city and state. It just happens that the destructuring has the effect of creating the locals anyway.

So that it really doesn't feel in conflict with the syntax for conditions:

  • fits the syntax for conditions: (exists localName = expression)
  • also fits the syntax for imports: import some.package { TypeAlias = Type }
  • without the aliases, drops back to exactly what you had in the first one.

Without aliases:

Person { name, 
         exists address { street, city, state }, 
         is Employment activity { employer, since } }

Without conditions (similar to imports):

Person { name, 
         addressAlias = address { street, city, state }, 
         activityAlias = activity { employer, since } }

Without aliases or conditions:

Person { name, 
         address { street, city, state }, 
         activity { employer, since } }

Honestly, I’ve been growing to dislike this feature more and more. What are we trying to achieve again? Wasn’t one of the goals of Ceylon to be easily recognizable? I think this is a type of complexity the language doesn’t need.

Pauan commented

@Zambonifofex I think the point is to make certain common use cases more concise. In particular, it tends to benefit ADTs.

For classes which aren't being used as ADTs, it doesn't provide that much benefit, in my opinion.

As for "recognizability", a lot of languages have some form of object destructuring, including JavaScript. I think it's a very natural complement to the Tuple destructuring in Ceylon.

If you guys do really want to support this, consider the syntax I suggested:

value  foo  =  "hello";
value [foo] = ["hello"];
value [foo, bar] = ["hello", "world"];
value  foo->bar  =  "hello"->"world";
// Destructuring based on shared parameters
// Alternatively, to support (ugh) named constructors: destructuring based on shared attributes whose names are the same as the parameters from the constructor
value Person(first, last) = Person("Gavin", "King");
// Destructuring based on shared attributes
value Person{surname=last; name=first;} = Person{surname="King"; name="Gavin";};
value Person{string=full;} = Person("Gavin", "King");

Note how in all cases both sides are balanced.

  • Where you have [ on the lhs you have [ on the rhs.
  • Where you have -> on the lhs you have -> on the rhs.
  • Where you have Person( on the lhs you have Person( on the rhs.
  • Where you have name= on the lhs you have name= on the rhs.
  • Etc.

The syntax for nested destructuring would be based on a similar concept:

value Person(first, last, Email(email)) = Person("Gavin", "King", Email("gavin@example.net"));
value Person{name=first; surname=last; email=Email{name=name; host=host;};} = Person("Gavin", "King", Email("gavin@example.net")); // declares `first`, `last`, `name` and `host`.

To declare both email and host, one could do something like this:

value Person{email=email;} = Person("Gavin", "King", Email("gavin@example.net")); // declares `email`
value Email{host=host;} = email; // declares `host`

My syntax may not be the cleanest, but I think it’s the most regular, and the one that will benefit the language most in the long run.

Today I thought a bit further on this topic. In particular, I spent some time thinking about what the design would be if we went down the path that other languages use, of mapping a pattern like Entry(key, item) to a constructor of the class.

In particular, my opinion has firmed against the idea of using an extractor function; the problem with an extractor function is:

  • it has to package the field values in something like a Tuple, which means instantiation, which has a performance impact, and
  • since it can do arbitrary logic, that can't be optimized away by the compiler.

Now, sure, we could make the typechecker do something like what it does with annotation constructors, and limit the sort of logic one can do in an extractor function, but then: what's the point of it being a function?

So, given that, how would we identify the "signature" of an arbitrary Ceylon class?

It seems to me that the obvious solution would be to use an annotation. So, we could write:

structural(`key`, `item`)
class Entry<Item>(String key, shared Item item) {}

And, in the case that the parameters of the class initializer are all shared, you can abbreviate to this:

structural
class Entry<Item>(shared String key, shared Item item) {}

Of course it would be an error to leave off the field list if not every parameter is shared. That way we can't accidentally break clients by changing the parameter list.

Now a pattern would look like, for example:

case (Entry(name, item))

A nested pattern would look like:

case (Entry(name, Person(firstName, lastName, ...)))

This doesn't seem like an unreasonable solution to me.

So now we have two competing proposals:

  1. the approach based on the syntax of import statements:

     Entry { name=key, person { firstName, lastName, ... } }
    
  2. the approach based on the structural annotation:

     Entry(name, Person(firstName,lastName, ...))
    

Pros/cons:

  • The huge advantage of the import-like approach is that it doesn't require any special support on the declaration of the destructured type, since it doesn't depend on assigning any ordering to the shared attributes. You can destructure any type, even Java classes!
  • The big advantage of the structural annotation is that you don't need any special syntax for assigning aliases: you can freely use whatever labels you like.
  • A second major advantage is that patterns like Entry(name, item) feel to me much more similar to the tuple and entry patterns we already have, e.g. name->item and Entry(name, item) both look like Entry instantiation, and both allow free assignment of labels.

Tentative conclusion: if I could think of a way to reasonably support destructuring for Java objects, then I would happily go with structural. If not, I guess I lean toward the solution which does allow that.

P.S. I'm not sure exactly when we require you to write in the type. But I think that's orthogonal to the question of Entry {...} vs Entry(...). In principle, both approaches could potentially allow me to write destructure statements without the type declarations:

value (name, (firstName, lastName, ...)) = entry;

or:

value {name = key, {firstName, lastName, ...}} = entry;

if I could think of a way to reasonably support destructuring for Java objects, then I would happily go with structural.

Eureka. I think I've got it.

So here's a way to obtain the best of both proposals.

  • if the destructured class is annotated structural, then you can freely assign whatever labels you prefer, but
  • on the other hand, if it isn't, then you have to use the name of the attribute you're labelling (you can't assign an alias).

An open question is: in the case where the class isn't annotated structural, do you still need ...?

  • it has to package the field values in something like a Tuple, which means instantiation, which has a performance impact, and
  • since it can do arbitrary logic, that can't be optimized away by the compiler.

FYI, links to a presentation on how destructuring in Java will handle these items. In short, a future version of Java will offer a language neutral facility for destructuring using method handles (similar to the existing facility for lambdas) to help reduce simple cases to method handles for a match function and n field accessors. Complex cases will also involve a precompute function that generates a generic "carrier" object (a single allocation) to remember results from expensive transformations. The carrier object is made available to the match function and the functions to obtain the n values.

The choice between the "class annotation" approach and the "import" approach can be seen from a different perspective: who should be defining the destructuring pattern? The owner of the class (i.e. by adding the annotation) or the client (i.e. when importing)?

IMHO, the client is the one actually giving a meaning to the destructuring pattern (something completely local), while the owner is just proposing a "default" pattern.

I am thinking about the situation where I (as a client) want to switch just on the key, no matter what the item is.
If the destructuring pattern is defined by the class, then I am forced to add wildcards to the cases (something like case (Entry("Jordi",_)) { ... }). But if I am allowed to write my own destructuring pattern, I can just use the key attribute, ignoring the item and avoiding the wildcards.
It can get even more useful in more complex cases.

So I'm proposing something like a fallback: if there is a destructuring pattern locally defined (in the import), that's the one to use. Else, if the class has its own pattern, use that one. Else, use a "default" pattern (i.e. all shared attributes, labeled with the attribute name), if any.

Now that I think about it, the "import" approach can also profit from local imports, creating patterns restricted to the same scope as the import, or even overriding the default pattern just in that scope. That's a +1 for the "import" approach not to be ignored.

@jvasileff sure, but it's going to be literally years before we can take advantage of anything like that. And honestly it seems to me that there simply isn't a strong need for custom extractor functions in the first place.

I'm with @jvasileff that we should seriously study what they have come up with to compare. Also, the JDK moved to a 6-month release schedule after Java 9, so they are indeed aiming for Java 10 in the first half of 2018. Whether pattern matching will be part of it remains to be seen, but there's a chance.