keean/zenscript

Unions and Intersections

Closed this issue · 81 comments

keean commented

@shelby3 I want to discuss whether to include union types here. I would like to design a language that meets the requirements you have for extensibility, but I don't necessarily want to achieve it with unions. I am not fundamentally opposed to unions, and if they are the best way to achieve what we want then we have them, but in the interests of keeping the language as small as possible, I want to make sure we are adding something of value, that cannot be achieved in other ways. Can you provide some example use cases for union types so we can explore the possibilities?

More comments to follow...

@shelby3 wrote:

And this is why we need anonymous structural union types, e.g.:

if (x) true
else 3

The inferred type is Boolean | Number.

keean commented

I realise you can write "if x true else 3", but why would you do that? To me this is not a motivating example, it needs to be something a bit bigger, and algorithm or design pattern that benefits in readability or simplicity from having the feature. What do we lose by insisting both sides of the 'if' are the same type?

@keean wrote:

I realise you can write "if x true else 3", but why would you do that?

Resolving to an instance of a heterogeneous type.

mycollection.append(if (something) new Type1() else new Type2())
keean commented

What about inferring the type "x where x is in all common traits"? If you allow traits as "return" types then this is not a problem.

@keean wrote:

What about inferring the type "x where x is in all common traits"?

I presume you mean subsuming to the greatest lower bound (GLB) when the heterogeneous types involved are traits (which btw may not exist since we can't subsume to any top type without instigating unsafe code, so it would often be compiler error). That won't apply when the heterogeneous types are data types since we can compute the anonymous structural type of the disjoint union, because otherwise data types could not be subsumed without subclassing. So without subclassing, we must have tagged disjoint unions, else we would lose the solution to Wadler's Expression Problem.

If you allow traits as "return" types then this is not a problem.

I am thinking it is a non-unified language with many corner case for us to attempt to only allow trait types (bounds) on function arguments. We must support trait objects every where a type can be used, except they (trait bounds and trait objects) make no sense in unions because traits by definition aren't tagged (reified), but unions are okay for data types will (must) be tagged in the run-time.

Thinking about this more, traits are not types of instances. They are either monomorphic bounds on a type parameter of a function/method, or they are the polymorphic interface of trait objects with the dictionary joined to the instance just like subclasses (but with a hasA instead of isA subclassing semantics). Trait objects are polymorphic, and thus stall the CPU pipeline.

There are two ways for us to support polymorphism. One is to allow tagged unions of data types, which retains extension in both axes of Wadler's Expression Problem, because we can add new variants (data types) and new operations (trait implementations). The other is to allow trait objects, which only allow adding new data types, but not new operations, i.e. if we have an instance of a trait object, we can't implement it on other traits because we erased the underlying data type which is inside the trait object.

I remember we discussed all of this before in May, but I may not have been as clarifying and succinct as I am attempting to communicate now.

Edit: Trait objects enable polymorphism within a homogeneous type parametrized collection, so that an invariant collection does not need to be used and the type parameter can remain invariant.

So it appears we need both. The two ways to achieve polymorphism offer different trade-offs.

keean commented

@shelby3 wrote:

I presume you mean subsuming to the greatest lower bound (GLB) when the heterogeneous types involved are traits (which btw may not exist since we can't subsume to any top type without instigating unsafe code, so it would often be compiler error). That won't apply when the heterogeneous types are data types since we can compute the anonymous structural type of the disjoint union, because otherwise data types could not be subsumed without subclassing. So without subclassing, we must have tagged disjoint unions, else we would lose the solution to Wadler's Expression Problem.

No, you simply keep the intersection of the traits implemented by both sides, the type remains polymorphic (unknown).

@keean wrote:

No, you simply keep the intersection of the traits implemented by both sides, the type remains polymorphic (unknown).

Which is the same as what I wrote. So why did you write "no"? You described the GLB.

Yes we can do that when the GLB is not the top type. But it is not entirely sufficient to only offer that capability and not also unions of data types.

You seem to be not quite grasping (or articulating that you've grasped) two of my points:

  1. The GLB might the top type. In a safe language, the top type must be the empty set.
  2. If there is a GLB, it remains polymorphic, but we've lost (erased) one of the axes of extension of Wadler's Expression Problem, as compared to retaining a union of data types.
keean commented

@shelby3 wrote:

There are two ways for us to support polymorphism. One is to allow tagged unions of data types, which retains extension in both axes of Wadler's Expression Problem, because we can add new variants (data types) and new operations (trait implementations). The other is to allow trait objects, which only allow adding new data types, but not new operations, i.e. if we have an instance of a trait object, we can't implement it on other traits because we erased the underlying data type which is inside the trait object.

In a way they both suffer from a form of the expression problem. With tagged-unions, we have type-case statements scattered throughout the code, and If we add a new type to the tagged-union all the type cases need to have an extra option added.

With the "trait-object", if we want to add a new method to the collection you have to add a new trait to the collection type.

keean commented

You seem to be not quite grasping two of my points:

The GLB might the top type. In a safe language, the top type must be the empty set.
If there is a GLB, it remains polymorphic, but we've lost (erased) one of the axes of extension of Wadler's Expression Problem, as compared to retaining a union of data types.

If there are no traits in common between two types, then there are no functions you can call on the object. Consider two objects A and B that have no methods in common. If we have a function that returns "A|B" what methods can we call on the value with type "A|B"... none at all.

If you want to do a type-case (runtime matching) you should be using a tagged union. The point of type-classes is to derive the statically safe set of functions/methods that can be called on some object.

@keean wrote:

In a way they both suffer from a form of the expression problem. With tagged-unions, we have type-case statements scattered throughout the code, and If we add a new type to the tagged-union all the type cases need to have an extra option added.

Disagree. This is the crux of my proposed complete solution to the Expression Problem which I explained to you before in May. Let me try again to explain it.

When calling a function with a trait bound and passing a union of data types as the input, the compiler can automatically create the type-case boilerplate and wrap it as if it is a single implementation of that trait bound and pass that to function simulating monomorphism. Within said implementation, it will be doing the type-case logic, so it does stall the CPU more than monomorphim, but it is seamless at the source code and emitted function implementation level; and it is more optimizeable than trait objects which erase their knowledge of the data types and thus must use agnostic (blinded) jump tables.

It is the outcome of using only a trait bound to operate on the tagged union, which makes it fully automated boilerplate. That was my insight.

Of course that doesn't prevent the programmer from doing type-class guards to get at the precise data types for some other reason, but that wouldn't be an extensible design pattern.

@keean wrote:

You seem to be not quite grasping (or articulating that you've grasped) two of my points:

1‌‌​. The GLB might the top type. In a safe language, the top type must be the empty set.

If there are no traits in common between two types, then there are no functions you can call on the object. Consider two objects A and B that have no methods in common. If we have a function that returns A|B what methods can we call on the value with type A|B... none at all.

So we are in agreement.

Except that typically inferred subsumption to the top type is a compiler error (because it probably represents something the programming didn't intend, since it is more or less useless). The programmer must declare the top type, if that is what they want.

Btw, if you use backticks ` in markdown instead of quotes then your programming terms will be rendered correctly in a monospace font as shown above.

2​. If there is a GLB, it remains polymorphic, but we've lost (erased) one of the axes of extension of Wadler's Expression Problem, as compared to retaining a union of data types.

If you want to do a type-case (runtime matching) you should be using a tagged union. The point of type-classes if to derive the statically safe set of functions/methods that can be called on some object.

Seems you didn't quite understand what I was trying to explain to you this past May (granted my elucidation was probably not coherent enough then). See my prior comment for another attempt to explain how to avoid type-case boilerplate.

Tangentially, I am doubting (not understanding) the claim that was made for Rust that monomorphic trait bounds do not stall the CPU pipeline. The function can't know which implementation it will be using until run-time, unless we generate a new version of the function for each implementation of a data type. Thus the function will be calling the methods using an array of pointers to the methods. So this is a jump table. We can't inline the methods into the function unless we generate a new version of the function for each implementation of a data type. So for a tagged union type, the same array of method pointers can be used, but of course within those wrapper methods will be run-time type-case branching which further stall the CPU pipeline, but branches are more optimizeable by the CPU than arbitrary jumps that would be the case for trait objects.

keean commented

@shelby3 I think you are not fully understanding the difference between dynamic (runtime) and static (compile time) polymorphism, and this is leading to confusion between us when we discuss this. Lets start with static polymorpism:

id<A>(x: A) : A {return x;}
id(4)

In this program id has a polymorphic type, that means its implementation is the same no matter what type it is passed. In this case only a single implementation of 'id' is required for all types.

class Show B {
    show(x : B)
}

instance Show for Int {
    show(x) = print_int(x)
}

instance Show for String {
    show(x) = print_string(x)
}

show(4)
show("ABC")

Here we see specialisation, there are different overloaded implementations for 'show' that are chosen by type. Each implementation deals with exactly one type.

show_twice<A : Show>(x : A):
    show(x)
    show(x)

Using type-classes we can write generic functions like 'show_twice' that work for any type by looking the specialised version of the function calls it uses up in the dictionaries for the types passed, but is a single generic implementation (the type class dictionary for 'Show A' is an implicit parameter').

However it is important to note that the type parameter is invariant, that means we must know the concrete type at compile time (this is the difference between parametric types and universally-quantified types, with universally-quantified types there are some situations like polymorphic-recursion where we might not know the type until runtime, which is why Haskell does this by dictionary passing). So with parametric types 'A' must be a ground type in order to compile the function, which means we can monomorphise and output a specific version of 'show_twice' for the type of the argument passed. So this is what happens:

show_twice("ABC")

compiles to:

show_twice___string(x : String):
    print_string(x)
    print_string(x)

show_twice___string("ABC")

combined with inlining this will produce:

print_string("ABC")
print_string("ABC")

So you can see there is no use for 'unions' as we must provide a concrete/ground type as the type parameter for a parametric function, and we can do this if the data is static.

When calling a function with a trait bound and passing a union of data types as the input, the compiler can automatically create the type-case boilerplate and wrap it as if it is a single implementation of that trait bound and pass that to function simulating monomorphism.

This is where I think you have a misunderstanding, if we can statically determine the type, then we can statically determine the exact type, no unions necessary. For example:

x : bool = true
y = if x then "hello" else 32
show_twice(y)

Here we know the type of 'y' is a string because 'x' is true, as such what is passed to show_twice is always 'string', it is never a 'string or int' because such a thing cannot exist y is always precisely a string or an int, never both at the same time. So for static polymorphism we never need union types.

Note: this kind of thing is not normally allowed because the type of 'y' depends on the value of ''x' being true or false. Such things are called dependent types, and languages like Rust don't allow this, instead they only allow types to depend on other types.

keean commented

A follow on from above, although we cannot support monomorphisation easily in the final case above, we can use path dependent types to do something similar. If we require the return type from 'if' to be the same on both sides, or perhaps more easily have 'if' as a statement not an expression so there is no 'return' type we can simply do:

if x then:
    show_twice("hello")
else:
    show_twice(32)

Here the type of 'show_twice' depends on the path taken at runtime, but we know the exact type passed to each instantiation of the polymorphic functions 'show_twice'. This is a compelling reason to not allow 'if' in expressions.

I would argue the above code is simpler and easier to read the the 'if' expression version, with the small cost of having to repeat 'show_twice'...

keean commented

So are there any circumstances where we cannot statically determine the type of something, the answer is if the type of something depends on the result of program IO.

For example if we have a polymorphic collection that we insert values of different types into depending on what key a user presses, we can no longer statically determine what the type of any given element selected from the collection will be.

This changes things significantly, so lets discuss the static case first and then move onto the dynamically polymorphic case.

@keean your one false assumption leads you astray:

This is where I think you have a misunderstanding, if we can statically determine the type, then we can statically determine the exact type, no unions necessary. For example:

x : bool = true
y = if x then "hello" else 32
show_twice(y)

Here we know the type of y is a string because x is true, as such what is passed to show_twice is always string, it is never a string | int because such a thing cannot exist y is always precisely a string or an int, never both at the same time. So for static polymorphism we never need union types.

A function exported from a module can't know at its compile-time, which trait implementation nor instance of a tagged union type it will be called with. That would defeat separate compilation. We need run-time polymorphism precisely because we need extension not because we linked together modules untyped as any dynamically at run-time, but because we use modular separate compilation at compile-time. Moreover, we have no way to trace all types through a program as that would be dependent typing, which is not Turing-complete. So we can't track which instances are in a tagged union type generally, which is the entire raison d'être of tagged union types. The only way you could track that would be dependent typing, which we aren't proposing to do because it isn't even Turing-complete.

What you wrote above does not apply to what I am proposing to use unions of data types for.

Btw, modular separate compilation is absolutely critical to JIT compilation, because the compiler doesn't have to compile the entire program in order to run a compiled portion of it.

Note: this kind of think is not normally allowed because the type of y depends on the value of x being true or false. Such things are called dependent types, and languages like Rust don't allow this, instead they only allow types to depend on other types.

I already knew that years ago.


I think you are not fully understanding the difference between dynamic (runtime) and static (compile time) polymorphism, and this is leading to confusion between us when we discuss this

I am fully understanding. You seem to be a bit confused, perhaps because Haskell typically doesn't support separate compilation. Haskell is global type inference and all libraries are typically presented in source code form and the entire program is compiled monolithically. But this absolutely isn't the way programming will be done with JavaScript.

Lets start with static polymorpism:

id<A>(x: A) : A {return x;}
id(4)

In this program id has a polymorphic type, that means its implementation is the same no matter what type it is passed. In this case only a single implementation of id is required for all types.

Because id calls no methods on A, thus you don't need to specialize the implementation of the id function.

Meta: One minor problem is developing between us, but it could be an indicator of a larger problem of working relationship that could develop. I hope not, so I will speak up now rather than let it fester. I am a stickler for perfection. I have asked you two times to please use backticks ` for quoting your code symbols and keywords (as I have shown above by editing your quote), because that is the idiomatic way to do it in markdown, so these terms appear in a monospace font with a grey background so they are easy to spot in the English text. And you did not reply indicating why you prefer the non-idiomatic style. So it seems you are ignoring this request for us to have a consistent style. I know old habits are hard to break, but really we should use idiomatic markdown so our verbiage is easier for readers who will be reading this for a long-time if we succeed to make a revolutionary programming language.

Edit: I realize you may be preferring quote marks instead of backticks since the latter are erased as rendering, because they copy+paste well (such as for quoting the other person when replying), but I prioritize beauty and readability over convenience. Hopefully one day Github will add a [Quote] button.

Using type-classes we can write generic functions like show_twice that work for any type by looking the specialised version of the function calls it uses up in the dictionaries for the types passed, but is a single generic implementation (the type class dictionary for Show A is an implicit parameter').

Correct. That is what I wrote immediately above your comment. Why are you pretending to disagree with me when you are not?

See the bolded text below.

@shelby3 wrote:

Thus the function will be calling the methods using an array of pointers to the methods.


However it is important to note that the type parameter is invariant, that means we must know the concrete type at compile time (this is the difference between parametric types and universally-quantified types, with universally-quantified types there are some situations like polymorphic-recursion where we might not know the type until runtime, which is why Haskell does this by dictionary passing). So with parametric types 'A' must be a ground type in order to compile the function, which means we can monomorphise and output a specific version of 'show_twice' for the type of the argument passed. So this is what happens:

show_twice("ABC")
compiles to:

show_twice___string(x : String):
    print_string(x)
    print_string(x)

show_twice___string("ABC")

Correct. That is what I wrote immediately above your comment. Why are you pretending to disagree with me when you are not?

See the bolded text below. My point was that creating a specialized function for each implementation of the trait may be too costly in terms of emitted source code in some cases.

@shelby3 wrote:

The function can't know which implementation it will be using until run-time, unless we generate a new version of the function for each implementation of a data type. ... We can't inline the methods into the function unless we generate a new version of the function for each implementation of a data type.

Again I will repeat what I wrote:

There are two ways for us to support polymorphism. One is to allow tagged unions of data types, which retains extension in both axes of Wadler's Expression Problem, because we can add new variants (data types) and new operations (trait implementations). The other is to allow trait objects, which only allow adding new data types, but not new operations, i.e. if we have an instance of a trait object, we can't implement it on other traits because we erased the underlying data type which is inside the trait object.

We need both forms of polymorphism. A tagged union type is not static polymorphism. You were arguing this same point in May and I can see I still haven't been able to get you to understand that without dependent typing, then the state of the instance of a union type can't be statically tracked through all code paths.

The union type is first-class (except on function application as we discussed in May that would be undecidable under type inference). So the state of of instance of a union can be passed around through unfathomable code paths. There is no way to track the state of that dependently.

As you know that dependent typing is totally inapplicable for our programming language. It eliminates Turing-completeness because it can prove that programs halt. And it is very painstaking. Dependent typing is for formal provers and academic research languages.

@keean wrote:

So are there any circumstances where we cannot statically determine the type of something, the answer is if the type of something depends on the result of program IO.

For example if we have a polymorphic collection that we insert values of different types into depending on what key a user presses, we can no longer statically determine what the type of any given element selected from the collection will be.

This changes things significantly, so lets discuss the static case first and then move onto the dynamically polymorphic case.

The issue is not even that narrow. As I explained, separate compilation causes us to lose tracking of the state of the tagged union type at compile-time. And not just separate compilation, but even within a module, because we don't have dependent typing.

You don't need to think of special cases where your thinking was invalid. Your thinking was invalid generally.

I don't know what caused you to think that way, because I remember you had this same thinking in May. And you kept saying we didn't need tagged structural unions, because we could statically determine the state of the instance in your pet examples where I might want to infer a union type. But your pet examples are misleading you, because in general we can't do dependent typing.

All of us have moments where our minds get stuck in the wrong conceptualization. No worries, it has happened to me at other times. Nothing personal. That is why were are a team, to check each other's logic. I just want us to resolve this.

@keean wrote:

This is a compelling reason to not allow if in expressions.

Disagree strongly.

A follow on from above, although we cannot support monomorphisation easily in the final case above, we can use path dependent types to do something similar. If we require the return type from if to be the same on both sides, or perhaps more easily have if as a statement not an expression so there is no return type we can simply do:

if x then:
    show_twice("hello")
else:
    show_twice(32)

Here the type of show_twice depends on the path taken at runtime, but we know the exact type passed to each instantiation of the polymorphic functions show_twice.

There is nothing significantly disadvantageous with allowing the inference to a tagged union since in most cases the choice of if-else will not be known until run-time (which is the entire raison d'être of if-else otherwise the compiler would routinely erase the if-else). And most uses of if-else as expressions in the setting of using that value within the current function, will not resolve to disjoint data types. Either they will subsume to common trait or be the same data type in most cases.

And in the rare instance that the programmer doesn't want a tagged union (mostly for performance reasons), he can optimize manually as you have shown. Also, the compiler can use a much simpler form of path analysis (independent of run-time state) to do that performance optimization without needing dependent typing (although without purity it becomes more difficult for the compiler to know what is safe to reorder). Most of the time that sort of construct is going to be used to return a tagged union, which can't be optimized in any case. So in most cases, the programmer doesn't even need to think about it.

Forcing the programmer to break DNRY (DRY) when there is no advantage that the compiler can't usually optimize, makes for an unnecessarily noisy language. Which makes the code more difficult to follow and reason about. And makes const assignments impossible in some cases.

keean commented

The issue is not even that narrow. As I explained, separate compilation causes us to lose tracking of the state of the tagged union type at compile-time. And not just separate compilation, but even within a module, because we don't have dependent typing.

You don't need to think of special cases where your thinking was invalid. Your thinking was invalid generally.

We can know the types statically, that is the fundamental nature of static type systems. The answer is we do not compile a module to final code but to intermediate code, and when a module function is called from a different module the types become concrete, and this is when the monomorphisation happens. In some systems this instantiation happens at runtime (Look up "Ada Elaboration"), and others at compile time. The key is to consider what gets imported when we import code. In languages with generics it is the intermediate representation and type signatures that get imported, and the code is generated in place.

As I pointed out above anything that does not depend on runtime IO can be evaluated as constants at compile time.

I am not opposed to tagged unions, and we can easily have:

data Either a b = Left a | Right b
let y = if x then (Left "XYZ") else (Right 32)

in which case y would have the type Either String Int

@keean wrote:

So you can see there is no use for 'unions' as we must provide a concrete/ground type as the type parameter for a parametric function, and we can do this if the data is static.

We can create a variant of the function that accepts a dictionary in addition to (or instead of) the specialized function variants for inlining every trait implementation. That was the point of my comment.

I also wrote:

See the bolded text below. My point was that creating a specialized function for each implementation of the trait may be too costly in terms of emitted source code in some cases.

The trait objects alternative also can't be inlined monomorphically. Just accept that polymorphism requires passing in a dictionary.

@keean wrote:

We can know the types statically, that is the fundamental nature of static type systems

Please stop conflating compile-time type with run-time state of instance of a type.

We can't generally know the run-time state of the instance of a type of union where the contained data items are tagged, because it requires dependent typing.

The answer is we do not compile a module to final code but to intermediate code, and when a module function is called from a different module the types become concrete, and this is when the monomorphisation happens. In some systems this instantiation happens at runtime (Look up "Ada Elaboration"), and others at compile time. The key is to consider what gets imported when we import code. In languages with generics it is the intermediate representation and type signatures that get imported, and the code is generated in place.

You are proposing to have a linker either at static or dynamic linking time. This is unthinkable for JavaScript modules at this juncture of the ecosystem (maybe we could change that in future). In any case, whether monomorphism is practical inter-module is orthogonal to our discussion about whether the run-time state of the instance of a type of union can be tracked through code. I say that tracking can not be accomplished without dependent typing.

As I pointed out above anything that does not depend on runtime I/O can be evaluated as constants at compile time.

But there is no way to generally isolate those dependencies without dependent typing or purity. With purity and a monadic effect system, you can isolate certain dependencies.

And when you say a constant (referring to the run-time state of the instance of a type of union), it becomes impossible to track (regardless of whether dependent on I/O or not) what that constant is through the code paths without dependent typing. Just because something never changes and even if we can identify that it doesn't ever change, doesn't mean we can compute its value. You are speaking about UFOs and hyperbolic theory, not reality. I want to make a real language, not talk about fantasies.

I hope you come back grounded in reality soon, because this is turning me off. We are creating too much verbiage again, as we did in May. Can't we get directly to the point?

I am not opposed to tagged unions, and we can easily have:

data Either a b = Left a | Right b
let y = if x then (Left "XYZ") else (Right 32)

in which case y would have the type Either String Int

That is too much boilerplate. TypeScript, N4JS, Ceylon, and others already have anonymous structural unions. Why are you forcing us back to Haskell's flaws? I want to move forward in a modern language design. The compiler can tag the instances of data type in the emitted code, so no need to tag the union. The anonymous structural union types are erased at compile-time, as they reference an instance of a data type. The instance of a data type tracks what type it is.

If I didn't need my complete solution to Wadler's Expression Problem and anonymous structural unions, and I wanted to use a monadic effect system, I would just go use PureScript. No need to create a language in that case.

Dependency typing would be on the order of but worse than Rust's lifetime checking in tsuris and inflexibility.

Declared purity could help us isolate invariants that don't change. But still we would need polymorphism for those things that do change. There is no argument you can make to say we don't need polymorphism.

@keean if you want to create a boilerplate-prone, noisy language that violates DNRY, then I will not participate. I want a very elegant language which is concise. I wouldn't even fathom an Either as an alternative to anonymous structural union. I wouldn't fathom not having everything as an expression available to me. I am not against using them as statements, since many people are accustomed to that statements style. I am trying to allow imperative, impure, mutable data structures programming while also offering functional, pure, and immutable coding options. If you want to go exclusively imperative, impure, mutable data structures programming, then we are in disagreement.

Please let me know what you have in mind.

I was hoping we could finish up the rough design choices within a day or two, so we could move on to coding.

keean commented

@shelby3 wrote:

Please stop conflating type with state of instance of a type.

In static polymorphism all the ground states of every type variable is known at compile time. With runtime polymorphism we cannot know the types at compile time (because they are unknown until runtime). The two have different type signatures. Personally I think most languages to not make a clear distinction in the type system between static and dynamic polymorphism, so it is hard to understand. Rust gets this right, as function generics are all static polymorphism, and trait-objects are needed for runtime polymorphism. You can see the same distinction in C++ where all non-virtual function dispatch can be decided statically at compile time, and you need to mark a method 'virtual' to enable runtime polymorphism. I blame Java, where every method is automatically virtual for making this hard to understand.

What is hard to understand? You haven't written anything in that prior comment that refutes anything I have written before it. I agree with your definitions. So what is your point?

We must choose if we will allow run-time polymorphism. If yes, then we have trait objects and anonymous tagged unions to accomplish it with. They are both enable run-time polymorphism. Choose one or both. I choose both. You choose only trait objects. That doesn't make me incorrect. It is just a different choice. Without anonymous tagged unions, then we can't solve both axes of Wadler's Expression Problem. Why do I have to repeat this?


And the other salient point is you can't easily separate run-time variables from constants without dependent typing. And you can't even track the values of constant values at compile-time (in order to compile static monomorphism) without dependent typing. Thus static polymorphism doesn't work well when you need to track values. You keep conflating typing with values. You can track the union type easily, but you can't track the value in that type easily even if it doesn't change at run-time, because you need dependent typing to do that.

Why are you making this obtuse? Please stop thinking it is myself who doesn't understand.

keean commented

@shelby3 wrote:

if you want to create a boilerplate-prone, noisy language that violates DNRY, then I will not participate. I want a very elegant language which is concise.

It would be great to see some code samples and examples of this elegance. Generally I know great code when I see it :-) For example the algorithms in Stepanov's "Elements of Programming" are probably some of the most elegant code I have seen.

I wouldn't even fathom an Either as an alternative to anonymous structural union.

If you don't name the constructors how can you have something with the same type twice. How would you implement something like:

data Length = Meters Float | Yards Float

Here we have a tagged union where both the types are structurally the same (float) but are nominally different. Isn't nominal typing better because you can create types that model the data, rather than just have and int or a string...

I wouldn't fathom not having everything as an expression.

x=3 is a statement, print x is a statement, there are many things that cause a side-effect and do not return a value. Such things don't make sense as an expression.

I don't like repeating myself, and I dislike boilerplate. However I also like readable and understandable code. I don't want to create another Perl.

My experience with languages like Python are that I want type signatures, because they help me understand what functions do without having to look at the source code for a function. I also think code imports should make it easy to see from the code which import defines which symbol, and where that import is coming from in the file-system.

Also please do not delete these discussions if you stop this project. I put a lot of work into these writings and I need to refer back to these. Thanks.

Edit: I've asked archive.org to archive it:

https://web.archive.org/web/20160919214935/https://github.com/keean/traitscript/issues/2

keean commented

@shelby3 wrote:

What is hard to understand? You haven't written anything in that prior comment that refutes anything I have written before it. I agree with your definitions. So what is your point?

And the other salient point is you can't easily separate run-time variables from constants without dependent typing. And you can't even tract constant values without dependent typing. Thus static polymorphism doesn't work well when you need to track values. You keep conflating typing with values.

We can statically track types (not values). You don't need to track values to track types. Can you provide a code sample where we cannot statically track the types (that performs no IO)?

keean commented

Also please do not delete these discussions if you stop this project. I put a lot of work into these writings and I need to refer back to these. Thanks.

I don't plan on doing so :-)

@keean wrote:

I wouldn't even fathom an Either as an alternative to anonymous structural union.

If you don't name the constructors how can you have something with the same type twice. How would you implement something like:

data Length = Meters Float | Yards Float

Here we have a tagged union where both the types are structurally the same (Float) but are nominally different. Isn't nominal typing better because you can create types that model the data, rather than just have and Int or a String...

You are conflating sum (and product) types (with recursion) with anonymous structural unions. I argue we need both. They are not the same thing nor for the same purpose. Tagged nominal data types are what constitute the members of the anonymous structural union.


I wouldn't fathom not having everything as an expression.

x=3 is a statement, print x is a statement, there are many things that cause a side-effect and do not return a value. Such things don't make sense as an expression.

Did I write mandatory expressions every where? I am getting frustrated when you ignore what I write:

I am not against using them as statements, since many people are accustomed to that statements style. I am trying to allow imperative, impure, mutable data structures programming while also offering functional, pure, and immutable coding options.

Btw, include myself in "many people are accustomed to that statements style".

keean commented

@shelby3 I am thinking there may be a way to meet both our requirements for union types. If we allow creation of new nominal types from any structural type, then we can do something like:

newtype Meters = Float
newtype Yards = Float

y = if x then (3 : Yards) else (3 : Meters)

The type of y would be Yards | Meters, and we could de-construct:

match y:
    x : Yards -> ...
    x : Meters -> ...

My view at the moment is that using plain types like 'int' or 'sting' in a tagged union is bad practice because is lacks all sense of meaning (is the value a length, a volume, etc...)

@keean wrote:

I don't plan on doing so :-)

Thanks. Generally I presume you have extremely high ethics (and even better interpersonal skills than myself), based on our interaction in May. But we have only known each other for a few months, and some people on the Internet are really unpredictable. Apology for any wonderment. Thanks again for reaffirming my impression of you.

keean commented

Did I write mandatory expressions every where? I am getting frustrated when you ignore what I write:

Good, sounds like we agree here, I obviously read 'everything is an expression', to mean 'nothing is a statement', which is not what you meant.

keean commented

There is little difference between a sum type and a tagged union, in the following:

data Either a b = Left a | Right b

The tags are Left and Right, in other words we get to name the tags to something more meaningful than just int or string. We now match on the words Left and Right.

with

y = if x then 'xyz' else 3

The tags are string and int, and we can match on them as above.

@keean wrote:

What is hard to understand? You haven't written anything in that prior comment that refutes anything I have written before it. I agree with your definitions. So what is your point?


And the other salient point is you can't easily separate run-time variables from constants without dependent typing. And you can't even track constant values without dependent typing. Thus static polymorphism doesn't work well when you need to track values. You keep conflating typing with values.

We can statically track types (not values). You don't need to track values to track types. Can you provide a code sample where we cannot statically track the types (that performs no IO)?

In general, this is why you can't statically monomorphize unions containing a disjunction of two or more of data types (or primitive types such as String and Number) unless you have dependent typing. My solution will work to compile this code.

trait Print {   // Rust-esque type-class, not a Scala trait
  print()
}
impl Number for Print {
  print() { this.toString() }
}
impl String for Print {
  print() { this.toString() }
}
f(x: Boolean) = if (x) 'string' else 3
g<A: Print>(x: A) = x.print()
h(x: Boolean) = g(f(x))

Static typing does not tell you the run-time values above and thus you can't statically monomorphize the union. It is inherently polymorphic. Even if you know the value of the input to h() is a constant, you still need dependent typing to trace it through to g().

How much more elegant and concise could it be?

You (and perhaps all of us except the speaker) struggle with words and theory. I think you need code to see the point.

Edit: @keean has provided a poignant example of the above.

@keean wrote:

There is little difference between a sum type and a tagged union

Except I never wrote otherwise. If you read what I wrote, you realize I said there is a salient difference between a nominal sum type and an anonymous structural union where the possible types in the union (and their instances) are nominal data types which are tagged. The anonymous structural union doesn't need a separate useless boilerplate declaration (with a useless name), as it can be inferred or declared at the use-site. And by definition of "anonymous structural", we want all anonymous structural unions to be matched (equivalent) on structure, not on some useless boilerplate name of the union which makes separate instances of the same structural type non-equivalent.

We need both nominal sum types and anonymous structural unions. I am not getting rid of sum, product, and recursive types. I am augmenting that system with anonymous structural types.

Especially in technical discussion which are very exacting, it is very important to pay attention to every word in an English sentence. Just as it is important to pay attention to every term and keyword in a program source code.

@keean wrote:

My view at the moment is that using plain types like 'int' or 'sting' in a tagged union is bad practice because is lacks all sense of meaning (is the value a length, a volume, etc...)

The lack of nominal tagging of the anonymous structural union is the desired feature and is intentional.

The data type and their instances in the union are tagged of course, but they are tagged where they were declared with the data or class keyword.

keean commented

I think you should be including more information with the type. For example distance units. Integers on their own don't really mean anything, apart from abstract dimensionless counting.

Really just saying I have an Int | String seems to be writing deliberately unreadable unclear code. Isn't it much better to declare that x is a type of 'length in meters', y I'd a last name etc. These seem much more meaningful types.

@keean wrote:

I think you should be including more information with the type. For example distance units. Integers on their own don't really mean anything, apart from abstract dimensionless counting.

Really just saying I have an Int | String seems to be writing deliberately unreadable unclear code. Isn't it much better to declare that x is a type of 'length in meters', y I'd a last name etc. These seem much more meaningful types.

You continue to conflate the data (sum, product, and recursive) types with the anonymous structural type. If you want more meaning, then declare a data type and put it in your anonymous structural union, Distance Int | Distance String.

The purpose of the anonymous structural union is not the purpose of the sum data type. They have different reasons to exist in the language.

Anonymous structural unions must be inferrable without needing to match a pre-existing declaration for a sum type. And all anonymous structural union types must be matched on structure, not on nominal name. That structure can include data types which have nominal type (not just primitive types). Anonymous structural unions exist to eliminate boilerplate, per the example I showed you. And because they will be automatically type-case implemented for traits by the compiler, per the example I showed you. Anonymous structural unions interacting with trait bounds also enables my elegant boilerplate-less solution to extension along both axes of Wadler's Expression Problem.

keean commented

Let's look for some classic examples of the expression problem and consider how they would be implemented.

I invited @paulp here, because he was very outspoken about flaws in Scala, I agree Scala is a clusterfuck of mixing too many paradigms1 (especially that Jurassic ossifying anti-pattern virus subclassing), and I hoped maybe Paul might be outspoken here, because I think he really wants a small, sound language. My recollection from Scala Google Groups discussion is he has valuable technical insights.

  1. https://www.quora.com/What-are-the-downsides-of-Scala-as-a-programming-language

    https://www.reddit.com/r/scala/comments/4246qc/heres_why_scala_sucks/

    https://news.ycombinator.com/item?id=9398911

    https://news.ycombinator.com/item?id=11837256

    http://techblog.bozho.net/i-dont-like-scala/

    https://www.infoq.com/news/2011/11/yammer-scala

    http://grundlefleck.github.io/2013/06/23/using-scala-will-make-you-less-productive.html#the-bad-thingstrade-with-scala

    http://neopythonic.blogspot.com/2008/11/scala.html

    http://fupeg.blogspot.com/2008/12/is-scala-too-hard-or-are-you-just.html

    https://www.infoq.com/news/2011/11/scala-ejb2

    http://programmers.stackexchange.com/questions/25509/what-do-java-developers-think-of-scala#answer-25598

    The last two rave about Fantom, which doesn't even support generics (type parametrization) nor tuples. I wrote about Fantom in 2011, predicting the eventual arrival of ZenScript. And coincidentally someone asked on June 29, 2016, if I had made any progress since 2011. Well yes and no. I also manage to become chronically ill since May 2012 but I'm working on hard on curing that, e.g. ran 3 times for 2kms each within 8 hours the evening and night of this comment to keep my cognition up to 80% level and minimize brain fog and Chronic Fatigue Syndrome symptoms.

    And let's blame it all on @jdegoes 😜 , as he advised me to learn Scala back in 2009 when I was expressing frustration with Haxe.

keean commented

See: http://anders.janmyr.com/2013/06/solving-expression-problem-in-javascript.html?m=1

Functional Style

sealed trait Expr
case class Add(e1: Expr, e2: Expr) extends Expr
case class Sub(e1: Expr, e2: Expr) extends Expr
case class Num(n: Int) extends Expr

def value(e: Expr): Int = e match {
  case Add(e1, e2) =>
    value(e1) + value(e2)

  case Sub(e1, e2) =>
    value(e1) - value(e2)

  case Num(n) => n
}

Problem: adding a class we need to modify all the functions that match against the type.

keean commented

See: http://anders.janmyr.com/2013/06/solving-expression-problem-in-javascript.html?m=1

Object Oriented Style

sealed trait Expr {
  def value: Int
}

case class Add(e1: Expr, e2: Expr) extends Expr
  def value = e1.value + e2.value
case class Sub(e1: Expr, e2: Expr) extends Expr
  def value = e1.value - e2.value
case class Num(n: Int) extends Expr
  def value = n

Problem: adding a function we need to change all the existing classes.

keean commented

See: http://anders.janmyr.com/2013/06/solving-expression-problem-in-javascript.html?m=1

Dynamic Style

function Add(e1, e2) {
    this.e1 = e1;
    this.e2 = e2;
}
Add.prototype.value = function() { return this.e1 + this.e2; };

function Sub(e1, e2) {
    this.e1 = e1;
    this.e2 = e2;
}
Sub.prototype.value = function() { return this.e1 - this.e2; };

function Num(n) {
    this.n = n;
}
Num.prototype.value = function() { return this.n; };

Adding a class:

function Add(e1, e2) {
    this.e1 = e1;
    this.e2 = e2;
}
Add.prototype.value = function() { return this.e1 + this.e2; };

function Sub(e1, e2) {
    this.e1 = e1;
    this.e2 = e2;
}
Sub.prototype.value = function() { return this.e1 - this.e2; };

function Num(n) {
    this.n = n;
}
Num.prototype.value = function() { return this.n; };

Adding a function:

Add.prototype.toString = function() {
  return '(' + this.e1.toString() + ' + ' + this.e2.toString() + ')';
}
Sub.prototype.toString = function() {
  return '(' + this.e1.toString() + ' - ' + this.e2.toString() + ')';
}
Num.prototype.toString = function() {
  return this.n;
}
Mul.prototype.toString = function() {
  return '(' + this.e1.toString() + ' * ' + this.e2.toString() + ')';
}

Problem: JavaScript is untyped.

keean commented

Note: anonynimity is not a condition of solving the expression problem. We just need extension in two directions, that is the set of types must be extensible, and the set of functions on those types must be extensible.

Let's use the term "module" for a thing like an expression (there is some reasoning for this).

For extensibility we need two things, the ability to add a new function to an existing module, and the ability to add a new type to an existing module. For type safety we need to ensure every function in a module is implemented for every type in the module.

Again note this is somewhat at odds with anonymous unions because we need to have these two checks for type safety.

To satisfy your 'no boilerplate' requirement we must be able to infer modules.

I think this is all the requirements for a solution we can all be happy with.

keean commented

So what would a solution look like? Using the keyword module for now, something like this:

module ModShow a : [Int, String]:
    trait Show a:
        show(x : a)
    impl Show Int:
        show(x : Int):
            print_int(x)
    impl Show String:
        show(x : String):
            print_string(x)

Extending by type, we must provide definitions for all traits for that type:

typeextend module ModShow with Float:
    impl Show Float:
        show(x : Float):
            print_float(x)

Adding a new trait, we must provide implementations for each type

traitextend module Show a:
    trait Read a:
        read() : a
    impl Read Int:
        read() : Int:
            input_int()
    impl Read String:
        read() : String:
            input_string()
    impl Read Float:
        read() : Float:
            input_float()

Problem: the order in which modules are extended becomes a problem.

keean commented

A better solution may be to have two independent features, type-classes and type-families.

We then need a way to check all type classes are implemented for all types in a family.

This is critical for static typing.

Looking at the JavaScript solution, and using it:

var x = new Num(1);
var y = new Num(2);
var z = new Add(x, y);
var w = new Sub(x, y);
var e = new Mul(z, w);

e.toString(); // returns ((1 + 2) * (1 - 2))

In JavaScript this is duck-typing because the existence of toString in e is only checked when you call it.

In static typing we require the type check to happen on definition, not on use.

keean commented

Thinking about this overnight, I don't see any reason to not allow inferring anonymous unions types from functions, at least local ones, as we can go further and infer a complete type.

(if x then "abc" else 123).show()

we infer String | Int' for theifexpression, and then constrain it to be an instance ofshowwhich means bothIntandString` have to have implementations of show.

Which all seems good, until I start thinking about the typing algorithm. I will start a new topic for typing union types to explore the issues.

Edit: this is of course only for local definitions. Module exports will need type signatures (as we discussed in the main language design thread) and that will include defining the unions. I have a suggestion for a unified module/record/type-class system using implicits that should be a new thread as well.

Also add Object Algebras as another attempt which fails to solve Wadler's Expression Problem (and erroneously claims to have done so!) because it removes an aspect of compositional degrees-of-freedom.

@keean wrote:

Problem: JavaScript is untyped.

That isn't the only problem. That example code is adding the new operations (aka methods) to every instance of the class, which makes it impossible to compose over adding new operations. Adding new operations globally is more or less useless because it would mean the class would need every possible extension used any where in the entire program and then there would be method naming conflicts. This would make separate compilation impossible, because of the need for global coordination and coherence on method name conflicts.

P.S. the example code from your quoted comment forgot to declare the Mul class.

Edit: someone else also realized the Object Algebras fails to generally solve the Expression Problem.

@keean wrote:

For type safety we need to ensure every function in a module is implemented for every type in the module.

Disagree. Per the prior comment of mine, we need the compositional degrees-of-freedom to add operations (i.e. trait typeclasses) for type variants (i.e. data or class types) at-will at the use-site naming scope, and not bound to some global (naming and other) coherence.

Let's use the term "module" for a thing like an expression (there is some reasoning for this).

For extensibility we need two things, the ability to add a new function to an existing module, and the ability to add a new type to an existing module.

It is anti-compositional to think of the type variants and their operations as necessarily bound to a single module.

Modules are an orthogonal concept for controlling access (to names) and units of separate compilation.

Again note this is somewhat at odds with anonymous unions because we need to have these two checks for type safety.

Disagree. Please don't conflate orthogonal concepts.

I realize that the way I completely solved the Expression Problem is mind boggling to others at-first, even though my solution is quite simple.

Note: anonynimity is not a condition of solving the expression problem.
To satisfy your 'no boilerplate' requirement we must be able to infer modules.

The anonymity of the anonymous structural union just means it is not named. The items in the union are nominal and are inferred normally.

Problem: the order in which modules are extended becomes a problem.

Yeah that is another example of why we don't conflate modules with the Expression Problem.

I realize I don't have yet a comment where I explain holistically my solution for extension over both axes of Wadler's Expression Problem, while avoiding the pitfalls of losing compositional degrees-of-freedom.

Refer to the following comments:

#10 (<--- recently insert this one)
#2 (comment)
#1 (comment)
#2 (comment)
microsoft/TypeScript#10844 (comment)
#1 (comment)
#6 (comment)
#8 (comment)
#1 (comment)
#8 (comment) (<--- recently insert this one)

Those comments taken together holistically explain well my complete, "fully compositional" solution to Wadler's Expression Problem, where "fully" is relative lol (there is no absolute perfect compositional solution more like peeling off layers of an onion in the multi-dimensional space).

keean commented

@shelby3 I have started a new thread about type inference for union types, working under the assumption that we are having them - so that this discussion does not hold up progress in other areas: #5

@keean wrote:

A better solution may be to have two independent features, type-classes and type-families.

We then need a way to check all type classes are implemented for all types in a family.

In JavaScript this is duck-typing because the existence of toString in e is only checked when you call it.

In static typing we require the type check to happen on definition, not on use.

The static checking is at the function application use-site for typeclasses, so only those trait bounds (from the functional declaration/defintion site) required at the use-site are required to be implemented. We don't need to enforce some global coherence as that would be anti-compositional.

keean commented

@shelby3 wrote:

The static checking is at the function application use-site for typeclasses, so only those trait bounds (from the functional declaration/defintion site) required at the use-site are required to be implemented. We don't need to enforce some global coherence as that would be anti-compositional.

The point of type-classes is we don't need to statically check the type at the call site. We establish a constraint on the type of the function argument based on the use in the function. So given:

f(x) = show(x)

We know that show requires the object to implement the Show trait, so the type we infer for f is:

f<A : Show>(x : A)

You can now pass any type that implements Show there are no unions necessary.

@keean wrote:

The point of type-classes is we don't need to statically check the type at the call site.

Incorrect. There is confusion.

You can now pass any type that implements Show

And the compiler must check that the type being passed (input) to the function at that call site, has an implementation in scope for the Show trait bound.

Said "check" is to "statically check" because "statically" means to do it at compile-time, not run-time.

Somehow in your mind you conflated that the compiler checking for the available implementation of the trait bound is somehow in your mind "not checking" because it isn't checking the existence of the trait bound or something.

there are no unions necessary.

You are thinking that my solution to the Expression Problem requires unions on the trait bound at the function/method declaration (definition) site. And you are incorrect to think that, as I have not proposed unions on trait bounds at the declaration site. Please try to understand my solution. This discussion is getting noisy because you haven't yet understood my solution.

@shelby3 wrote:

I agree unions of traits make no sense except for the use case of my prior comment where they are resolved at compile-time and are really just sugar for not needing to declare a permutation of function names.

keean commented

@shelby3 I guess I don't understand your solution because you are explaining it using reference to subtyping, like greatest common bound. With parametric types, and polymorphic functions you don't have any of this.

This is what I find confusing, you keep referring to the way languages like Scala handle classes, inheritance, subtyping, and subsumption, whereas I am talking about a type and type class based approach.

Could you write the algebraic evaluation example (The one with Add, Mul etc) using your solution?

Let's try to get you to understand clearly my solution to the Expression Problem asap, so we can move forward. Please revisit that aforementioned linked post and re-read every linked comment more carefully (I also edited those comments for increased clarity since you've last read them) Especially re-read this link:

#2 (comment)

@keean wrote:

I guess I don't understand your solution because you are explaining it using reference to subtyping, like greatest common bound.

Inaccurate historical perspective. I wrote that the disjoint union is an alternative to subsuming to a GLB and eliminates the subclassing issues of subtyping but not the covariance issue which must be handled with an invariant type parametrized container such as List (and you confirmed that typeclasses aka traits, also eliminate subclassing issues).

What I am explaining is that my proposed use of disjoint unnamed structural union types (containing tagged data types) eliminates the need to subsume to a common type (GLB) in order to do polymorphism. Instead in my solution, we keep the knowledge of the union of data types (and don't erase it to the GLB), so that when we call the function with a trait bound, then we know which implementations to provide and the compiler can build the type-case logic automatically. Thus enabling extension in both axes of the Expression Problem. It is really that simple.

With the alternative of subclassing, we were forced in other languages to subsume to a common supertype (GLB). Or with Rust trait objects (i.e. virtualized typeclasses), we subsume to the trait object's trait bound (via assignment of the data instance to the Rust trait object). Both of those erase the knowledge about which data types are in those instances. Once the compiler erases that knowledge, then the compiler can not know which implementations to use if we want to pass those (as subsumed) instances as input to a different function which requires a different trait bound. In my solution by keeping the static (compile-time) knowledge of the union of data types (i.e. not subsuming), we retain the extensibility to apply those instances to any function and any trait bounds, i.e. delayed binding to trait bounds (same as for the homogeneous case in Rust and Haskell but also) in the heterogeneous case.

Typeclasses in Haskell and Rust work very well when working only with a homogeneous nominal data type. We can delay binding to a trait bound until the function is called. But they don't work so elegantly with heterogeneous (i.e. unnamed structural union of) nominal data types, i.e. polymorphism, because in Rust these are handled by prematurely subsuming the data types to the trait bound of the trait object, thus discarding the static (compile-time) information about which data types are inside the trait object. Thus breaking the solution to the Expression Problem for the polymorphic, heterogeneous case. My solution fixes that problem.

I provided an example for you already, showing that "heterogeneous (i.e. unnamed structural union of) nominal data types" is the same as polymorphism (i.e. polymorphism required either via subsuming to subclass supertype, subsuming to the trait bound of a Rust trait object, or my solution... but only my solution solves the Expression Problem).

Note trait objects have some benefits, so we may want them also, as it is one of the two ways we can do polymorphism with typeclasses. The programmer will choose which way to polymorphism, depending on the circumstances of their use of polymorphism. Trait objects can not solve the Expression Problem (in the heterogeneous case and that is the only case where trait objects are useful), so that is their drawback (also they will be less performant than my solution because they discard static compile-time typing information that can be used by the compiler to optimize).

keean commented

Okay lets look at the example:

trait Print {   // Rust-esque type-class, not a Scala trait
  print()
}
impl Number for Print {
  print() { this.toString() } 
}
impl String for Print {
  print() { this.toString() }
}
f(x: Boolean) = if (x) 'string' else 3
g<A: Print>(x: A) = x.print()
h(x: Boolean) = g(f(x))

I understand what is going on here, although I don't really see how it solves the expression problem.

I don't think this example shows any of the potential problems, so I think its okay. However I think things get more complicated if f and g are not top-level functions:

h<A : Print>(x: Boolean, g: (x: A), f: (y: Boolean) -> Int | String) = g(f(x))

which is okay with full type annotations but we would run into trouble trying to infer this from:

h(x, g, f) = g(f(x))
keean commented

@shelby3 wrote:

What I am explaining is that my proposed use of disjoint union types (containing tagged data types) eliminates the need to subsume to a common type in order to do polymorphism.

But we don't have to subsume to a common type. Each use of a function results in a new copy instansiated with the particular types provided for the type-arguments at each use site.

g<A: Print>(x: A) = x.print()

Alternatively this is implemented by passing the relevant type-class dictionary to g as an implicit argument.

If A is a union type, say Int | String then we have to pass both dictionaries, or we would have to emit type case that calls g in the branches of the type-case where the type is known statically.

I have not mentioned subclassing :-)

@keean wrote:

What I am explaining is that my proposed use of disjoint union types (containing tagged data types) eliminates the need to subsume to a common type in order to do polymorphism.

But we don't have to subsume to a common type. Each use of a function results in a new copy instansiated with the particular types provided for the type-arguments at each use site.

It is as if you didn't even comprehend my prior comment. We apparently have very different mental models at this juncture and need to try to attain mutual understanding.

I just finished explaining to you that what you wrote above works very well for the homogeneous case where the monomorphism can be employed. But it doesn't work for the polymorphic (heterogeneous) case because the only solution before mine was to use a Rust trait object which doesn't solve an aspect of the Expression Problem, because the trait object is prematurely bound to a specific choice of trait bounds and is not extensible to other trait bounds. The entire point of my solution is to fix that specific problem.

I have not mentioned subclassing :-)

I have mentioned it because I am comparing the ways type systems can handle polymorphism:

  • subsuming to subclass supertype
  • subsuming to trait object's trait bounds
  • retain an unnamed structural disjoint union of the tagged data types

Only the last one (my solution) solves the Expression Problem in the polymorphic (heterogeneous) case.

keean commented

@shelby3 this paper seems relevant: http://arxiv.org/pdf/1606.01106v1.pdf

Edit: Reading the paper, it looks like they run into the usual problem of needing to have whole-program compilation.

keean commented

@shelby3 Okay, lets say I accept union types are a good idea because they enable concise code (and the paper I linked to above agrees with you on that point). How do they solve the expression problem. I think I need to see what the 'expression evaluation' example I provided above looks like in your system. To recap, I want to have some objects like Mul, Add, and Num that I can use to define expressions, and a function eval that calculates the result of the expression.

I want to be able to add a new operator (object) like Sub without touching the definitions of any of the existing objects or functions.

I want to be able to add a new function, for example print that will print any expression, without touching the definitions of any of the existing objects or functions.

Finally when writing the code, you do not know what expression I will want to evaluate or print, as I am the user of the library.

What would this look like with your solution?

@keean wrote:

this paper seems relevant: http://arxiv.org/pdf/1606.01106v1.pdf
Okay, lets say I accept union types are a good idea because they enable concise code (and the paper I linked to above agrees with you on that point)

Yeah for example I quote from the paper, “it is still not possible to form unions of values of generic types, but just finite enumerations of tagged values”.

Interestingly I wrote the similar point on the Rust forum in May, as quoted from the paper, “subtyping
corresponds to set containment
”. Note it eliminates the undesirable aspects of subclassing and subsuming (often associated with subtyping) for polymorphism in other subclassing languages and Rust. And combined with typeclasses, trait objects entirely eliminate covariance issues (becomes invariant always) for parametrized typeclasses but my solution requires invariant type parametrized containers (e.g. List) because the union does express a subtyping relationship.

And my discovery and publish in the Rust forum in May apparently predates (is prior art for) theirs. I was pushing for union types to improve the polymorphism/subtyping/heterogeneous-parametrization issue since at least 3 years ago. As they said, it is not that novel of an idea. The novel aspect is how I combined it with typeclasses to form a complete solution to the Expression Problem, which is out-of-the-scope (not considered by) that paper you cited.

keean commented

Ill have a go...

struct Num {
    n : Int
}

struct Add<A, B> {
    e1: A
    e2: B
}

trait Eval {
    eval() : Int
}

impl Eval for Num {
    eval() {
        return this.n
    }
}

impl<A, B> Eval for Add<A, B> { 
    eval() {
        return this.e1 + this.e2
    }
}

This solution in Rust seems quite good, you can add new operators without touching the above code:

struct Mul<A, B> {
    e1: A
    e2: B
}

impl<A, B> Eval for Mul<A, B> { 
    eval() {
        return this.e1 * this.e2
    }
}

And I can add new functions without touching the above code:

trait Print {
    print() : String
}

impl Print for Num {
    print() {
        return int_to_string(this.n)
    }
}

impl<A, B> Print for Add<A, B> { 
    print() {
        return e1.print() + "+" + e2.print()
    }
}

impl<A, B> Print for Mul<A, B> { 
    print() {
        return e1.print() + "*" + e2.print()
    }
}

So we don't need any unions so far... but I think I can see where they come in.

I can do the following without unions:

(new Mul(new Add(new Num(1), new Num(2)), new Num(3))).print()

But that is all statically determined, next post I am going to see if I have got the union bit right.

@keean wrote:

Okay, lets say I accept union types are a good idea because they enable concise code (and the paper I linked to above agrees with you on that point). How do they solve the expression problem. I think I need to see what the 'expression evaluation' example I provided above looks like in your system. To recap, I want to have some objects like Sub, Add, and Num that I can use to define expressions, and a function eval that calculates the result of the expression.

I want to be able to add a new operator (object) like Mul without touching the definitions of any of the existing objects or functions. [Edit: swapped Mul and Sub to be consistent with your upthread example]

I want to be able to add a new function, for example print that will print any expression, without touching the definitions of any of the existing objects or functions.

What would this look like with your solution?

Rust's trait objects allow you to do that because they allow polymorphism for adding new data type variants such as Mul and the typeclasses mechanism allows you to add new operations such as Print orthogonal to the declaration (definition) of the data types (so no need to recompile the data types which is the requirement of the Expression Problem), so you might think you don't need my solution.

But creating a trait object with trait bound Print prematurely binds your Print trait bound (containing the print() method operation) to the variant data types you assign to the instances of said trait object. And this prevents you from implementing a new operation Show and applying it to said preexisting trait object with bound Print, without recompiling said trait object. Thus violating the requirement of the Expression Problem.

My solution instead retains the typing information about the data types in the structural union (instead of assigning to a trait object), so that the instances of the said structural union can be passed as input to any function with any trait bound (and of course we provide implementations for that said trait bound for all data types listed in the said structural union, but these don't require any recompilation of preexisting code).

Think about this way. Typeclasses (e.g. Rust traits) differ from subclassing because they delay binding of implementation of interface until the function call site. But that only applies for the monomorphism case. Trait objects bind the implementation to the interface at the time of assignment to the trait object, which is premature. My solution extends delayed binding of implementation to the function call site for the polymorphism case.

Hopefully it clicks now for you? Did you have that "A ha" moment now?

Again note that trait objects don't require invariant collections, so they have a use-case where they are an advantage. My solution requires invariant containers (e.g. List) because it is not invariant for the type parametrized container, given that unions express a subtyping relationship.

keean commented

So I think where I need the unions is in something like a parser:

y = if x:
        new Add(new Num(1), new Num(2))
    else:
        new Mul(new Num(1), new Num(2))
y.print()

So y would have the type Add<Num, Num> | Mul<Num, Num> and y.print() would have to be passed both dictionaries.

But with a parser its all going to go horribly wrong because we will need recursive types, and at every parse decision we will double the number of terms in type.

Consider the type signatures of the parser functions

parse_num(in : Stream) : Num
parse_add(in : Stream) : Add<Num, Num>
parse_mul(in : Stream) : Mul<Num, Num>
parse_expr(in : Stream) : Num | Add<Num, Num> | Add<Add<Num, Num>, Num> | Add<Add<Add<Num, Num>, Num>, Num> ... etc forever.

So we will also need recursive types, and be able to do something like:

type Expr = Num | Add<Expr, Expr> | Mul<Expr, Expr>
parse_expr(in : Stream) : Expr

Is that going to work? Can we infer the recursive types? What does it mean for dictionary passing?

@keean wrote:

I can do the following without unions:

(new Mul(new Add(new Num(1), new Num(2)), new Num(3))).print()

But that is all statically determined...

Exactly. Once you introduce polymorphism then without unions you will need to discard statically (in the source code) the struct types and assign to a Rust trait object. If you choose Eval for the bound of said trait object, then you won't be able to Print on it. Even if you have the foresight to put Eval+Print bound on said trait object, it still won't be open to extension to other trait bounds that were not created yet.

My solution isn't only applicable to type parametrized containers (such as List) but also in general for polymorphism as you have explained above. Thus the disadvantage of my solution compared to trait objects being that for its interoption with type parametrized containers, we are forced to use an invariant container such as List (which maintains invariance on the type parameter despite the subtyping relationship expressed by union types), is inapplicable (irrelevant) when using my solution not with a type parametrized container.

keean commented

@shelby3 What do you think about the recursive typing issue in 'parse_expr', I think we may be able to infer it, and then it is a case of working out what dictionaries will be required in the trait-method. I think I get what is going on, and I think a parser and expressions are a good example to work with, as it is the core part of writing a compiler, which is something we are doing to implement the language.

I think its time to switch to the inferencing discussion: #5

@keean wrote:

I think its time to switch to the inferencing discussion: #5

Agreed.

I don't know anything about zenscript, but union types of sufficient power can enable overlapping ADTs and functions that are polymorphic across the exact shape of those ADTs.

You probably won't be able to support infinite types (surely not type inference for them), but if you used fixed point data, you don't need to (though that requires higher-kinded types to be really useful).

As an example in present day Scala:

case class Mul[A](left: A, right: A)
case class Sub[A](left: A, right: A)

type Coproduct[F[_], G[_], A] = Either[F[A], G[A]]

type Expr[A] = Coproduct[Mul, Sub, A]

case class Fix[F[_]](unwrap: F[Fix[F]])

You can regain a recursive definition for Expr[A], for example, by using Fix[Expr], and with a little effort, you can see how this representation is basically a "poor person's union type", albeit a higher order one. You can also see how ADTs can overlap and thus reuse parts of other ADTs, and how operation can be defined such that they require only the minimum components necessary.

As for how all this would translate into zenscript (or not), I'm not sure since I only learned of this project and this issue today thanks to an IM from @shelby3. Good luck Shelby! 😄

keean commented

@jdegoes its early days, so there's plenty of scope to influence the direction of the project :-) I have started building the compiler in JavaScript, with JavaScript as the first target. Big items to sort out are the syntax and the type system. I have been playing with type systems for a while, especially type inference, so I have a pretty good idea of how to implement it.

I think the core feature set at the moment are type-classes and union types. I think we will go with that and try and rapidly get to a first prototype, so that we can start writing programs, probably self-hosting the compiler would be the obvious thing to do.

Type inference for recursive types is not a problem in my experience, the type system internally works in the universe of regular-trees, and then we convert to 'Mu' types for display. For example:

((\x . ((x x) x)) (\x . ((x x) x))) : {} |- ((a -> b as a) -> b as b)

@jdegoes wrote:

I don't know anything about zenscript, but union types of sufficient power can enable overlapping ADTs and functions that are polymorphic across the exact shape of those ADTs.

If I understand your point correctly, then we also realized that and is one of the reasons we need union types. That when combined with a novel way of resolving typeclasses for unions also enables us to truly solve the Expression Problem where apparently others haven't even they apparently erroneously claimed to.

You can regain a recursive definition for Expr[A], for example, by using Fix[Expr], and with a little effort, you can see how this representation is basically a "poor person's union type", albeit a higher order one.

We want to support a first-class unnamed structural union, to eliminate boilerplate and also so the inference engine can infer structural union types.

We are trying to provide what we see is a missing variant in programming language offerings, especially one designed to be compiled to JavaScript (although other targets might be a possibility as well). That is a strict evaluation, no subclassing(!), typeclasses, and first-class structural unions, for greater extension capabilities in a soundly typed language that is also uncluttered with paradigms and thus reasonably simple to grasp. Also a syntax reasonably close to JavaScript. We're distilling from the best features of each language.

We are building this because @keean and I decided we wanted to code (or at least experiment) in a language like this. And we see a gap in the market that is not filled. Kudos thus far to @keean on doing the most on implementation thus far. So far, we also have some interest expressed from @SimonMeskens and @svieira (both of whom I recently met via discussions at the TypeScript Github).

Thanks John for visiting our nascent project.

keean commented

@shelby3 I am closing this issue because I think we are decided that anonymous structural unions should be in the language in some form (at least rank1).

@keean can you please edit the close to add a Label that indicates a favorable resolution?

I think I want flow typing and cleanest solution to null employing anonymous unions.

keean commented

I think "flow typing" is what I call path-dependent typing, and if so, then yes it is something I want too.

Regarding null I think that makes sense. Null is effectively the unit type (void), and can be inferred into any union type.

There is a small problem with our type class approach, because normally we would want all types in the union to implement the type classes required by a function, when null will not implement all (if any) of them.

The solution is to force the programmer to explicitly provide a type-case for Null if null does not implement the type-class required for a function.

This is really a very simple axiom, and in the future we might allow more complex axioms, like inequality constraints on argument types.

This will feel almost exactly like dependent typing except we never depend on types at runtime, instead we force the user to statically insert a check value. This is really the combination of "path dependent types" and "refinement types".

keean commented

I notice Ceylon always requires the type parameters, as in:

Box<Out> fmap<Box,In,Out>(Out(In) fun, Box<In> box) 
        given Box<Element> { ... }

I am sure we are okay to infer them, but it might cause problems with other Ceylon-like types, we just have to be careful as we expand the type system, and ideally keep it as simple as possible.

As long as not all types can be Null, I'm fine. As far as I'm concerned, no languages should have Null at all, a union type or an empty monad is all you need. Implicit nullable types are evil.