keean/zenscript

Fast track transpiling to TypeScript?

Opened this issue · 142 comments

Building off the reasoning and justification in my self-rejected issue Fast track transpiling to PureScript?, I have a new idea of how we might be able to attain some of the main features proposed for ZenScript, without builiding a type checker by transpiling to TypeScript.

If we can for the time being while using this hack, presume that no modules will employ differing implementations of a specific data type for any specific typeclass, i.e. that all implementations of each data type are the same globally for each typeclass implemented (which we can check at run-time and throw an exception otherwise), then the module can at load/import insure that all implementations it employs are set on the prototype chain of all the respective classes' construction functions. In other words, my original point was that JavaScript has global interface injection (a form of monkey patching) via the prototype chain of the construction function, and @svieira pointed out the potential for global naming (implementation) conflicts.

So the rest of the hack I have in mind, is that in the emitted TypeScript we declare typeclasses as interfaces and in each module we declare the implemented data types as classes with all the implemented interfaces in the hierarchy. So these classes then have the proper type where ever they are stated nominally in the module. We compile the modules separately in TypeScript, thus each module can have differing declarations of the same class (because there is no type checking linker), so that every module will type check independently and the global prototype chain is assured to contain the interfaces that the TypeScript type system checks.

So each function argument that has a typeclass bound in our syntax, will have the corresponding interface type in the emitted TypeScript code. Ditto typeclass objects will simply be an interface type.

This appears to be a clever way of hacking through the type system to get the type checking we want along with the ability to have modules add implementations to existing data types with our typeclass syntax. And this hack requires no type checking in ZenScript. We need only a simple transformation for emitting from the AST.

As for the first-class inferred structural unions, TypeScript already has them, so no type checking we need to do.

It can't support my complex solution to the Expression Problem, but that is okay for starting point hack.

I think this is way we can have a working language in a matter of weeks, if we can agree?

That will give us valuable experimentation feedback, while we can work on our own type checker and fully functional compiler.

TypeScript's bivariance unsoundness should be avoided, since ZenScript semantics is to not allow implicit subsumption of typeclass bounds, but this won't be checked so it is possible that bivariance unsoundness could creep in if we allow typeclasses to extend other typeclasses. Seems those bivariance cases don't impact my hack negatively though:

When comparing the types of function parameters, assignment succeeds if either the source parameter is assignable to the target parameter, or vice versa.

Not a problem because of course an interface argument type can never to be assigned to any subclass.

When comparing functions for compatibility, optional and required parameters are interchangeable. Extra optional parameters of the source type are not an error, and optional parameters of the target type without corresponding parameters in the target type are not an error.

When a function has a rest parameter, it is treated as if it were an infinite series of optional parameters.

We should be able to design around this.

Also a compile flag can turn off TypeScript's unsound treatment of the any type.

TypeScript is structural but there is a hack to force it to emulate nominal typing. We could consider instead transpiling to N4JS which has nominal typing and soundness, but it is not as mature in other areas.

keean commented

I had actually planned to start with no optimisation or type checking :-) Simply parse the source, discard the typing information and output the JavaScript. You can then already start experimenting with the syntax and writing simple programs, and you will simply get runtime errors (or bugs) in the JavaScript.

Type-classes are not that simple however, because they don't obviously associate with any given argument to the function. Consider a Cast type class that converts a value of type A to type B, and a function that uses it:

fn f<A, B>(x : A, y : B) where Cast<A, B> =>
    let x_as_B : B = cast(x)
    ...

(aside: we need to decide on where the type-class requirements list is going to go in function definitons).

Cast is does not provide methods on A or B but provides independent functions (all type classes do).

What we actually want to do is turn the type-class into a Record data type, turn the implementation into a value of the type of that Record, and then add an extra parameter to the function. The JavaScript would be something like:

var cast_A_B = {
    cast : function(x) { return x; } // whatever casting function
 }

function f(x, y, cast_class) {
    var x_as_B = cast_class.cast(x)
    ...
}

Note there is no representation of the type-class itself in the JavaScript, it just has the implementations.

In the future we can monomorphise and inline the typeclass function and remove the extra argument where appropriate, but that can be viewed as an optimisation.

@keean wrote:

Simply parse the source, discard the typing information and output the JavaScript.

If that was feasible, I would have created ZenScript a long time ago. Your plan is not feasible.

The compiler has to select the dictionary for the implementation of the data type which is passed in the source code, but there is no way for ZenScript to do that without also doing full type checking.

That is why I devised this hack described in the OP.

My hack transforms typeclassing into subclassing within the module, so that a subclassing type checker can check everything. And it requires no typechecking in ZenScript. And the only downside I see, is that it requires that modules don't provide conflicting implementations (which is damn easy for us to do now at this experimentive stage).

So again I ask you do you want to accept my proposal that we first prioritize transpiling to TypeScript? I am probably going to do if you won't. Afaics, there is no other fast track way.

Type-classes are not that simple however, because they don't obviously associate with any given argument to the function.

fn f<A, B>(x : A, y : B) where Cast<A, B> =>

Obviously we can't do that in my hack, but we can do higher-kinds if our transpile target supports them:

fn f<A, B>(x : A<B>) where Cast<A<B>> =>

P.S. would you please stop writing this:

fn f<A, B>(x : A, y : B)

Since (for the 5th time I will state that) I presume we've decided (since I've seen no objections) on the following.

let f(x : A, y : B)

Or:

let f = (x : A, y : B)

We unified around let rather than have an unnecessary fn along with let. And we made type parameters ALLCAPS.

@shelby wrote:

  1. Sum, Recursive, Record, and Product parametrized data type with optional Record member names and optional Record (e.g. Cons) and Product (e.g. MetaData) names:

    data List<T> = MetaData(Nil | Cons{head: T, tail: List<T>}, {meta: Meta<T>})

Thinking about how to emit that in a language with classes and tuples. We also need to get the declaration of covariant or contravariant for type parameters correct. Also were is mutability declared above?

sealed interface NilOrCons<T>   // or `abstract class`, note sealed may not be available but that is ok
object Nil implements NilOrCons<Bottom> // or `class` and instantiate a singleton instance
sealed class Cons<T> implements NilOrCons<T> {
  var head, tail
  constructor(head: T, tail: NilOrCons<T>) {
    this.head = head
    this.tail = tail
  }
}
sealed interface List<T>
sealed interface Meta<T>
sealed interface Recordmeta<T> {
  var meta
  constructor(meta: Meta<T>) {
    this.meta = meta
  }
}
sealed class MetaData<T> extends Tuple2<NilOrCons<T>, Recordmeta<T>> implements List<T>

Makes one realize the unified data syntax is much more succinct! 😎

keean commented

Nil should have the same type as Cons. The reason is we do not know how long the list is at runtime, so how do we type check:

let f<A>(list : List<A>) requires Show<A> =>
    while list /= Nil :
        show(list.head)
        list = list.tail
keean commented

Please no all caps, it's like shouting in my text editor :-(

ML prefixes type variables with a character (happens to be ' but could be anything)

Also I note you want to get rid of the type parameter list, you do realise sometimes you need to explicitly give the parameters if they cannot be inferred like:

let x = f<Int>(some_list)

This also makes me realise another ambiguity in our syntax, it is hard to tell the difference between assigning the result of a function and assigning the function itself as a value. In the above you might have to pass to the end of the line to see if there is a '=>' at the end. This requires unlimited backtracking which you said you wanted to avoid.

@keean's prior comment has a response. Please where reasonable to do so, let's try to keep the syntax discussions in their Issue thread. I started the tangent in this case. 😯

@keean wrote:

Nil should have the same type as Cons. The reason is we do not know how long the list is at runtime, so how do we type check:

let f(list : List<A>) requires Show<A> =>
    while (list != Nil)
        show(list.head)
        list = list.tail

Disagree. And nearly certain you are incorrect.

Assuming:

data List<T> = Nil | Cons{head: T, tail: List<T>}

The list != Nil is a guard that insures list is a Cons in the guarded code block, because otherwise list.head and list.tail must be a compiler error on a polymorphic type List<A>. This is yet another example where you are misapplying your Haskell mindset to a subtyping language. TypeScript already has these guards in their type checker.

But the guard doesn't change the type of list (can't change the type of a reference after it is constructed), it only specializes it where it is necessary. Thus the type of list remains List<A>.

I don't think you should attempt the type checker as our first step. We should gain experience with TypeScript's type checker first, before attempting our own. After that, everything will be clearer to both of us, so we don't commit major blunders on the type checker and lose precious time.

keean commented

So what you are saying is that List<Bottom> <: List<A> where A is any type other than bottom. However Nil cannot have a different type without dependent types as we do not know where the end of the list is until runtime.

@keean wrote:

So what you are saying is that List<Bottom> <: List<A>

Yes if by <: you mean subtypeof.

where A is any type other than bottom.

Disagree. List<Bottom> is a subtypeof of itself. Bottom being a subtypeof itself doesn't mean nor necessitate that it can be instantiated (quantified).

However Nil cannot have a different type without dependent types as we do not know where the end of the list is until runtime.

Different type from what? And what is the problem you see? I don't see any problem.

The list != Nil is a run-time test. The types match because Nil is subsumable to a List<A> and the != is defined for List<A> (actually List<_> unless we want deep comparison of all elements) not List<Bottom>. You forgot the trait bound for the != operation, although maybe we can make that implicit requires?

 let f(list : List<A>) requires Show<A>, Eq<List<_>> =>
     while (list != Nil)
         show(list.head)
         list = list.tail

Where _ means not used.

Generally though, I am thinking we will need higher-kinds.

keean commented

In the case of the above datatype Eq<List<A>> would be the correct trait bound.

In the above Nil would not be typed List<Bottom> as there is enough typing information to determine that Nil has the type List<A>, we can tell that because the arguments of /= must have the same type.

In other words we assign Nil the type List<B = Bottom> where B is a fresh unused type variable, then we know list has the type List<A> where A is the type in the function type. We then know the types of /= must be the same so we have to promote Nil to type List<B = A>

What the heck is /= in this context? I thought it was a typo and changed it to !=.

keean commented

Do you prefer !=, I have programmed in lots of languages so I know a few, !=, /=, =/=, <> etc...

For me and everybody I know who uses a mainstream programming language, /= means divide-equals.

We are targeting JavaScript. The / means divide.

@keean wrote:

In the case of the above datatype Eq<List> would be the correct trait bound.

In the above Nil would not be typed List<Bottom> as there is enough typing information to determine that Nil has the type List<A>, we can tell that because the arguments of /= must have the same type.

In other words we assign Nil the type List<B = Bottom> where B is a fresh unused type variable, then we know list has the type List<A> where A is the type in the function type. We then know the types of /= must be the same so we have to promote Nil to type List<B = A>

This is hopeless. You can stop thinking in terms of Haskell's lack of subtyping and subsumption.

Mythical Man Month effect is taking over.

Nil remains List<Bottom>. Its type can't change. It is subsumed as necessary to type check separately in each expression where it appears.

keean commented

Ada uses /= :-) It comes from the Pascal family of languages,

keean commented

@shelby3 wrote:

This is hopeless. You can stop thinking in terms of Haskell's lack of subtyping and subsumption.

I think we should avoid subtyping and subsumption, except specifically for union types where we have set-based subsumption like Int | String :> Int

Edit: Remember we agree on how bottom and top should behave, so its probably just a communication problem :-)

@keean I need to go jogging and buy prepaid load for my mobile phone. I suggest we focus on the syntax for now and that you experiment with output to TypeScript (or Ceylon) to get a feel for how these type systems work with subtyping and as we do this transform of our simplest typeclasses.

It is too much verbiage to discuss the typing now.

We can't target Scala as it doesn't have the unions.

keean commented

There's a reason I dont like typescript or Cylon :-)

Anyway above I was thinking about bidirectional type inference, which was a mistake, as I actually want things to be compositional, which would align better with what you want.

My point about not losing track of type variables, and needing more than one bottom in something like Pair<A, B> still stands though.

Finished jog. Very fast. Headed to buy load.

Please let us put typing strategies aside for now and focus on the syntax and transpiling to TypeScript. This is what I am going to do. I need a working language yesterday. I don't have time to wait for the perfect language. Sorry.

Let's focus on what we can do quickly.

The perfect language will come incrementally.

keean commented

Then we are stuck with typescripts unsound type system?

@keean wrote:

Then we are stuck with typescripts unsound type system?

I documented upthread why I thought we could avoid the aspects that are unsound. If not, we could try N4JS which the authors claim is sound.

The transformation to subclassing will lose some of the degrees-of-freedom that we get with for example multi-type parameter typeclasses, but we can start with single-parameter so at least I can start experimenting with the syntax and typeclass concept.

You may not agree with this direction because you may think it is not correct to model with a subtyping type checker. I think we will end up with a subtyping type checker eventually any way and that you will eventually realize this. If I am wrong, then I am wrong. You think you know, but I doubt either us can know for sure yet.

But what are your options? You can't transpile to PureScript because it doesn't allow impurity. So your only option is to build the entire type checker. But that will take months to get to a working stable, usable state. And there is a non-zero risk that you would realize along the way that your design was incorrect and full of corner cases issues. Then you'd start over again.

So what alternative do you propose?

We both agree that we need an imperative, typeclass language. It seems we mostly agree on the syntax. Our major conflict if any is on implementation and which strategy to prioritize.

Then we are stuck with typescripts unsound type system?

http://www.brandonbloom.name/blog/2014/01/08/unsound-and-incomplete/

Thinking about if we really need higher-kinds, because TypeScript apparently doesn't have the feature and Ceylon's feature may or may not be adequate. Both of them offer first-class unions, but TypeScript's guards may require less boilerplate (Edit: Ceylon has flow-typing aka typeof guards), we access prototype in native code, and presumably with high certainty TypeScript has more bijective JS emission since it is a minimal superset. Scala offers higher-kinds (and simulation of traits with implicits), also compilation to Scala.JS, but no first-class unions.

A Monoid is an example of an interface that needs both the type of the datatype implemented and the element type of the said datatype.

interface Monoid<A, SELF extends Monoid<A, SELF>> {
    identity(): SELF
    append(x: A): SELF
}

abstract class List<A> implements Monoid<A, List<A>> {
    identity(): List<A> { return Nil }
    append(x:A): List<A> { return new Cons(x, this) }
}
class _Nil extends List<never> {}
const Nil = new _Nil
class Cons<A> extends List<A> {
    head: A
    tail: List<A> 
    constructor(head: A, tail: List<A>) {
        super()
        this.head = head
        this.tail = tail
    } 
}

I thought the above code is higher-kinded, but I guess not because no where did I write SELF<A> and thus compiler isn't enforcing that A is a type parameter of SELF's type constructor. And it compiles and assumes that an implementation of Monoid for Nil and Cons<A> will always be same as the implementation for List<A>, which is why I added run-time exceptions to assert the invariants the compiler can't check. It even compiles if elide the types we couldn't know in order to transpile without any type checking.

Whereas, the following attempt didn't because it doesn't make that assumption.

interface Monoid<A> {
    identity(): this
    append(x: A): this
}

abstract class List<A> implements Monoid<A> {
    identity(): this { return <this>Nil }
    append(x:A): this { return <this>(new Cons(x, this)) } // Error: Type 'Cons<A>' cannot be converted to type 'this'.\nthis: this
}
class _Nil extends List<never> {}
const Nil = new _Nil
class Cons<A> extends List<A> {
    head: A
    tail: List<A> 
    constructor(head: A, tail: List<A>) {
        super()
        this.head = head
        this.tail = tail
    } 
}

And the following attempt didn't compile also because it didn't make that assumption.

interface Monoid<A> {
    identity(): this
    append(x: A): this
}

abstract class List<A> implements Monoid<A> { // Error: Class 'List<A>' incorrectly implements interface 'Monoid<A>'.\n  Types of property 'identity' are incompatible.\n    Type '() => List<A>' is not assignable to type '() => this'.\n      Type 'List<A>' is not assignable to type 'this'.
class List<A>
    identity(): List<A> { return Nil }
    append(x:A): List<A> { return new Cons(x, this) }
}
class _Nil extends List<never> {}
const Nil = new _Nil
class Cons<A> extends List<A> {
    head: A
    tail: List<A> 
    constructor(head: A, tail: List<A>) {
        super()
        this.head = head
        this.tail = tail
    } 
}
keean commented

Well both of those are having to bend a monoid to fit their objects only systems (see my post on objects or no objects).

A monoid is simply this:

pluggable Monoid<A> :
    mempty : A
    mappend(A, A) : A

Note in a non object system it has no self reference and no need for higher kinds, they are both free functions like string literals and string concatenation. So using that example:

String implements Monoid :
    mempty = ""
    mappend = +

And we can use it like this:

let x = mappend(mempty, mempty)

Note mappend and mempty are just functions not methods, just like "" and + are not methods. Infact my proposal for "no objects" would mean no methods at all, you simply create a data-type and implement the type-classes you want on it. Namespacing is handled by modules.

Regarding why your implementation did not work, try:

interface Monoid {
    identity(): this
    append(x: this): this

Regarding why your implementation did not work, try:

interface Monoid {
    identity(): this
    append(x: this): this

I tried but it won't type check and Monoid has incompatible semantics with a list.

Readers see also the related discussion in the Higher-kinds thread.

@keean wrote:

A monoid is simply this:

pluggable Monoid<A> :
    mempty : A
    mappend(A, A) : A

That type checks in TypeScript, but again Monoid has incompatible semantics with a List.

Whether this is an explicit or implicit function argument seems only to be an incidental concern. The problem we want to avoid from OOP by preferring typeclasses is the premature binding of interface to inheritance hierarchy, aka subclassing.

ML functors are similar to typeclass implementations, e.g. implementing ITERABLE for a FOLD interface. And in OCaml functors are first-class, and now even have implicit selection by the compiler in the static scope. And BuckleScript appears to be a good OCaml -> JavaScript compiler.

So just as is the case for Scala, we could I think entirely emulate our typeclasses in OCaml, but ML variants do not have first-class anonymous structural unions inference, which TypeScript, Ceylon, and N4JS have (yet they don't have typeclasses). So we'd end up with a lot of boilerplate on our language's source code (not just in the emitted transpiled code), which is antithetical to the elegance and brevity of the code I want.

I think I would rather have elegance and brevity and be limited on the typeclasses we can write by transpiling to TypeScript, then starting off without the first-class anonymous structural unions inference and have noisy code that is perhaps more generally capable in the area of typeclasses. Also I don't want to waste a lot of effort making it perfectly interact with OCaml. The goal is to get something quick, elegant, and dirty by transpiling to TypeScript to test out the "look & feel"; and then work on our own type checker which can do all the features we want, not just what shoehorning into OCaml will allow us to do. Also there is factor that I have no experience with ML variants and the syntax currently looks foreign to me.

keean commented

Here is a solution that works. I am not sure it is the best solution:

class Monoid<A> {
    identity(): A
    append(x:A, y:A): A
    constructor(
        identity: () => A, append: (x: A, y: A) => A) {
        this.identity = identity;
        this.append = append;
    }
}

abstract class List<A> {}
class _Nil extends List<never> {}
const Nil = new _Nil
class Cons<A> extends List<A> {
    head: A
    tail: List<A> 
    constructor(head: A, tail: List<A>) {
        super()
        this.head = head
        this.tail = tail
    } 
}

let int_list_monoid = new Monoid<List<Int>>(
    (): List<Int> => new Cons(head = 'I', tail = Nil),
    (x: List<Int>, y: List<Int>): List<Int> => if x == Nil {return y; } else {
        return new Cons(x.head, int_list_monoid.append(x.tail, y)))
);

let test = int_list_monoid.append(int_list_monoid.identity(), int_list_monoid.identity())
console.log(test);
keean commented

There is something weird though because the list is type List<Int> but typescript lets me put strings in.

@keean wrote:

Here is a solution that works. I am not sure it is the best solution:

Why are you repeating the solution I provided in a prior comment?

keean commented

refresh :-) it shares the original not the edited version from the playground for some reason. I have pasted the code in directly now.

@keean wrote:

it shares the original not the edited version from the playground for some reason.

To work around that Playground bug, first delete everything from the url including and following the # on the Address line of the browser. Reload new url (type Enter, don't click Refresh). Then click Share. Appears to be a cookies issue. 🍪 👹

Okay I fixed yours, and you are correct that List can implement Monoid semantics by completely rebuilding the List.

In my prior comment on this, I wasn't paying attention that Monoid does not require mutating its input, i.e. the result may (must?) be a different instance than the input. Which is probably required by its laws, but I didn't check those.

But the challenge is to make a syntax for our programming language to express that, such that we can transpile to that without a type checker?

keean commented

Okay I agree with all your changes except I don't see any advantage of splitting the Monoid class into an interface and a class.

Also typescript does not seem to catch the type error of identity Cons-ing a String "I" into what is supposed to be a numeric list? Is type-checking even turned on in that playground?

Note if use different name for class and interface Monoid, then the compiler forces to declare the members identity and append in the class also. So this is why if I remove the initialization in the constructor, there is no error, because those members are defaulted to null. TypeScript I believe has a compiler flag to remove the default population of every type with null.

@keean wrote:

Okay I agree with all your changes except I don't see any advantage of splitting the Monoid class into an interface and a class.

So that when we emit typeclass bounds on function arguments, we don't allow the function to invoke the constructor of _Monoid.

Also typescript does not seem to catch the type error of identity Cons-ing a String "I" into what is supposed to be a numeric list? Is type-checking even turned on in that playground?

This change to our example explains that it is the "bivariance" unsoundness of TypeScript, i.e. that when it has to subsume to List<A> it does not enforce covariant relationship of the type parameter A. We could experiment with N4JS if we prefer enforced soundness. Or both?

OTOH, TypeScript being so "forgiving" (completeness instead of soundness) may make it easier for us to experiment bypassing its type system. Perhaps could employ compiler generated run-time checks where we expect type system holes.

@shelby3 wrote:

But the challenge is to make a syntax for our programming language to express that, such that we can transpile to that without a type checker?

If we support pattern matching overloads for functions:

typeclass Monoid<T>
   identity: T
   append(T): T
List<A> implements Monoid
   identity = Nil
   append(y) =>
      let f(Cons(head, tail), y) => Cons(head, f(tail, y)),
          f(Nil,              y) => y
      f(this, y)

Otherwise:

typeclass Monoid<T>
   identity: T
   append(T): T
List<A> implements Monoid
   identity = Nil
   append(y) =>
      let f(x, y) => match(x)
         Cons(head, tail) => Cons(head, f(tail, y))
         Nil              => y
      f(this, y)

The second variant above seems transpilable to this TypeScript code without any type checking. If we want to get rid of that if (x instanceof _Nil) then our type checker has to know that Nil is the only other alternative for List<A>. I suppose the first variant example above only requires a syntactical transformation to the second.

Can we somehow get ES6's tail recursion to help avoid stack overflows and gain performance in the above example?

We can shorten that:

typeclass Monoid<T>
   identity: T
   append(T): T
List<A> implements Monoid
   identity = Nil
   append(y) => match(this)
      Cons(head, tail) => Cons(head, tail.append(y))
      Nil              => y

And the corresponding TypeScript code.

keean commented

The problem is 'identity' should be a constructor function, not a method of list, so on the same basis append needs two arguments like the + operator.

If you use 'this' the type of monoid needs to be:

typeclass Monoid
   static identity: This
   append(This): This

But then identity has to have a static type.

typeclass Monoid<T>
   identity: T
   append(T, T): T

Now identity has the correct type, but append has two arguments, both are now free functions not methods.

@keean wrote:

The problem is identity should be a constructor function, not a method of list

Agreed:

typeclass Monoid<T>
   static identity: T
   append(T): T
List<A> implements Monoid
   identity = Nil
   append(y) => match(this)
      Cons(head, tail) => Cons(head, tail.append(y))
      Nil              => y

We can implement it on TypeScript as a method, because our transpiler can easily check to make sure this is not accessed within implementations of static.

Btw, in case you might think I didn't know this, I had the STATIC in my mockup for my language Copute last modified in 2013. You can see I was working on this in 2012.

keean commented

Great, I will read that. The problem with identity is how do you call it without an object to call it on? If you look at my coding in typescript, I create a class for the Monoid dictionary, and then an object of that class. That object is the type-class dictionary for the List type. We use that object to hold the methods, so append needs two arguments because the object it is part of is neither of the lists being appended. Personally I find this makes things more complex because of its recursive type. I think things are simpler and easier without the this self reference.

However does getting rid of this take us too far from the mainstream? Can we sell this as a C like scripting language plus type-classes?

Edit: Java and 'C' seem equally popular so having objects is definitely not a hard requirement. However there are more languages that feature objects like Python etc.

@keean wrote:

Great, I will read that.

I understand better now of course so there were errors. But some of that laid the foundation for my understanding.

The problem with identity is how do you call it without an object to call it on?

If the use-site requests a Monoid typeclass interface in the where clause and it not the bound of an input argument, then pass an extra argument which is the dictionary for the static methods of Monoid. See static_List in my TypeScript code example.

If you look at my coding in typescript, I create a class for the Monoid dictionary, and then an object of that class. That object is the type-class dictionary for the List type. We use that object to hold the methods, so append needs two arguments because the object it is part of is neither of the lists being appended.

Yours loses genericity on A at the implementation side (so we would need a type checker to track every instantiation of List<A> and know which of your constructed instances to pass along with the instance at the use-site!), because you would need higher-kinds to code it generically. Scala can do that with higher-kinds:

trait Monad[+M[_]] {
  def unit[A](a: A): M[A]
  def bind[A, B](m: M[A])(f: A => M[B]): M[B]
}

My design is so we don't need a type checker to transpile it to an OOP language without typeclasses.


@keean wrote:

Personally I find this makes things more complex because of its recursive type. I think things are simpler and easier without the this self reference.

However does getting rid of this take us too far from the mainstream? Can we sell this as a C like scripting language plus type-classes?

From the naive programmer's perspective, they are passing an instance argument to the call site, and they expect the typeclass interface (bound) to operate on that instance, so they are probably thinking in terms of this for the instance and their familiarity with OOP. Only the advanced users of the language will learn to request static typeclass interfaces in the where clause (or will it be a requires clause?). I think the explicit annotation with static is more clear to those coming from OOP. Can you give an example of any problem this causes?

P.S. I am very sleepy (delirious), so check the above for errors. 💤 😪

Follow-up on prior comment.

@shelby3 wrote:

I don't think the most frequent case will be implementing typeclasses by pairs of data types. I think single data type implementations will be much more common, and Eq and Add will not work with pairs of types, but rather the same type for both arguments. We want to avoid implicit conversions, as that appears to be a design error in many languages such as C and JavaScript. Please correct me if you know or think otherwise.

As I had argued, it doesn't seem to make any sense to a programmer coming from OOP, that they've specified an interface bound on their function argument, and they are not suppose to call the methods using dot syntax. I think it is breaking the way mainstream (Java, C++, JavaScript, PHP, etc) people naturally think about it. I explained that static makes it explicit when this doesn't apply. I much prefer explicitness in this case, as it is not any more verbose.

I prefer to keep out typeclass concept as familiar as possible to the way people already think, especially given there are no downsides to doing so.

Additionally, by using dot notation on calls, afaik we improve parsing performance in the LL(inf) case.

Afaics, we don't always need requires in the single data type implementation case (i.e. one type parameter on the typeclasse), and can write it like this:

f(x: IShow) => x.show()

Instead of:

f(x: A) requires IShow<A> => x.show()

For two or more typeclass interfaces on the same input:

f(x: ILog+IShow) =>
   x.log()
   x.show()

For IEq though we need requires because x and y need to be the same data type:

f(x: A, y: A) requires IEq<A> => x.equals(y)
f(x: A, y: A) requires IEq<A> => y.equals(x)
f(x: A, y: A) requires IEq<A> => x === y

I presume the implicit conversions case, if we support it would be:

f(x: A, y: B) requires ICongruent<A, B> => x == y

Btw, when a zero argument method returns a value and we are not discarding the result, do we want to allow as Scala does?

f(x: ILength) => x.length

Instead of:

f(x: ILength) => x.length()

P.S. Note the I prefix for typeclasses is a proposal, but I don't know if we accepted that or not. On the one hand, it makes it clear which are typeclasses and which are data types, but OTOH it is noisy.

keean commented

I think leaving off the () on zero argument functions is not a good idea, because those functions execute. How do you pass a zero argument function as an argument to another function if you do not indicate when it is to be executed. You effectively lose control over the order in which side-effects happen.

@keean wrote:

How do you pass a zero argument function as an argument to another function if you do not indicate when it is to be executed

Based on the type it is being assigned to, although there is a corner case where the result type is the same as the type being assigned to.

I think Scala allows the length(_) in that corner case. We could require this always for passing the function instead of its result value.

On the positive side, it does help to get rid of parenthesis more times than it will cause confusion.

keean commented

I think Scala allows the length(_)

keean commented

I think Scala allows the length(_)

That's worse :-( I think function application.needs to be consistent not some backwards special case, we seem to be introducing corner cases into the grammar. Scala is not a good model to follow.

@keean wrote:

I think Scala allows the length(_)

That's worse :-( I think function application.needs to be consistent not some backwards special case, we seem to be introducing corner cases into the grammar. Scala is not a good model to follow.

Perhaps you misunderstood. I had explained to you that the way we can do partial application is using the _ for the arguments we don't want to pass.

So for zero argument functions, if we want to allow the shortform of calling them length instead of length(), then we also need some way of differentiating when we don't want to call them but want the function instance instead. So the solution is to maintain our consistent rule of partial application and specify that the non-existent first parameter is not passed length(_).

keean commented

So for zero argument functions, if we want to allow the shortform of calling them length instead of length(), then we also need some way of differentiating when we don't want to call them but want the function instance instead. So the solution is to maintain our consistent rule of partial application and specify that the non-existent first parameter is not passed length(_).

But this just creates a mess obscuring what is a value, what is a function reference. If you have a function reference you need to execute it before you can get the result. We execute a function by providing its arguments.

Its just so inconsistent, if I pass a one argument function as an argument to a higher-order function I do not put any brackets on it, just he plain variable can be passed, but for a zero argument function I have to put (_) after it just to pass it into a higher-order function.

@keean wrote:

But this just creates a mess obscuring what is a value, what is a function reference. If you have a function reference you need to execute it before you can get the result. We execute a function by providing its arguments.

I don't agree there is a necessary distinction between the two at the denotational semantics layer. You are down in the operational semantics layer. Apologies if I am misusing the academic terms. Hope you understand that I don't see how it makes any difference for our code whether something is called or not at some lower layer of semantics. If you can point out an example where it will make a difference, then please do.

Edit: note I see @keean has pointed out that values can't have side-effects.

Its just so inconsistent, if I pass a one argument function as an argument to a higher-order function I do not put any brackets on it

But is also sort of an inconsistency that “partial application” of no-arguments to a zero and one argument function would be same syntax otherwise. Perhaps pure zero argument functions should never be functions, and instead always be passed as immutable values. Or stated another way, pure zero argument functions should not be allowed.

but for a zero argument function I have to put (_) after it just to pass it into a higher-order function.

The inconsistency I see is that we've allowed partial application on an argument that does not exist, but in return we get this beauty of less proliferation of () all over the place. And we can unify immutables and pure zero argument functions.

But for non-pure zero argument functions, I think I agree with you.

@shelby3 wrote:

You are trying to find The One Pattern™ but it doesn't exist. I went down that rabbit hole with my various mockups for my stillborn Copute language project.

Sorry there isn't any one perfectly consistent universal pattern. Perfection is impossible. It is always about balancing tradeoffs.

I found a simpler implementation of a data type in TypeScript which typechecks better (Playground):

class _Nil {
    private _nominal:undefined
}
const Nil = new _Nil()
class Cons<A> {
    private _nominal:undefined
    constructor(readonly head: A, readonly tail: List<A>) { }
}
type List<A> = _Nil | Cons<A>

let x = new Cons<number|boolean>(0, new Cons(true, Nil))

The private member forces a simulated nominal typing.

Also this compiles in TypeScript:

const y = ((x: boolean): number | boolean => {
    if (x) return x
    else return 1
})(true)

Note the following generates an error "No best common type exists among return expressions", but TypeScript completes the compilation anyway and assigns the correct type number | boolean (actually number | true which seems to unify with number | boolean) to y:

const y = ((x: boolean) => {
    if (x) return x
    else return 1
})(true)

Refresh both prior comments. Where I am headed with this, is to create some regular expression transformations so I can simulate a transpiler for the basic Boost syntax I want to TypeScript (no block indenting much closer to JavaScript syntax), as an interim methodology so I can code some aspects in my preferred style immediately.

In other words, for the prior two comments I will be able to write instead respectively:

data List<A> <- Nil | Cons(A, List<A>)

var y := if (x) x else 1

Note if I instead write:

val y := if (x) x else 1

This would be converted to TypeScript:

const y = Object.freeze(((x) => {
    if (x) return x
    else return 1
})(true))

Or if I instead write:

var y = if (x) x else 1

This would be converted to TypeScript:

let y = ((x) => {
    if (x) return x
    else return 1
})(true)

Note it is apparently the very recent 2.0 compiler upgrade for TypeScript that made the above possible where I can do a syntactical transpilation with absolutely no AST.

@keean I told you before that I needed to find shortcuts. You are trying to go directly to producing a compiler with unification, which I estimated would take many months and I am correct. I am happy we discussed all the design issues, but now I need to return to my original priority set.

Edit: my alternative idea for syntax is:

data Nil | Cons(A, List<A>) -> List<A>

Which I think I like better. Makes it more clear.

Note afaics constructors can not pure functions because they create a new object every time they are called so we don't need to write:

data Nil | Cons(A, List<A>) :-> List<A>

@shelby3 wrote in #11:

@shelby3 wrote:

I would like to have the perfect language now, but unfortunately priorities beckon

It is notable how none of the existing languages can satisfy my interim needs well.

I took another look at haXe last night and immediately realized it is not even close to what I want, even with the macros facility. For example, it offers no support of unions and I can't even do some of things I want to do that I can do in ECMAScript such as async procedures pattern employing generators.

I contemplated Scala because at least I can do rudimentary typeclasses employing implicits, but again no unions, would need to use Akka for actor model to get promises (and presumably async procedures) but without compatibility with JavaScript ecosystem (and debugging in the browser being a dissonant). Also I would have to use the heavyweight IDE compilation ecosystem of Java, e.g. Eclipse. Ughh.

I thought for a moment about using C++ and Emscripten for transpilation to JavaScript, then slapped myself.

So then the only sane decision was to try to add some basic improvements to JavaScript with the least pain.

By transpiling to TypeScript, I get unions, guards (I can even perhaps syntactically transpile a match), and very good interoption with ECMAScript, but no typeclass although I can similate them to a limited extent with interfaces and some manual injection of the dictionary (either a function which wraps delegation or another trick of having the interface input an argument for self instead of using this).

P.S. please read my private message on LinkedIn, because I think you misunderstood what I meant when I created this issue thread. I didn't mean a half-baked partial language that we can experiment with its limited features such as what you have now in Compiler Implementation. I meant I need a complete language immediately, so I want shortcuts of transpiling to TypeScript interim so we can get some benefits immediately for production code.

keean commented

@shelby3 I would recommend looking at something like sweet.js to see if it can do what you need.

@shelby3 I would recommend looking at something like sweet.js to see if it can do what you need.

Thanks for the suggestion. I believe I had seen that before. I analyzed it again, and unfortunately it doesn't support TypeScript types, and the support for non-JavaScript symbols (e.g. ->) appears to not be ready yet.

I think it will also be impossible to achieve with regular expressions, because I need recursion of productions.

So think the only way forward would be to complete the grammar for our language, then write a parser. Use the parser to output the equivalent TypeScript, while avoiding using any features that would require unification and type checking. There doesn't seem to be any shorter cut than that.

@shelby3 wrote:

So think the only way forward would be to complete the grammar for our language, then write a parser. Use the parser to output the equivalent TypeScript, while avoiding using any features that would require unification and type checking. There doesn't seem to be any shorter cut than that.

Been sidetracked past week on other work. Am still going to attempt to come back to this and attempt the final grammar parser (lest I forget everything I needed to change in the grammar if I stay away from this too long).

I have not abandoned the goals we set. Just trying to figure out how to fit into my priorities.

I am not eager to code directly in TypeScript. I'd rather have a transpiler I can tweak even without the typing to start with.

@shelby3 wrote:

I found a simpler implementation of a data type in TypeScript which typechecks better (Playground):

class _Nil {
    private _nominal:undefined
}
const Nil = new _Nil()
class Cons<A> {
    private _nominal:undefined
    constructor(readonly head: A, readonly tail: List<A>) { }
}
type List<A> = _Nil | Cons<A>

let x = new Cons<number|boolean>(0, new Cons(true, Nil))

The private member forces a simulated nominal typing.

The primary distinctive attribute of a type alias is that it is not a first-class nominal type, i.e. it is just name that is replaced syntactically and thus never appears in the AST.

I think I prefer the simplification that sum (aka union) and product (aka intersection, but not the same as record) types should never be first-class nominal types, because otherwise (compositionality is limited, i.e.) these nominal unions and intersections can not share their constituent member types. Here is an example of why it creates ambiguity otherwise:

data Thing = A | B
data Thing2 = B | C
foo(arg: Thing | Thing2) …

Now if you pass an input of type B to function foo then what is the type of the arg? Is it a Thing or Thing2? If we argue that the type is just B (and thus that the type of arg is really just A | B | C), then there is no reason to have Thing and Thing2 be nominal types.

One ramification of this preference is that our syntax for data can become much simpler. We no longer need to duplicate the type name of the left and right-hand-side of an = because we not longer need = in the syntax, e.g.:

data MyData(a: A, b: B, c: C)

Bear in mind the point I made recently about constructor functions for data. Also my point to not “unify” conflate everything as you had noted was conceptually plausible.

(P.S. I posted this in this thread because you may have noted recently that I am really trying to align my design decisions to be compatible to transpilation to TypeScript, but only where it is what I would want anyway long-term for the language.)

keean commented

Or you can see it the other way, as the most common constructor function is to populate the data directly, then it is shorter to write

data Either[A, B] = Left(A) | Right(B)

Which nicely declares the parametric type, and both the constructor functions in one line.

I agree about 'type aliases' they are just a shorthand way of writing the full type and have no meaning other than that. They can get expanded out to the underlying types.

@keean that only works unambiguously because you never allow Left to be a constructor for a different data type. This means you can not reuse and have general compositionality of constructors without creating type-system ambiguity.

It means you can not normalize (flatten) unions and intersections. For example anonymous unions and intersections, as well Either can not be recursive without wrapping it in a Left or Right which becomes boilerplate hell.

I really do not like the wrapping of FCP (which we ended up determining was basically unnecessary except for global inference) nor the wrapping proposed here. But if you can show me some compelling use case, I am open-minded.

I am positing I prefer structural typing of unions and intersections and not nominal. However, I might not be aware of all the ramifications?

keean commented

This means you can not reuse and have general compositionality of constructors without creating type-system ambiguity.

Of course you can, datatypes are not structural, you are getting confused with the use of | in a datatype to mean disjoint-union and an anonymous union.

There is no FCP going on here, these data declarations are straightforward encodings of structs and enums, with their constructors. The main difference is that these enums can have properties, unlike the C/C++ ones that are simply integers.

This leads to very compact representations like:

data BinaryTree[A] = Branch(BinaryTree[A], BinaryTree[A]) | Leaf(A)

data NaryTree[A] = Branch(List[BinaryTree[A]]) | Leaf(A)

which effectively encode subtyping hierarchies (without the subtyping) like:

class BinaryTree[A];

class Branch[A] : BinaryTree[A] {
   Left : BinaryTree[A]
   Right : BinaryTree[A]
}

class Leaf[A] : BinaryTree[A] {
   Value : A
}

Note its not quite the same because Branch[A] and Leaf[A] are separate types in the class encoding and it needs subtyping, whereas in the data encoding, all the values have type BinaryTree[A] so maybe its a bit more like:

class BinaryTree[A] {
   enum {
      Branch,
      Leaf
   }
   union {
      Branch : struct {
         Left : BinaryTree[A]
         Right : BinaryTree[A]
      },
      Leaf : A
   }
}

However in this case there is nothing that really ties the enum options to the union alternatives. Really the object model doesn't quite have something that matches the semantics, where we get an enum to mark Branch or Leaf, and the 'union' data has to match the enum tag.

There is no FCP going on here

Where did I write that FCP was going on here? I mentioned FCP because of its wrapping boilerplate which it shares in common with Either. You even admitted that FCP requires wrapping and unwrapping boilerplate.

I will quote myself for you:

I really do not like the wrapping of FCP (which we ended up determining was basically unnecessary except for global inference) nor the wrapping proposed here.

@keean did you not notice that I linked to the Tree example, so why are you repeating an example I already linked to?

I will quote myself for you:

I think I prefer the simplification that sum (aka union)

@keean wrote:

Of course you can, datatypes are not structural, you are getting confused with the use of | in a datatype to mean disjoint-union and an anonymous union.

I am not confused. And you have not written anything which addresses nor refutes what I wrote.

You are stating that you prefer wrapping and nominal types for unions, but you have not responded to my criticisms.

We are trying to be respectful to each other right. The term confused is very borderline close to ad hominem.

keean commented

I think I prefer the simplification that sum (aka union) …

That page describes exactly what I have been talking about, so I don't know how you 'prefer the simplification'?

I think you misunderstand and assume 'Cons' is a type when it is in fact a data-constructor as I have explained. This is important:

A sum type is not a union.

A sum type is a 'dijoint union' or a 'tagged union' (all the same thing, different names). This is different from a plain 'union' type which lacks the tag.

We can look at the memory. Lets look at a List, with Cons and Nil. Lets encode Cons as 1, and Nil as 0. Lets have an integer list. The encoding of [1,2,3] could be something like (bytes in memory):

1, 1, 1, 2, 1, 3, 0

If Cons and Nil were types, then you would expect to see the following in memory:

1, 2, 3

Note you cannot see the Nil as it is a type, so the type system knows the length of the list but there are no runtime markers in memory.

Another example might be a list List[Either[Int, String]], where we encode Left as 0 and Right as 1, we would then encode [1, 'a', 2, 'b'] as

0, 1, 1, 'a', 0, 1, 1, 'b'

However a list of a union like List[Int / String] would look like this in memory:

1, 'a', 2, 'b'

So there is no tag to allow the runtime to distinguish between the types.

I think part of the problem is the 'boxing' confuses things, because in dynamically typed languages that have boxed types we can have a list of a union of boxed type like this:

[Int 1], [String 'a'], [Int 2], [String 'b']

So with boxed types you don't really need the tag, because each type has its own runtime tag in the box.

We might say that Sum types allow pervasive unboxing, and introducing unions types requires boxing... I think our misunderstanding comes from me thinking about unboxed types by default, and you thinking about boxed types by default.

Note: I am not saying I don't want boxed types, just that I would mark them explicitly in the type system, so that you can have both boxed and unboxed types in the same type system.

A sum type is not a union.

A sum type is a disjoint (tagged) union.

Given the types in an anonymous union will also be tagged, I see no distinction between them other than whether the type is first-class nominal or first-class structural.

And I explained the trade-offs of the nominal approach (wrapping, lack of ability to compose constructors in unions arbitrarily).

I think you misunderstand and assume Cons is a type when it is in fact a data-constructor as I have explained.

I do not know why you think that about me or what I wrote. I clearly used the word ‘constructor’ and referred to data.

refresh

I noticed after I wrote my response above, that you added the following to your prior post:

A sum type is a 'dijoint union' or a 'tagged union' (all the same thing, different names). This is different from a plain 'union' type which lacks the tag.

keean commented

I noticed after I wrote my response above, that you added the following to your prior post:

I added it before reading your post.

And I explained the trade-offs of the nominal approach (wrapping, lack of ability to compose constructors in unions arbitrarily).

But your approach requires all types to be boxed, which uses a lot more memory and is a lot slower.

My approach is to have | for disjoint union and \/ for union, where the latter requires boxed type arguments, and the former unboxed.

keean commented

refresh

But your approach requires all types to be boxed, which uses a lot more memory and is a lot slower.

Okay that is now a salient response and good point. I will think about it while I eat.

But not all types, only union types, and not boxed but tagged (which is an orthogonal concept).

Please try to avoid implications that I am dumb. Neither of us like it when the other does that, even accidentally. I understand we both get perplexed at times what is the significance of what each other means. So I do understand.

I will quote myself again:

However, I might not be aware of all the ramifications?

keean commented

Please try to avoid implications that I am dumb.

I don't think I have implied you are dumb. That has certainly never been my intention. Misunderstanding does not imply dumb. It implies that either preconceptions are preventing the person from listening/understanding, or my communication was not clear enough.

Meta: There is no need to state that a person is confused (if they insist and are clearly trolling then perhaps just put on Ignore if they persist too much). If we provide our reasoning, then each other can decide on their own accord if they were confused because a reasoning is correct. Presuming that the other person is something or another implies judgement of the other person. We should just explain our technical thoughts. Note I have committed personal judgement too many times, but I am trying to reform myself. As I wrote, it was only borderline and possibly accidental, sometimes is a quick way of expressing ourselves when we could find better words if we slow down. I think both of us are often rushing. And we both know not to force our communication when we are sleepy or famished. We should not rush. Apologies for calling you out on this as that is also a form of judging (I should have brought the Meta to private communication, my mistake). Ah Internet communications. Hopefully we will coalesce well now.

However a list of a union like List[Int \/ String] would look like this in memory:

1, 'a', 2, 'b'

So there is no tag to allow the runtime to distinguish between the types.

You know that a string is not typically placed unboxed in memory like that, because it has a variable length.

Also we had discussed in the past (it was your point afair) that boxing is orthogonal to tagging, i.e. the presence or not of boxing, does not dictate the presence of tagging or not nor vice versa. Perhaps unboxing typically coincides with untagged in many languages?

So I presume you are just making a conceptual point that a union can be untagged but a Sum (aka tagged union) type is never untagged?

We might say that Sum types allow pervasive unboxing, and introducing unions types requires boxing...

A sum type is always tagged, but I do not see how that implies it is always unboxed. Appears the two are orthogonal concepts.

Static (compile-time) union types do not absolutely require boxing because the types in the union are bounded statically (whereas typeclass objects do because they are dynamically polymorphic and unbounded). Whether they require tagging depends on whether the language wants them to be reified or not, which afaics impacts whether the instances of the union type can be type-case logic filtered. I really do not see much utility for untagged unions, do you?

Again I do not see any boxing nor tagging distinction between structural versus nominal unions. I just see the nominal approach has disadvantages that I first enumerated, which afaics I have not yet seen a refutation of. You know that List never appears in memory in the language you want (i.e. it is not reified nor a supertype/superclass), only the data constructors (e.g. Cons and Nil) are tagged. Thus the nominal aspect has really nothing to do with the tagging.

Edit: to be a bit more explicit, let us note for readers that Left and Right wrap the actual types we want in the union. If unwrapped, then we do not need all that boilerplate and can collapse nested Either into a flat optimal tagged union of the unwrapped types. That was my original thinking on this matter which causes me to favor structural and not nominal.

What if instanceof could be used when translating to a JS backend to allow access to the "tag" already held by JS, assuming the types involved are able to be differentiated using some combination of these calls? Since ZenScript has typeclasses, would such artificial tagging be able to be bolstered by a typeclass for differentiating otherwise undifferentiable user types?

What if instanceof could be used when translating to a JS backend

That is what I am intending to employ, by transpiling to TypeScript which has instanceof guards (to achieve JS).

keean commented

A sum type is always tagged, but I do not see how that implies it is always unboxed. Appears the two are orthogonal concepts.

Sum types can also have multiple options with the same type, but different tags.

The difference between 'tags' and 'boxed types' is that a boxed type is a type, but it may not be visible at compile time.

So the advantage of Sum types is to make 'type' something that is always statically known at compile time, as opposed to boxed types, that are existentials (and they have to be existential because the static type system cannot know what type is in the box, it is only knowable at runtime).

I prefer to keep things simple and consistent, and say a type is something that is statically known at compile time, and a tag is something only known at runtime.

So if a tag is not a type, we need some way so specify what the runtime tags are, and that is what the constructor names in the data declarations are.

What an anonymous union is, can be represented as a sum type, where we infer the constructor names from the types of the constructor arguments.

Edit: to be a bit more explicit, let us note for readers that Left and Right wrap the actual types we want in the union. If unwrapped, then we do not need all that boilerplate and can collapse nested Either into a flat optimal tagged union of the unwrapped types. That was my original thinking on this matter which causes me to favor structural and not nominal.

Without 'Left' and 'Right' there is no way to tell what type is actually in the union at runtime. Your suggestion ammounts to auto-generating the 'Left' and 'Right' tags based on the types of the objects stored, so the runtime perfromance will be exactly the same.

What you lose by using the type names as the tags is that it makes it harder to refactor (changing the type stored would change the tag), and you cannot have more than one option that is of the same type.

Sum types can also have multiple options with the same type, but different tags.

Ditto an anonymous union is also a static type with a static number of multiple options. The only difference is whether the type is nominal. Which is the point I have been making.

The difference between 'tags' and 'boxed types' is that a boxed type is a type

Tags are the runtime reification of types. Tags are required for boxed types which need reification, e.g. type-case dispatch of code branches. Virtual methods are an alternative mechanism for dynamic dispatch.

I know you know all this. I am just presenting a different way to think about tags as not a special case of subordinate to type …

So the advantage of Sum types is to make 'type' something that is always statically known at compile time, as opposed to boxed types, that are existentials (and they have to be existential because the static type system cannot know what type is in the box, it is only knowable at runtime).

What you apparently mean is that the choice of multiple options is statically known for a Sum type (e.g. Left or Right for Either, Cons or Nil for List), but a typeclass object (or superclass in subclassing languages) the range of options for types (or subtypes) is unbounded.

Note that the multiple options for anonymous structural unions are also known (bounded) at compile-time.

I prefer to keep things simple and consistent, and say a type is something that is statically known at compile time, and a tag is something only known at runtime.

Your intent must be that the bound of types (which are tagged) is known at compile-time (aka statically known) but you know that the actual choice is not known at compile-time. A tag is known at compile-time, as it corresponds to a type known at compile-time. It is the bound of types in the union which is either known at compile-time or not.

So if a tag is not a type, we need some way so specify what the runtime tags are, and that is what the constructor names in the data declarations are.

The constructor names are types. In your nominal model, you want those constructors (e.g. Left and Right) to be subordinate to the nominal type (e.g. Either) and thus you think of them as being only tags. But rather I think the more general way to think about it, is that all types can be tagged (reified) and that constructors are also types. Which is precisely my point that with my proposal then all constructors are first-class types and can participate any where types do, including in any unions or intersections (even being reused in more than one of them which can't be generally done in your model).

Perhaps you could make an argument that wrapping enables types to normally not be tagged. But I think we want all copy-by-reference objects to be tagged by default. Really no reason not to. For integers and primitive types which are normally copy-by-value, we may want the option for those to be untagged normally.

Without Left and Right there is no way to tell what type is actually in the union at runtime.

The Left and Right are unnecessary boilerplate wrappers that obstruct the flattening of a union removing duplicates, when all types can be tagged (either they are always tagged, or they can be lifted to a tagged type when needed, e.g. lifting true to a Boolean in JS).

so the runtime perfromance will be exactly the same

There is an extra unnecessary unwrapping and tagging with Left and Right, if we still need to type-case on the wrapped type at runtime.

What you lose by using the type names as the tags is that it makes it harder to refactor (changing the type stored would change the tag)

I do not like implicit changes hidden from use sites of version control remember. Readability (not obscurity) is one of the highest priorities. Why is someone changing the name of a type that is already widely used in the source code. That is not stable code.

Type-case logic on those types whose names are changed would have to changed any way. So the refactoring argument may not be so valid.

, and you cannot have more than one option that is of the same type.

That is the point. Unions and intersections should never have more than one of the same type.

If you want to wrap a type, you create a constructor for that. In my proposal, no need to force wrapping until those rare cases you need it.

@shelby3 wrote:

But I think we want all copy-by-reference objects to be tagged by default. Really no reason not to.

On further thought, we do not really have any need for them to be tagged by default unless there will be an Any type.

Making them untagged by default enables optimizations such as storing the tag (as an index) in the unused upper bits of the 64-bit pointer to the boxed type (as Swift does on mobile) when the possible options are statically known.

But unless there is an Any, then tagged by default or lifted when assigned to a union type, is an implementation detail that does not need to be exposed to the programmer.

keean commented

Ditto an anonymous union is also a static type with a static number of multiple options. The only difference is whether the type is nominal. Which is the point I have been making.

An anonymous union cannot have multiple distinguished values with different types for example:

data Result[A] = Error(String) | Value(A)

Cannot be expressed as an anonymous union, in the case that the result is a String.

Tags are the runtime reification of types. Tags are required for boxed types which need reification, e.g. type-case dispatch of code branches. Virtual methods are an alternative mechanism for dynamic dispatch.

Not in the above, the tags are values. Both Error("MyError") and Value(32) would both have the type Result[Int] if returned from the same function. Hence a function which could return them has the type f : () -> Result[Int]. Error() and Value() are not types, they are data constructors (hence the round braces not square). There is no subtyping here, and that is important.

To make things work the way you want, both Value and Error would need to be types, and you would not have a type called Result[] instead you would declare two types like this:

data Error = Error(String)
data Value[A] = Value(A)
type Result[A] = Error | Value[A]

Note Result is now just a type alias not a type, and we have the slightly weird construction where the parameter A is only used on one side of the anonymous union. The function would now have the type f : () -> Error | Value[Int].

The constructor names are types. In your nominal model, you want those constructors (e.g. Left and Right) to be subordinate to the nominal type (e.g. Either) and thus you think of them as being only tags. But rather I think the more general way to think about it, is that all types can be tagged (reified) and that constructors are also types. Which is precisely my point that with my proposal then all constructors are first-class types and can participate any where types do, including in any unions or intersections (even being reused in more than one of them which can't be generally done in your model).

As explained above, this is wrong for Sum types, the constructor names are not types, they are values. They have a type. For example from above:

Error : String -> Result[A]
Value : A -> Result[A]

They are no different to a normal function like 'f', and according to our agreed nomenclature they probably should not have a capital first letter.

The Left and Right are unnecessary boilerplate wrappers that obstruct the flattening of a union removing duplicates, when all types can be tagged (either they are always tagged, or they can be lifted to a tagged type when needed, e.g. lifting true to a Boolean in JS).

No because they hold semantic value, for example you do not want to flatten a binary tree:

data BinaryTree[A] = branch(BinaryTree[A], BinaryTree[A]) | leaf(A)

Flattening this would result in an infinite recursion in the type checker. You could resolve that by introducing recursive types ("mu" types), but generally recursive types are considered bad.

There is an extra unnecessary unwrapping and tagging with Left and Right, if we still need to type-case on the wrapped type at runtime.

No because the types need to be boxed (which is wrapping and tagging) if you don't do this.

I do not like implicit changes hidden from use sites of version control remember. Readability (not obscurity) is one of the highest priorities. Why is someone changing the name of a type that is already widely used in the source code. That is not stable code.

If you want to wrap a type, you create a constructor for that. In my proposal, no need to force wrapping until those rare cases you need it.

The problem is the type names have no semantic meaning, for example:

data Person = name(String) | age(Int)

Using name and age in the case statements of the program makes it much more readable and understandable than using String and Int. I think this is an important difference. Just using type names is easy for writing (less keystrokes) but make reading much harder. I want readable code.

Regarding the Any type, which is also known as Omega in type system, I am not a fan. I would only include an Any type as a last resort if needed for soundness, but I normally take it as a sign the type system has a bad design if it needs an omega type.

@keean wrote:

An anonymous union cannot have multiple distinguished values with different types for example:

data Result[A] = Error(String) | Value(A)

Cannot be expressed as an anonymous union, in the case that the result is a String.

I will disagree with your assumption by showing how to express it in pseudo-code:

data Error(str: String)
data Value[A](value: A)
type Result[A] = Error | Value[A]

Let me repeat, that you apparently have a perspective that prefers forcing nominal (i.e. by name) wrapping always and making the name a first-class type in the AST and making the tags a special case and subordinate to type. I prefer a unified system where all values have a type and all types can be tagged. For example in the following String is not wrapped but it is still tagged as a String:

type Result[A] = String | Value[A]

Afaics, it appears as though mine has more degrees-of-freedom than your preference, and all I lose is the unnecessary nominal type names in the AST (which I replace with aliases), which I posit is actually not a good feature to have for the reasons I already stated.

Note @keean and I discussed in private that tagging always takes place within a union if we require being able to destruct the options with type-case logic and I agreed. But my point is that we do not have double tagging due to forced wrapping (in @keean’s preference) in the case of JavaScript with my idea (JavaScript requires all types are tagged due to the presence of Any thus lack of 100% static, compile-time typing). Wrapping is an orthogonal concept to tagging. Wrapping means what we write in the source code, and afaics so far, my proposal appears to have more degrees-of-freedom because wrapping is not forced, thus our compiler can choose different strategies for different output targets (e.g. JavaScript versus for example C++).

In other words, if we write type Result[A] = String | Value[A] instead of type Result[A] = Error | Value[A], then our compiler can avoid forced wrapping and thus double tagging in JavaScript, while in some other language target our compiler might for efficiency reasons choose to not tag String except when it is placed in a tagged union.

As explained above, this is wrong for Sum types, the constructor names are not types, they are values.

Please do not accuse me of being wrong. Please state your technological opinion without any presumptions about your omniscience and my lack thereof. The same admonition applies to myself. We must respect each other as both worthy of valid insights, else it becomes very difficult to work together. I respect your knowledge. Once again, we both do not want this Meta stuff in public. We both need to reform ourselves. We both seem to have a problem with phrasing our discussions in a way that allows for our own myopia. Neither of us are perfect and omniscient. That is why we are discussing, to benefit from each others’ viewpoints.

Your perspective of what a Sum type presumably comes from your understanding from academia and programming language design research and discussion (e.g. Lambda-the-Ultimate). I am presenting my perspective on what a disjoint tagged union is. That does not guarantee I am wrong. We should present our technological ideas without accusations.

I have not redefined what I Sum type is. It is always a tagged union (100s of Google results say so). I think your authority on what a Sum type is, assumes an untagged union is distinct from a tagged union and thus the separation. But I am saying that I do not see any utility for an untagged union (for example, even if we add my solution for the Expression Problem where the compiler implicitly supplies type-case dispatch from a union to the typeclass bounds at the call site, still we need tagging because even a static union is bounded dynamic polymorphism). And I do not see the advantage of forcing wrapping as it presumes that all types will be not tagged by default, which is not the case for JavaScript. It is forces unnecessary wrapping and inability to remove duplicate underlying types from hierarchy of wrappings. It is a lot of boilerplate conflation of concepts. I do not know why your authority designed it that way.

There is no reason I can see that the options of a Sum type can not be first-class types. Rather there is a design choice between a nominal type wrapping system or a structural flattened design. I am contemplating that I prefer the later, because I enumerated the significant advantages I posit.

They have a type. For example from above:

Error : String -> Result[A]
Value : A -> Result[A]

You choose to make Result[A] a first-class nominal type. I choose not to. It is a design choice. I am opening my mind to all axis of the design choices that are available, not just to the choices made in the past by some others.

My choice also has analogous function types, but afaics I do not early bind the wrapping types Error and Value which afaics is analogous to the anti-pattern of subclassing which early binds interfaces to data:

anonymous function : String -> (Error(String) | Value[A])
anonymous function : A -> (Error(String) | Value[A])

P.S. I like alias as a more apt keyword than type.

keean commented

You are of course free to choose to use structural and union types, but you cannot redefine words. Your definition of a 'sum' type is wrong.

How about I choose to call a type a value from my perspective, and a union an intersection, etc. How can we have any hope of communicating.

@keean wrote:

Flattening this would result in an infinite recursion in the type checker.

I disgree:

data Leaf[A](A)
data Branch[A](BinaryTree[A])
type BinaryTree[A] = Branch[A] | Leaf[A]

The type checker does not need to recurse Branch[A] | Leaf[A], because Branch is a concrete type.

You could resolve that by introducing recursive types ("mu" types), but generally recursive types are considered bad.

Well if the above caused some problem, you could wrap the BinaryTree in a type:

data Leaf[A](A)
data Branch[A](BinaryTree[A])
data BinaryTree[A](Branch[A] | Leaf(A))

So we can get nominal names when we want them by wrapping.

but you cannot redefine words

Do you realize that some people at Bitcointalk cite you as an expert and use comments like that from you to claim I am not worthy.

Your definition of a 'sum' type is wrong.

After all my pleadings for us to civil, you go off the deep end. This is why we end up with an acrimonious relationship. Sad. I was really trying.

A sum type is a tagged union. Please do not overspecialize the definition to one design axis of a how a tagged union can be implemented.

A google search will return 100s of definitions of Sum type which say it is a disjoint tagged union.

Afaics, I have said nothing which makes Sum type anything other than a disjoint tagged union.

I have a more separation-of-concerns conceptualization of not conflating wrapping with tagging with boxing. They are all disjoint concepts.

If we can not be civil, I will move on. I appreciate all of your help. You have a lot of expertise. Thanks for sharing.

I know you catch a lot of mistakes I make, so it is with grief that I would move on. Believe me, I really appreciate your expertise. I have made a decision to not participate in any venues where I am not able to have amicable discourse because the Meta flame wars are too wasteful.

I am not even claiming I am correct on this issue. I just wanted to discuss it. I could careless who is correct, as long as I get the design correct. But your presumptions of my incorrectness is ad hominem and also it is noisy and interfering with us from reaching a technical understanding. We must always try to understand the technical points of each other.

How about I choose to call a type a value from my perspective, and a union an intersection, etc. How can we have any hope of communicating.

I do not have the gridlock (nor expertise) in my brain of being bound to adopted presumptions from white papers and university. If a sum type is defined to be a disjoint tagged union, then it is a disjoint tagged union. If I have a proposal which is disjoint and tagged, then I am congruent with the definition.

If your authority presumes that only values are tagged and never types, then a Sum type can not apply to disjoint tagged types, then whose lack of generality is that.

I began the discussion by stating that I was looking to create more generality of compositionality.

It was not done to trick you, or to fool you. It was so you could tell me what the mainstream conceptualizations are and I could explain how my idea was different.

Not to accuse me of being wrong because the mainstream decided that disjoint tagged union means tags are never for types and forced wrapping should be conflated with it.

It is all about having a very open mind that can look at the design space with a very fresh perspective that is not burdened by groupthink. It does not mean my idea is superior. That is why I am discussing it to try to brainstorm on the ramifications.

Btw if we want to cite authority, sums (co-products) and products afaik originate from the following 1989 reference which I cited for my Copute language research in 2010:

@shelby3 wrote:

Declarative Continuations and Categorical Duality, Filinski, sections 2.2.1 Products and coproducts, 2.2.2 Terminal and initial objects, 2.5.2 CBV with lazy products, and 2.5.3 CBN with eager coproducts

Apparently sum type is a more generalized co-product concept than what you are claiming is canonical.

Programming language design is still a wide-open field and I do not think we can even say that values are not types, as it depends on the chosen static model (perspective).

@keean wrote:

There is an extra unnecessary unwrapping and tagging with Left and Right, if we still need to type-case on the wrapped type at runtime.

No because the types need to be boxed (which is wrapping and tagging) if you don't do this.

I had already explained to not conflate boxing with wrapping nor tagging. Afaics, they are all disjoint concepts.

The tagging of a type does not force it to be boxed, rather it is the only the unbounded multiple options (of for example typeclass objects) that will force boxing (although a compiler might choose to employ boxing for static number of options of a disjoint tagged union but it is not forced to, because the memory footprint of the union is known at compile-time). I am repeating what I already wrote.

That is where boxing is defined to be a reference to the value instead of placing the value directly on (in the RAM for) the stack or data structure. If you are employing a different notion of boxing, please let me know. I presume we agree that tagging is distinct from boxing. Tagging is marking (reifying) the data so its type/name is known at runtime.

And the lack of forced wrapping does not force tagging every where the type is used outside of the context of a union or intersection. I had already explained (alluded to) all of this.

@keean wrote:

The problem is the type names have no semantic meaning, for example:

data Person = name(String) | age(Int)

Using name and age in the case statements of the program makes it much more readable and understandable than using String and Int.

  1. In my proposal you can still first-class wrap if you want (and the compiler can be smart enough not to automatically not tag twice if the JavaScript is not the output target). And in my mine you can even reuse those wrapping types in any union and intersection and any where a type can be used (i.e. they are first-class which simplifies IDEs because only one unified concept, etc).

  2. We are favoring typeclasses instead of type-case logic (the later is not extensible without refactoring), so your point is for the inferior coding case.

keean commented

I don't want to waste time disagreeing with you about what a sum type is. If we cannot agree consistent terminology we will never get anywhere.

Can we just agree to use the terminology as used by the rest of the world, and as it is used in text books and taught on computer science courses, so that we can get on with the discussion, and so that other readers can understand what we are talking about.

keean commented

The sum type is really the dual of the record type. In a record we have different properties identified by a label (which is a value) .

data Test = MyRecord {
    name : String
    age : Int
}

Here properties are accessed by label, not by type. Now you could argue that wrapping the properties with labels is unnecessary and you could use types instead. This would then become a type-indexed product type.

Likewise a sum type provides a way to have alternative contents accessed by label so given:

data Test = name(String) | age(Int)

Name and age are labels, serving the same purpose as the labels in the record.

@keean wrote:

I don't want to waste time disagreeing with you about what a sum type is.

Then why did you?

You obviously do not agree with me that a sum type is a disjoint tagged union, even though every definition I can find says it is a disjoint tagged union.

I suppose 100s of sources online are all wrong and @keean is correct.

keean commented

No I agree with you that a sum type is a disjoint tagged union. It is your definition of what a disjoint tagged union is that does not match the accepted textbook definition.

@keean wrote:

It is your definition of what a disjoint tagged union is that does not match the accepted textbook definition.

Canonical sources seem to disagree with you.

Wikipedia says:

In computer science, a tagged union, also called a variant, variant record, discriminated union, disjoint union, or sum type, is a data structure used to hold a value that could take on several different, but fixed, types.

In type theory, a tagged union is called a sum type. Sum types are the dual of product types.

Note above the word “types” is not values.

Dual means co-product as I wrote in a prior post citing Filinski (1989).

keean commented

This is correct if does "hold a value that could take on several different, but fixed, types". For example:

data Test = name(String) | age(Int)

Note that the type Test can hold a value of type String or Int. Which is exactly what the quotation you posted says. However the tags name and age are values as in:

case x of
    name(x) -> print x
    age(x) -> print x + 1

The case operates on the tag values.

@keean wrote:

Note that the type Test can hold a value of type String or Int.

Yes. So the wrappers are entirely unnecessary to meet the requirements of definition.

You can try to explain how my proposal did not match the definition and are not co-products (dual of product types) per the definition and per the 1989 original Filinski source.

However the tags name and age are values as in:

case x of
    name(x) -> print x
    age(x) -> print x + 1

The case operates on the tag values.

I have already addressed that. And it has nothing to do with the definition of tagged unions, nor more saliently, the concept of a co-product which is the underlying fundamental concept.

Just because you are accustomed to wrapped sum types, does not mean it has anything to do with what a co-product is. Rather the wrapping is an implementation detail and choice. I explained that with typeclasses, we disfavor type-case deconstruction.

keean commented

Because they are 'tagged unions' they have a tag. A tag is a value like 'name' or 'age', it's in the name.

If you want to use a type to index them, you are talking about a type-indexed-coproduct (TIC).

Indexing things with types is not common, because it requires an extension from simple types in the lambda cube (to types that depend on other types). Most type systems do not provide the type level functions and maps necessary to have types that depend on types.

Using the type as the tag is not an untagged union.

You are just arguing about nonsense.

You sound like a my high school Trigonometry teacher who failed my perfect scores because I did not draw a box around the answers.

keean commented

Using a type as a tag is a type-indexed-coproduct. A tag specifically has to be a value to keep everything in the 'simple types' corner of the lambda cube.

I don't understand how you can have any discussions with other computer scientists if you don't keep to agreed definitions.

keean commented

One more try:

In type theory, a tagged union is called a sum type. Sum types are the dual of product types. Notations vary, but usually the sum type A+B comes with two introduction forms inj1 : A -> A+B and inj2 : B -> A+B. The elimination form is case analysis, known as pattern matching in ML-style programming languages: if e has type A+B and e1 and e2 have type T under the assumptions x:A and y:B respectively, then the term case e of x => e1 | y => e2 has type T . The sum type corresponds to intuitionistic logical disjunction under the Curry–Howard correspondence.

So note in the formal definition e, x and y are values, and there is no possibility they can be types.