keean/zenscript

Language Discussion

Opened this issue · 116 comments

keean commented

The idea is to create a simple, compact, symmetrical, typed language for generic programming that compiles to JavaScript. The initial reference is "Elements of Programming" by Alexander Stepanov and Paul McJones.

The core concept is to have parametric types, type-classes and type-families, along with maybe higher-ranked-types, higher-kinded-types, to enable type-directed generic programming.

Inspiration is from C, C++ Concepts (type safe templates), Haskell, Rust, Eff, and others.

This is different from TypeScript, because it is based on type-classes, not classical inheritance, different from PureScript because it allows imperitivity and state.

This issue is for the initial design of the language, with other issues being forked as necessary for specific topics like syntax, semantics, type system etc.

I'd like to chime in that Elements of Programming is one of my favorite tech books, it's got a very prominent space on my book shelf, next to Luca Cardelli's A Theory of Objects and Bird and Wadler's Introduction to Functional Programming (1st Edition). If a language could express the ideas in those three books combined, it might very well be my favorite language.

What will your objects look like? Have you studied Self? I highly recommend giving this short paper a read:
http://sauerbraten.org/lee/ecoop.pdf

What will the solution for this problem look like in your language? (I think it's a great litmus test for languages) It shows why just type classes has some problems solved by delegation with differential inheritance (a more powerful superset of subclassing).

keean commented

@SimonMeskens I'm not familiar with "A Theory of Objects" but a while back I contributed to this paper https://arxiv.org/pdf/cs/0509027.pdf which shows how object types fit into a formal type system.

With objects there is a problem with dispatch in that "a.f(b, c, d)" gives special treatment to the receiving object 'a', and only does dispatch on this. With type-classes we start with a language that has no overloading. If we want to allow overloading we use a type class, and a type class can dispatch on multiple type parameters.

I will take a look at the examples in that paper, thanks.

keean commented

@SimonMeskens I have implemented an example from the Shark paper in Rust, as it is a similar language (imperative and has traits):

trait Animal {
    fn swim_away(&self) {
        println!("swim away");
    }
}

trait Encounter<A> : Animal where A : Animal {
    fn encounter(&mut self, other : &A);
}

struct Fish {}
struct Shark {
    healthy : bool
}

impl Shark {
    fn swallow<A>(&self, _ : &A) where A : Animal {
        println!("swallow");
    }
    fn fight<A>(&mut self, _ : &A) where A : Animal {
        self.healthy = false;
    }
}

impl Animal for Fish {}
impl Animal for Shark {}

impl Encounter<Fish> for Fish {
    fn encounter(&mut self, _ : &Fish) {}
}

impl Encounter<Shark> for Fish {
    fn encounter(&mut self, other : &Shark) where Self : Animal {
        if other.healthy {
             self.swim_away()
        }
    }
}

impl Encounter<Shark> for Shark {
    fn encounter(&mut self, other : &Shark) where Self : Animal {
        if self.healthy {
            self.fight(other);
        } else {
            self.swim_away();
        }
    }
}

impl Encounter<Fish> for Shark {
    fn encounter(&mut self, other : &Fish) where Self : Animal {
        if self.healthy {
            self.swallow(other)
        } else {
            self.swim_away();
        }
    }
}

I don't intend to use the same syntax as Rust, but it gives an idea of the structure in a language with type-classes. One key difference with JS as the target is that we have garbage collection, so we don't need to have any lifetimes, which simplifies things (although Rust correctly infers all the lifetimes in this example without any annotation), and also because of this we don't need affine types, and JS always passes by value (objects are values which are references to the object).

keean commented

First design questions are how much type inference should there be, and what are the consequences for syntax. Considering the simplest polymorphic function, in a Rust-like style:

fn id<A>(x : A) -> A { x }

Or Haskell style:

id :: A -> A
id x = x

The Haskell style has the advantage of working with type inference, and the Rust style has the advantage of automatically introducing scoped type variables.

Obviously there will need to be type signatures in type-class definitions, and records / module definitions.

I think that if the goal of the language is to provide a very simple, but very expressive language, you should consider going for a very simple, but expressive syntax. In that case, I think I like the Haskell syntax more than the Rust syntax. I've never been a fan of how dense the Haskell syntax gets though, so I'd aim to sit somewhere in between. I think Haskell syntax, with words instead of symbols, would work best. If you want, I could give some samples of syntax that might be fun to play around with. I think I'm going to get out Elements of Programming and look up a few good examples and try to create a syntax that is made explicitly to express those problems.

Thanks for providing the example in Rust. I think there's a possibility to introduce some concepts to make it more powerful and concise. I assume you'd prefer me to open an issue with a proposal of any specific features? Btw, in the paper I linked, multiple dispatch is more or less equal to type classes, so, as the paper shows, introducing a concept of delegation improves upon the code semantics. I've been playing around with the idea of delegation in Haskell and I think I know where to go with it. I'll move that talk to the other issue then.

@keean the type signatures of all exported interfaces must be explicitly declared and not inferred, else modularity (separate compilation) is impossible which is critically needed if targeting JavaScript and its dynamic runtime module importing. This also makes the interaction of certain higher-order first-class type features decidable when moving away from the origin on the Lambda Cube, as well enables to implement typeclasses in more than one way (which Haskell can't do due to its global inference).

It also enables decidability with first-class anonymous (structural) unions (disjunctions), which is a mandatory feature of the language else I won't contribute. TypeScript, N4JS, and Ceylon (which emits JavaScript) have this feature. Scala 3 (Dotty) also has it (and there is a Scala.JS compiler for Scala 2). Ceylon also doesn't support typeclasses, but Scala 2 (and presumably Scala 3) does. Inferred structural unions are useful for many paradigms and is necessary to support the JavaScript || logical operator which doesn't always return a boolean.

Declaration of type signatures within functions (also methods) should be optional and inferred if not specified. Declaration of non-exported function type signatures could perhaps also be optional and inferred.

@keean to add to your opening comment of this issue thread, TypeScript is intentionally unsound typing and PureScript lacks first-class anonymous structural unions. N4JS appears to be sound typing, but also lacks typeclasses.

keean commented

@SimonMeskens I am interested to see what the code looks like with delegates. I provided the sample in Rust because I can syntax check and run the example, there is always the problem of unchecked errors when posting hypothetical code samples.

keean commented

@shelby3 I agree all exported interfaces must have type signatures. For higher-order features I quite like first-class-polymorphism. I think a reasonable assumption is that everything inferred is monomorphic. If non-exported function signatures can be inferred, then a syntax with optional type annotations would seem better.

keean commented

@shelby3 regarding first class unions, I think one of the key features is supporting the kind of programming you want to do. I do want to suggest an idea though, and see what the consequences are. The idea is to have no concrete types at all, so all types are expressed by type variables constrained by type classes. Consider the identity function using a cross between Haskell and TypeScript notation:

id(x : a) : a = x

'A' can be any type, so we could constrain it to be a type-class using a Prolog like notation:

id(x : a) : a :- Integer a = x

This is already the union of all types that implement Integer, but we can go further and have a type-level or operator:

id(x : a) : a :- Integer a | String a = x

This has a nice synergy with JavaScript and duck-typing as we are saying that we don't care what type is passed, as long as it provides the correct interface. We can type check this nominally in TraitScript, but it reflects the underlying JavaScript that allows any object to be passed as long as it provides the right methods and properties.

The question is, does that give all the functionality that you get with union-types?

keean commented

@shelby3 I find myself re-thinking in the morning and finding I don't agree with what I posted yesterday. An or operator for traits does not make any sense, as it is saying the object may have either method A or method B. What we want to know inside a function is if we can safely call method A for example.

I don't think there is any need for an 'or' operator, or first class unions outside of dynamic types (aka trait-objects). I think I need to revisit the original expression problem with type-classes.

Starting with a collection of trait-objects (existential types):

c : [a] :- Integer(a) = [new A(), new B()]

Requires all objects inserted into 'c' to implement the Integer trait. Now lets say we want to print this as well, we somehow need to inject the 'Show(a)' trait into 'c'. This is actually an 'and' operation, but it occurs in an 'open' way, somewhere else in the source code, something like:

show(c : Coll) {
    foreach(c, a => {
        print a;
    })
}

negate(c : Coll) {
    foreach(c, a => {
        a = -a
    })
}

This requires anything calling 'show' has to be able to prove all elements in collection 'c' provide show. This to me suggests that what is needed is some kind of constraint inference. We can easily infer the set of constraints on each element of 'c' as being (Show a, Negate a) or something like that. The question is whether 'c' is heterogeneous or homogeneous, there are two possible type signatures:

data Coll = Coll [a]
data Coll = forall a . (Show a, Negate a) => Coll [a]

In the former case 'show' and 'negate' would need to each have the constraints on '[a]' in their type signatures directly (although they can be inferred). One option would be to allow:

data Coll = forall a => Coll [a]

To automatically accumulate the inferred type classes. But what to do on module boundaries?

@keean I explained to N4JS's @yoshiba why Joose and Object Algebras do not solve Wadler's Expression Problem and are not as orthogonally extensible as typeclasses.

@keean wrote:

@SimonMeskens wrote:

What will your objects look like? Have you studied Self? I highly recommend giving this short paper a read:
http://sauerbraten.org/lee/ecoop.pdf

With objects there is a problem with dispatch in that "a.f(b, c, d)" gives special treatment to the receiving object 'a', and only does dispatch on this. With type-classes we start with a language that has no overloading. If we want to allow overloading we use a type class, and a type class can dispatch on multiple type parameters.

Also remember we discussed in May, that typeclasses (not virtualized typeclasses, i.e. Rust's trait objects) can be monomorphic at the function application use-site because the data type is not lost (as in subclassing where it is assigned to a supertype), thus avoiding the virtual method jump table lookup which otherwise stalls the CPU pipeline reducing performance by as much as to 5 - 6 times slower.

keean commented

@shelby3 I have created a second issue here #2 to discuss union types.

I am fully aware of monomorphisation, which is when the type is statically determined. There is also runtime polymorphism (which it what haskell uses existential types for, and Rust uses Trait objects. I prefer the existential approach for the type system, as it is actually sound as a logic).

keean commented

Also note the mult-method dispatch version of the Sharks code makes some other changes, introducing new types for HealthyShark and DyingShark, I will update the Rust code to reflect this way of implementing it.

@keean wrote:

JS always passes by value (objects are values which are references to the object).

ASM.js passes by value unboxed. For functions operating only on numbers, afaics we could support this seamlessly. Emulating other types (e.g. strings) in ArrayBuffers would require more complexity to compile and interopt, and might require more C-like syntax. We could keep in mind these sort of optimizations and whether we want and could reasonably provide this without forcing the programmer to resort to a FFI.

No need to discuss this ASM.js tangent right now.

I think you can write delegation as a monadic structure, which might be the real ticket. I also look forward to your new implementation. I think it's a cool example as it covers such a broad range of language capabilities.

@SimonMeskens wrote:

Thanks for providing the example in Rust. I think there's a possibility to introduce some concepts to make it more powerful and concise. I assume you'd prefer me to open an issue with a proposal of any specific features? Btw, in the paper I linked, multiple dispatch is more or less equal to type classes, so, as the paper shows,

The huge salient difference is that multi-methods is dynamically dispatching to the implementation of the subclasses of Animal at run-time, thus stalling the CPU pipeline and adversely impacting performance.

Wheres, the Rust type-class example shows that type-classes can be resolved monomorphically at compile-time, so there is no jump table lookup.

Edit: another huge difference is that type-classes don't bind the multiple interfaces in the class inheritance and instead keep interfaces and classes orthogonal.

introducing a concept of delegation improves upon the code semantics. I've been playing around with the idea of delegation in Haskell and I think I know where to go with it. I'll move that talk to the other issue then.

I think you can write delegation as a monadic structure, which might be the real ticket. I also look forward to your new implementation. I think it's a cool example as it covers such a broad range of language capabilities.

@keean wrote:

Also note the mult-method dispatch version of the Sharks code makes some other changes, introducing new types for HealthyShark and DyingShark, I will update the Rust code to reflect this way of implementing it.

Changing the interface based on the state of data type, conflates the instance of the data type with the interface. This makes orthogonal extension with for example collections of instances inefficient and boilerplate prone.

I don't think this is a paradigm that should be used.

P.S. if we do our job exceedingly well (as in demonstrate that type-classes and sound typing kick ass on subclassing), there is a remote possibility we could be designing the future replacement for EMCAScript. So performance should be high on our list of priorities, because performance is very difficult to achieve in a dynamically typed language and virtual inheritance of subclassing messes up performance. Google is looking for inspiration for their SoundScript concept of a soundly typed JS-like language in the browser. The guy leading that effort (Andreas Rossberg) is familiar with Scala, ML, etc.. Andreas commented that N4JS is "very interesting".

But I don't know if fast and incremental compilation will be achieved in the language we are contemplating. We can think about this in our design decisions if we want to.

keean commented

SoundScript is described here: http://www.2ality.com/2015/02/soundscript.html which sounds like JavaScript plus enforced classical objects and inheritance, and simple function typing. There's nothing about generics or code generation.

SoundScript's goal is to actually be somewhat compatible with TypeScript, I assume Google intends to make it so we don't have to compile TypeScript for Chrome anymore.

@keean wrote:

which sounds like JavaScript plus enforced classical objects and inheritance

Until they figure out that virtual inheritance stalls the CPU pipeline and type-classes don't. I think we have a very tiny (almost nil) chance to influence the future. Any way, we should design for performance assuming one day our language will have its own VM and not just compiled to JS, if that doesn't incur any huge trade-offs in our design constraints. Also we can think about optimizing some parts with ASM.js. I am not sure how this thought process will play out. I am trying to keep my mind open and see where the design constraints lead us. I agree the priority is on having a simple, clean language without proliferation of duplicated paradigms.

Edit: battery life (i.e. performance) on mobile is important. I could see our language potentially becoming very popular for mobile apps.

There's nothing about generics or code generation.

In the link I provided, they mentioned TypeScript as a possible starting point, but aiming to make it sound. I think they will (or maybe already have) discovered that retaining interoperability with TypeScript code while also making TypeScript sound is probably not practical. Of course I could be wrong about this.

From now on, in all discussion where I mean to use type-classes, I will use the word trait to indicate the interface of the type-class, since that is the name Rust chose. We might choose a different keyword or not, yet until then I will use the term trait. And not the same thing as trait objects.

@keean wrote:

I agree all exported interfaces must have type signatures. For higher-order features I quite like first-class-polymorphism.

So I assume by "first-class-polymorphism", you mean we can declare a (function, method, or class) type parameter constrain to a trait (or union of traits) bound, and then I can call functions and methods without knowing the actual type of the data which implements the trait(s)?

I presume by "first-class-polymorphism", you don't mean higher-ranked types.

I think a reasonable assumption is that everything inferred is monomorphic.

Unpacking this, I presume you mean that inference will never subsume to a subclassing subtype (with virtual inheritance) nor to a virtualized trait object (which also incurs dynamic dispatch and stalls the CPU pipeline)? And this is why we need anonymous structural union types, e.g.:

if (x) true
else 3

The inferred type is Boolean | Number.

I am presuming we will choose an "everything is an expression" grammar. Note that the following expression (not the return type of the containing function) has a type of undefined (or the bottom type):

if (true) return 1

If non-exported function signatures can be inferred, then a syntax with optional type annotations would seem better.

Agreed. Thus we need the suffixed : type and not the C/Java prefixed type declaration.

@SimonMeskens wrote:

I think that if the goal of the language is to provide a very simple, but very expressive language, you should consider going for a very simple, but expressive syntax. In that case, I think I like the Haskell syntax more than the Rust syntax. I've never been a fan of how dense the Haskell syntax gets though, so I'd aim to sit somewhere in between. I think Haskell syntax, with words instead of symbols, would work best.

I think I'd prefer to stay as close to JavaScript as possible in expression syntax while everything becomes an expression (optionally a statement as well). The point is to produce a language that is very easy for everyone to learn.

However, I hate noisy code (unless necessary to provide reasonable cognitive load), so I am open to adopting some function level grammar ideas from Haskell, such as:

  • mandatory block indenting instead of curly braces (Python is very popular)
  • function name overloading with default arguments (multiple versions of a function with same name)
  • everything is an expression (but some can also be used as statements with side-effects)
  • pattern matching with switch or match within function, but not on function arguments
  • guards within function, but not on function arguments
  • semicolons not allowed (don't enable to cram multiple statements on one LOC as it is not readable)

I prefer consistency of coding style because we are in the open source reality now, where code should be readable the same to everyone. So I would prefer to enforce a chosen style.

I realize we might not agree on all of these. Let's discuss. But syntax may not be the most important discussion to have now and first.

I agree with all of those list items 👍

@SimonMeskens cool we finally agree on something 😍

Yeah, sorry if I acted like an ass on the TypeScript place, not sure why I get defensive there, I'm a pretty nice guy usually. I also don't have personal beef with you, life is far too short for that 😄

keean commented

I agree with most of that, with some restrictions.

  • function name overloading, I would want all overloads to have the same type, so the overloads are on data constructors.
  • as an imperative language with mutation, i think statements make more sense, but in general everything that makes sense should be an expression too.
  • function overloading on constructors implies pattern matching in function arguments, so I think these two are mutually exclusive. You cannot have both. I would rather have pattern matching and overloading in arguments as it leads to less indentation.
  • I quite like function guards on arguments.

Couple of neat examples.

Pattern matching in arguments:

append x Nil = x
append x (Cons h t) = Cons h (append x t)

Guards on arguments:

find_zero Nil = false
find_zero (Cons h t) | h == 0 = true
find_zero (Cons h t) | h /= 0 = find_zero t

@keean regarding patterns and guards on arguments, if we are targeting JavaScript, then I am thinking we should try to have a close correspondence between our source code and the emitted JavaScript. I think this is a reasonably high priority goal, given one is testing in the browser in the wild and doesn't have source maps.

I agree your examples do provide a slight reduction in verbosity, but...

Mucking with the way functions are declared will cause cognitive dissonance for programmers who only know C/C++/Java/JavaScript languages. The first time I saw pattern matching on function argument declarations, it looked like gobbledygook. You want to make this a popular language if possible? If I go adopting this language for a mobile apps platform ecosystem I have in mind, then I may receive complaints about the programming language being too weird/obtuse. Also the pattern matching syntax looks even noisier if we match the way functions are declared with the way they are applied with mandatory parenthesis:

append<A>(x: A, list: Nil): A = x
append<A>(x: A, List<A>(head, tail)): List<A> = new List(head, append(x, tail))

Alternative:

append<A>(x: A, List(head: A, tail)): List<A> = new List(head, append(x, tail))

Compare to:

append<A>(x: A, list: List<A>): List<A> = new List(list.head, append(x, list.tail))

Is it really worth complicating it for newbies and adding another paradigm?

Alternatively we could add where but it adds another keyword (and paradigm):

append<A>(x: A, list: List<A>): List<A> = new List(head, append(x, tail))
  where head = list.head, tail = list.tail

But that is unnecessary because if the function will be multiple LOC, then we can just reuse val head = list.head instead (val being same as JavaScript const):

append<A>(x: A, list: List<A>): List<A> =
  val head = list.head, tail = list.tail
  new List(head, append(x, tail))

I think I prefer let instead of val and var instead of let (but the emitted transposed JavaScript may be confusing). JavaScript really mucked this up badly because they needed to alter the semantics of mutable var and so needed a new keyword for mutables and they choose let to represent a variable but let should be immutable. And these keywords are referring to mutability of reassignment and not to whether what they reference is immutable.

Edit: also patterned guards are going to create name mangling in the emitted JS code if you are resolving these statically.

P.S. note I prefer to write that with an inferred return (result) type:

append<A>(x: A, list: List<A>) = new List(list.head, append(x, list.tail))

@keean wrote:

function name overloading, I would want all overloads to have the same type, so the overloads are on data constructors.

function overloading on constructors implies pattern matching in function arguments, so I think these two are mutually exclusive. You cannot have both. I would rather have pattern matching and overloading in arguments as it leads to less indentation.

Can you unpack that? I don't understand.

c : [a] :- Integer(a)

Gobbledygook to my eyes. What does that mean?

Looks like a pattern similar to c > [a] >= Integer(a).

Please no symbol soup :-. Is that a smiley face without the smile.

keean commented

In the append case both arguments are lists.

It's not just the reduction in verbosity, but the lack of indenting, keeping everything lined up to the left makes for more readable code. I am not concerned about keeping close to JS, especially if native compilation or web-assembly are future targets. However I do think you have a point about people not used to the style.

If you don't have pattern match in arguments, then you can only have one function declaration for each function, no ad-hoc overloading, because you need to use a type-class to overload different argument types.

I think let for immutable assignment, and I wonder is mutable assignment is necessary, as objects are a reference you can mutate the object referenced, even if the reference itself is immutable. Perhaps we could start with only single immutable assignment and see how it goes?

keean commented

Overloading on function type requires a type class, so what other kind of overloading is there? You can overload on constructor patterns (like my example with append), but it sounds like we don't want that.

As for:

c : [a] :- Integer(a)

It says 'c' is a list of elements of type 'a' where 'a' is constrained to implement the type-class 'Integer'.

You would write this in Haskell:

c :: Integer a => [a]

I am not sure you could write this in Rust, except as part of a function declaration:

fn f<a>(c : List<a>) where a : Integer

However I have realised that this is using a concrete Lust type, which is not generic. We should really use a generic collection interface like this:

 fn f<a, b>(c : a) where a : Collection<b>, b : Integer

But we want to be able to type variables generically not just functions. My syntax for this would be:

c : a :- Collection(a, b), Integer(b) = ...

':-' is the Prolog inference operator.

Obviously we need a syntax that is easy to understand.

@keean wrote:

In the append case both arguments are lists.

The cognitive load is too high. Now I see it (below), because x is result of append so x is inferred to be a list. But it wasn't explicitly declared. Sorry most programmers don't think this way (yet). It will be a massive cognitive dissonance (even though it is second nature for you and you don't even have to think about it).

append x Nil = x
append x (Cons h t) = Cons h (append x t)

I will quote from the author of Learn You A Haskell:

I failed to learn Haskell approximately 2 times before finally grasping it because it all just seemed too weird to me and I didn't get it.

My experience was the same. It wasn't until I understood Scala and some other functional languages that Haskell clicked for me. And Monads, Applicatives, and Kleisi Arrows still give me headaches.


@keean wrote:

It's not just the reduction in verbosity, but the lack of indenting, keeping everything lined up to the left makes for more readable code.

The examples I provided are also left-justified. I don't understand.

However I do think you have a point about people not used to the style.

Thanks. If anyone can convince me that is not a priority, then please do.

I am not concerned about keeping close to JS, especially if native compilation or web-assembly are future targets.

I am! It will be our main target for a long time probably. Other targets need time to mature. JS target is free, batteries included¹. I quote ESR and his 166 IQ:

Too clever by half lands you in incident pits.

How do you avoid these? By designing to avoid failure modes. This why “KISS” – Keep It Simple, Stupid” is an engineering maxim. Buy the RTC to foreclose the failure modes of not having one. Choose a small-form-factor system your buddy Phil the expert hardware troubleshooter is already using rather than novel hardware neither of you knows the ins and outs of.

Don’t get cute. Well, not unless your actual objective is to get cute – if I didn’t know that playfulness and deliberately pushing the envelope has its place I’d be a piss-poor hacker. But if you’re trying to bring up a production mailserver, or a production anything, cute is not the goal and you shouldn’t let your ego suck you into trying for the cleverest possible maneuver.

¹ Meaning it is a widely distributed VM with a huge installed user base.


@keean wrote:

If you don't have pattern match in arguments, then you can only have one function declaration for each function, no ad-hoc overloading, because you need to use a type-class to overload different argument types.

Okay I see your point. You are assuming we will never allow functions that input concrete data types, only trait bounded type parameters (per that idea you mentioned upthread). Then a data type might be implemented for traits declared by two or more of the overloaded variants, so at the use-site it would require a cast in order to select which overloaded function in that case of ambiguity. But this doesn't mean we couldn't have function overloads, rather that ambiguities create verbosity of casts. And I don't see how patterns and guards changes that since they share the same function name. Are you proposing they won't be resolved statically?

Also functions can be overloaded by arity of arguments, not just type.

I think let for immutable assignment, and I wonder is mutable assignment is necessary, as objects are a reference you can mutate the object referenced, even if the reference itself is immutable. Perhaps we could start with only single immutable assignment and see how it goes?

How do I implement a for loop if i is not mutable?

keean commented

I don't think you have quite understood the overloading. A function has a type... it can only have one type, that is fundamental, otherwise you cannot know the type of the function :-)

So how do we allow function overloading, with type classes for example (in Haskell):

class Show a where
    show :: a -> IO ()

instance Show Int where
    show x = int_to_string x

instance Show Float where
    show x = float_to_string x

So show is overloaded on the type of its argument. That's what type-classes do, they allow type-safe overloading.

None of this allows functions with different arity, because that just cannot be typed. Neither Haskell nor Rust allow functions different arity even using type-classes or traits.

However you can simulate this (and Pythons named optional arguments) by passing a Record as the argument.

With for loops, we can see the loop variable as immutable single assignment inside the loop. We can see the loop body as a function, and 'i' is its single argument.

One idea I like is that the notation for an anonymous function is simply '{' ... '}'. Now consider:

for x in range(1, 4) {print x}

This is really just syntactic sugar for:

range(1, 4).foreach(function(x) {print x})

@keean wrote:

A function has a type... it can only have one type, that is fundamental, otherwise you cannot know the type of the function :-)

You are I guess only thinking of the way it is in Haskell (and Rust), but there is another way which I've seen in other languages such as method overloading in Java. Each duplicate function name is a different function.

Some people claim that form of overloading interacts very badly with inference. And other corner cases.

Let's not get too verbose here. If there is any disagreement on this point, please go to Rust's forum and private message me. We can resolve there instead of cluttering this thread with many back and forth.

@keean I am not 100% closed-minded to patterns in arguments. They are a little bit weird but not catastrophically so if put them in the more familiar C-style function declaration that I showed by example.

IMO, guards on arguments are far too noisy and make the function declaration unreadable for many of us. It is better to use a switch or match expression within the function to break up the orthogonal concepts of typing and specifying the arity and names of the arguments, versus run-time guarding the code paths. Guards aren't statically resolved (dispatched) correct?

keean commented

@shelby3 wrote:

You are only thinking of the way it is in Haskell, but there is another way which I've seen in other languages such as Java. Each duplicate function name is a different function.

It certainly messes with type inference if you do this, and breaks local reasoning, because new versions of a function may change program behaviour, especially combined with generics.

@keean wrote:

@shelby3 wrote:

You are I guess only thinking of the way it is in Haskell (and Rust), but there is another way which I've seen in other languages such as method overloading in Java. Each duplicate function name is a different function.

Some people claim that form of overloading interacts very badly with inference. And other corner cases.

It certainly messes with type inference if you do this, and breaks local reasoning, because new versions of a function may change program behaviour, especially combined with generics.

I agree, it is a can-of-worms. I hadn't thought it through entirely, especially on the point of changing program behavior far from the declaration site. Yikes! 😨 TypeScript allows this apparently.

Hmmm. But how does it differ from a function which inputs a union of traits, e.g. Write | Show? Isn't adding another function overload equivalent to adding another item to the union for an argument of a single function? Are you claiming we can't write a function that inputs a union of trait interfaces? It seems incorrect to force the programmer to choose another boilerplate name (e.g. outputWrite and outputShow) for a function that is doing the same operation, just to differentiate the implementation of that function for two different input types. However, I agree that when multiple arguments per function are interacting in this matrix then it gets quite complex to determine which function will be chosen by in(ter)ference.

So I lean towards disallowing Java-style function overloading, but allowing unions and intersections for argument types.

Multiple arity can usually be handled with default arguments. And out-of-order defaults can be handled with named arguments on application, e.g. apply(arg6 = value).

The above is what N4JS has chosen (except I don't know if they implemented named arguments).

keean commented

With traits you would force the user to write a trait (type class) for a function that accepts different input types. You don't necessarily get rid of the behaviour changes, but you limit where it can occur to a trait, and it's only a problem if you allow specialisation. You can use some heuristic rules to make sure trait specialisation is localised to one file like Rust does.

To get named arguments and variable arguments, you can use a record (like a struct) with default values. You create an instance of the struct, and have to provide values for all elements of the struct that don't have defaults. You can then have a function that accepts the struct as its argument. With a bit of syntax sugar this can be made to look like Python named (optional) arguments.

I guess whether guards on arguments (including any guard to access the instance of union) are statically or run-time dispatched is a compiler implementation detail? In other words if the guard is the first operation in the function, then in some cases (i.e. it is statically type matched) it can be replaced with static calls directly to the leaves of the guard, rather than run-time jumps which stall the CPU pipeline (twice, once for the call on function application and again to jump to the code for leaf of the matched guard).

keean commented

I think guards are just 'nice' syntax. I don't think there is any difference between:

find_zero Nil = false
find_zero (Cons h t) | h == 0 = true
find_zero (Cons h t) | h /= 0 = find_zero t

is really the same as (with Python like syntax):

find_zero(x):
    if x == Nil:
        false
    else if x.head == 0:
        true
    else:
        find_zero(x.tail)

Note: there are problems with this approach, what happens if you call x.head when x == Nil ? With pattern matching this cannot happen.

@keean wrote:

With traits you would force the user to write a trait (type class) for a function that accepts different input types.

I don't like this. Unnecessary boilerplate to create a dummy nominal trait which is the union of two other traits. Programmer would prefer to write the anonymous structural union instead, e.g. Write | Show. You seem to be trying to find a way to not have structural unions. But the alternative is lots of boilerplate and silly names such as trait WriteOrShow. How would you even make such a trait?

@keean wrote:

To get named arguments and variable arguments, you can use a record (like a struct) with default values. You create an instance of the stuct, and have to provide values for all elements of the struct that don't have defaults. You can then have a function that accepts the struct as its argument. With a bit of syntax sugar this can be made to look like Python named (optional) arguments.

Why add all this boilerplate and run-time overhead? We have compilers is to remove such inefficiencies.

Because you want to unifying concepts?

keean commented

@shelby3 wrote:

I don't like this. Unnecessary boilerplate to create a dummy nominal trait which is the union of two other traits. Programmer would prefer to write tthe anonymous structural union instead, e.g. Write | Show. You seem to be trying to find a way to not have structural unions. But the alternative is lots of boilerplate and silly names such as trait WriteOrShow.

I think you are missing the point, this has nothing to do with WriteOrShow (and I don't think such traits make any sense, you never want 'WriteOrShow' because you don't know what methods are valid.). This was about allowing ad-hoc multi-methods. To allow 'show' to be applied to multiple different types.

If the Write trait implements 'write' and the Show trait implements 'show', if I have an 'x' such that I know it implements Write or Show, which function can I call on it, can I do "x.write" or can I do "x.show"... actually either of these can fail, which makes the program unsound. If we want a sound type system we cannot allow this.

keean commented

@shelby3 wrote:

Why add all this boilerplate and run-time overhead? Wwe have compilers is to remove such inefficiencies.
I don't understand, what boilerplate, and what run-time overhead. All this is going to get removed by the compiler because JS does not support named arguments.

I am talking about how to 'type' such things, what the syntax is, and what the emitted JS code is not affected by how the type-system works.

@keean wrote:

If the Write trait implements write and the Show trait implements Show, if I have an x such that I know it implements Write or Show, which function can I call on it, can if do x.write or can I do x.show... actually either of these can fail, which makes the program unsound. If we want a sound type system we cannot allow this.

That is why we are forced by the compiler to guard the two cases with a match (or if-else) within the function. We can't call any method on x until we guard it.

Please use backticks ` as above.

@keean wrote:

@shelby3 wrote:

Why add all this boilerplate and run-time overhead? We have compilers is to remove such inefficiencies.

Because you want to unifying concepts?

I don't understand, what boilerplate, and what run-time overhead. All this is going to get removed by the compiler because JS does not support named arguments.

I am talking about how to 'type' such things, what the syntax is, and what the emitted JS code is not affected by how the type-system works.

Okay I understand now that you just want to unify concepts in the type system.

You can check the edit times on my comments to see I completed my edits before you posted your reply. You are picking up my comments a bit too fast, before I finish editing them.

keean commented

@shelby3 wrote:

That is why we are forced by the compiler to guard the two cases with a match within the function.

This is a disjoint or named (tagged) union, where you can match on the constructors. Like this:

data X = S String | I Int

match x:
    S x => // x is a String
    I x =>  // x is an Int

@keean wrote:

This is a disjoint or named (tagged) union, where you can match on the constructors.

It can be tag-less if it is resolved at compile-time.

keean commented

That's what traits are for :-), they provide compile time type-case functionality. Disjoint (tagged) sum types provide runtime type-case.

@keean there is one case where I want to cast my input to Write or Show to select the trait that is selected at compile-time. It is just a way of naming the function with a cast instead of putting it in the function name. And it allows one function to support many casted variants on many arguments, so don't have permutation blowup in boilerplate function names, e.g. doitShowCursorGroup, doitWriteSegmentSet, etc.. For example, without my idea then 3 arguments with 3 items per union type each, would require 27 function names and declarations instead of just one.

@keean wrote:

That's what traits are for :-), they provide compile time type-case functionality. Disjoint (tagged) sum types provide runtime type-case.

I know. I'm sure you remember we discussed this at great length (500+ comment posts between us) at the Rust forum in May.

First of all, I don't agree that we can always input only traits into functions, because it means we lose track of the data type, so we are unable to use it as an input for other functions which expect a different trait bound. For example a List constructor would sometimes be instantiated with a data type and not a trait object type, so that the items in the list can be employed with calling different functions with full extensibility of the Expression Problem.

Data types will need to be tagged at run-time so that structural unions of them can be guarded (and note that these guards are automatic boilerplate issued by the compiler). JavaScript has instanceOf which is now more reliable with [Symbol.hasInstance] instead of relying on the prototype.constructor.

I agree unions of traits make no sense except for the use case of my prior comment where they are resolved at compile-time and are really just sugar for not needing to declare a permutation of function names.

I think if we have structural unions of data types, we may not need trait objects or we need both. And dispatch will be more efficient since we know the possible data types in the union, so compiler can use more efficient switch statements.

Without structural unions of data types, then we will need a lot of boilerplate or trait objects but trait objects don't solve both axes of the Expression Problem.

Edit: note we can't subsume data types to their greatest lower bound subtype, as that would require subsclassing and virtual inheritance. And we can't always subsume them to a Rust-esque trait object (virtualized trait) because of what I wrote in the 2nd and 6th paragraphs above.

@shelby wrote 3 days ago:

Another related issue is that type parametrized collections/containers (e.g. List or Array) whose element type is a heterogeneous collection of subclasses have been handled historically by subsuming to the greatest lower bound common type (which I also mentioned upthread) and relying on virtual inheritance so that all types in the collection are subsumed to a supertype of common methods and calling those methods using dynamic dispatch to call the implementation for the method of the concrete subtype. This pattern is extensible in one axis of Wadler's Expression Problem because we can add new subtypes to our collection without recompiling the source code for the supertype (interface) and without reconstructing a copy of pre-existing instance of the collection and its existing contained instances. However, it is not extensible in the other axis of the Expression Problem because we can't add a new method to the base class (supertype inteface) without recompiling the source code and reconstructing instances.

By employing first-class disjunctions a.k.a. unions (and anonymous unions which are a feature of TypeScript, eliminates a lot of boiler plate) for the aforementioned container class's type parameter, the concrete types of the heterogeneous collection are not subsumed (not lost); and that heterogeneous union type is applicable to any typeclass interface need (e.g. a function argument) if there exists an implementation of that typeclass interface for every concrete type in that said union. In this way we can have extension in both axes of Wadler's Expression Problem without recompiling and reconstructing the existing instances and their sources. Solving Wadler's fundamental Expression Problem maximizes extensibility.

However, there remains a problem in that any pre-existing instance pointer for the said heterogeneous collection will be invalid if a new concrete type is added to the container's type parameter, because a reference to number | string is not compatible with a concrete instance of type of number | string | Array<number|string>. Afaics, this problem can solved in two possible ways in lieu of cloning (but not deeply) the container. First, the container can be designed so that when a new concrete type is added, it returns a new reference to the new container type and this new reference is isolated from any external pre-existing references to container. Meaning that somehow the container's code has an internal state which tracks which issued references refer to which subsets of contained concrete types. The other solution I envision is the the compiler is smart enough to do some form of lifetimes tracking such as in Rust, and will insure that the pre-existing external references are invalidated when the new concrete type is added.

The bolded in the last paragraph can be accomplished with invariant data structures such as a List.

Edit: note it doesn't require invariant data structures in the case that the type parameters of the data structure are of a homogeneous type (not heterogeneous).

@keean wrote:

I think guards are just 'nice' syntax. I don't think there is any difference between:

find_zero Nil = false
find_zero (Cons h t) | h == 0 = true
find_zero (Cons h t) | h /= 0 = find_zero t

is really the same as (with Python like syntax):

find_zero(x):
    if x == Nil:
        false
    else if x.head == 0:
        true
    else:
        find_zero(x.tail)

But when the compiler knows at compile-time which branch will be selected, it could optimize it with a direct call to the branch to prevent stalling the CPU pipeline.

Note: there are problems with this [Python's] approach, what happens if you call x.head when x == Nil ? With pattern matching this cannot happen.

Pattern matching at the argument isn't required (unless data types are erased not tagged at run-time and the branch must be determined at compile-time). The type of List<A> can force that is must be guarded where ever before access to head is possible.

I presume you know that if List<A> is a sum type with two possible values Nil and Cons<A> then the compiler refuses to give access to head until an instance of List<A> has been guarded/matched to Cons<A>. The head method exists only on Cons not List nor Nil.

@shelby3 wrote:

Hmmm. But how does it differ from a function which inputs a union of traits, e.g. Write | Show? Isn't adding another function overload equivalent to adding another item to the union for an argument of a single function? Are you claiming we can't write a function that inputs a union of trait interfaces? It seems incorrect to force the programmer to choose another boilerplate name (e.g. outputWrite and outputShow) for a function that is doing the same operation, just to differentiate the implementation of that function for two different input types. However, I agree that when multiple arguments per function are interacting in this matrix then it gets quite complex to determine which function will be chosen by in(ter)ference.

So I lean towards disallowing Java-style function overloading, but allowing unions and intersections for argument types.

.

there is one case where I want to cast my input to Write or Show to select the trait that is selected at compile-time. It is just a way of naming the function with a cast instead of putting it in the function name. And it allows one function to support many casted variants on many arguments, so don't have permutation blowup in boilerplate function names, e.g. doitShowCursorGroup, doitWriteSegmentSet, etc.. For example, without my idea then 3 arguments with 3 items per union type each, would require 27 function names and declarations instead of just one.

.

I agree unions of traits make no sense except for the use case of my prior comment where they are resolved at compile-time and are really just sugar for not needing to declare a permutation of function names.

So only for the case of traits, I suggest allow overloaded function declarations on an argument where each trait for that argument is in a different function of the same name (won't cause corner cases because cast is forced by compiler, see below...):

output(x: Show) {}
output(x: Write) {}

Instead of:

output(x) {
  match(x) {
    case Show:
    case Write:
  }
}

Because we really want to indicate that the resolution is compile-time always. The compiler will always force the use-site function application to cast:

output(x:Show)

I think the above is more meaningful (as well as solving the name permutation problem) than:

outputShow(x)

Because it identifies which argument is the variant, which can be especially meaningful when two or more arguments are varying by trait.

However, the counter argument is that the permutation of function declarations duplicates more code if for example the case (code branch) of one argument applies to all three traits of another argument. So in that case seems the match format would be more DNRY. Should we allow both for traits? In both cases, the compiler would still need to force casts. The two formats should be sugar for each other.

For data types, union choices must be declared as a union, and run-time guards apply since data types will always be tagged at run-time.

So the only time we declare a function more than once, is when differentiating by union of trait on an argument, and then these are mandatory casts at use-site function application.

Feedback?

We've reasoned that a union (disjunction) of traits doesn't make any sense (except in my proposed sugar for function overloads on name per my prior comment), because it would require that traits are tagged (at run-time, i.e. reified) which would muck up other aspects of the design. The only tagged types are the data types which we implement traits for. An intersection (conjunction) of traits does make sense, as it provides all the methods from all the traits.

Thus we can't implement a trait for another trait in the way we implement a data type for a trait. Instead should we allow a trait to extend another trait? Should the implementation of the extended trait need only provide the additional methods or should it provide all the methods? Can we provide default implementation in a trait, so an extended trait could call methods of the base trait?

If we offer this trait "inheritance", I am contemplating that we should never allow subsumption (assignment to) a base trait for the type parameter of a trait, because this will introduce covariance issues and I am thinking we would prefer that all trait type parameters are invariant (please click both of my links). And per the first paragraph above, we can never allow unions of traits. So trait extension would not interact with higher-order subtyping. Rather it would only serve as a DNRY (DRY) convenience and for first-order subtyping (i.e. simple assignment). Do you forsee any inconsistencies in this?

I am trying to avoid covariance of type parameters (higher-order subtyping) in order to keep the language concepts simple. I am contemplating given the delayed binding of data, interface, and implementation, thus type-classes enables this optimization of design.

Some comments expressing the opinion we shouldn't start by adding type-classes to a language with other paradigms such as subclassing and structural object typing.

keean commented

Trait extension is a common thing. It doesn't work like inheritance because traits are constraints on types, so given:

trait A {}
trait B : A {}

If we implement trait B for some type, there must also exist an implementation of trait A for that type. It's that simple, there are no variance issues, because these are constraints on types, not types. In polymorphic instantiation, each type variable must take exactly one ground type (all type variables are invariant), irrespective of what traits / type classes are defined on those types.

@keean wrote:

Trait extension is a common thing. It doesn't work like inheritance because traits are constraints on types, so given

I refer readers to the parallel discussion in the Union issue thread, where I wrote a similar point as your quoted text above.

If we implement trait B for some type, there must also exist an implementation of trait A for that type. It's that simple

So you are saying that we reuse the implementation of trait A when forming the implementation of trait B. What if the implementation of B requires trait A to be implemented differently than when trait A is considered alone? Then I presume the answer is that it should not extend from A in that case. And I agree with you, because otherwise we couldn't subsume to common trait A.

there are no variance issues, because these are constraints on types, not types

Agreed, I had also pointed that out.

I invited @paulp here, because he was very outspoken about flaws in Scala, I agree Scala is a clusterfuck of mixing too many paradigms1 (especially that Jurassic ossifying anti-pattern virus subclassing), and I hoped maybe Paul might be outspoken here, because I think he really wants a small, sound language. My recollection from Scala Google Groups discussion is he has valuable technical insights.

  1. https://www.quora.com/What-are-the-downsides-of-Scala-as-a-programming-language

    https://www.reddit.com/r/scala/comments/4246qc/heres_why_scala_sucks/

    https://news.ycombinator.com/item?id=9398911

    https://news.ycombinator.com/item?id=11837256

    http://techblog.bozho.net/i-dont-like-scala/

    https://www.infoq.com/news/2011/11/yammer-scala

    http://grundlefleck.github.io/2013/06/23/using-scala-will-make-you-less-productive.html#the-bad-thingstrade-with-scala

    http://neopythonic.blogspot.com/2008/11/scala.html

    http://fupeg.blogspot.com/2008/12/is-scala-too-hard-or-are-you-just.html

    https://www.infoq.com/news/2011/11/scala-ejb2

    http://programmers.stackexchange.com/questions/25509/what-do-java-developers-think-of-scala#answer-25598

    The last two rave about Fantom, which doesn't even support generics (type parametrization) nor tuples. I wrote about Fantom in 2011, predicting the eventual arrival of ZenScript. And coincidentally someone asked on June 29, 2016, if I had made any progress since 2011. Well yes and no. I also manage to become chronically ill since May 2012 but I'm working on hard on curing that, e.g. ran 3 times for 2kms each within 8 hours the evening and night of this comment to keep my cognition up to 80% level and minimize brain fog and Chronic Fatigue Syndrome symptoms.

    And let's blame it all on @jdegoes 😜 , as he advised me to learn Scala back in 2009 when I was expressing frustration with Haxe.

keean commented

@shelby3 quick update on progress, I have the compiler doing a simple round trip, taking an assignment in the source syntax:

x = 3
x

parsing it into an abstract syntax tree, and then generating JavaScript from the AST, to get the result:

x=3;x;

Which then gets eval'd in the integration unit tests to give the result 3.

I have implemented enough of the target code generator to deal with basic functions, so we need to sort out the source syntax for functions now.

keean commented

I have written a post on Lambda-the-Ultimate about ZenScript here: http://lambda-the-ultimate.org/node/5376

@keean wrote:

I have written a post on Lambda-the-Ultimate about ZenScript here: http://lambda-the-ultimate.org/node/5376

Cool 😎 Just hope ZenScript doesn't get shaped into a researchy academic language. I am here to create the world's most popular programming language. I am serious about the potential to reach that goal. If I could comment at LtU, then I would mention my desire (goal) to not push it too far into researchy, academic paradigms. I am open to discussing the Monadic effects, etc., but I am strongly biased to resist obtuse paradigms or obtuse required design patterns. As I am strongly biased to resist symbol soup and user-defined symbols for method names. I am strongly biased against the additional cognitive load of not using enforced parenthetical group for functions.

keean commented

I think we adopt a fail-fast approach, try things out, keep what works, and throw out things that don't. I agree with your goals, and I think we are on the same page.

@shelby #6 (comment):

@keean wrote:

Remember the advantage of an interface is that the user can define new implementations.

What I am saying is that some type parameters of a typeclass may not need any implementation, because the typeclass's methods treat that parameter as the top type and never access any properties of it, which is the case for a List<A>. A List only needs one implementation. Also some type parameters (of a typeclass) may be bounded by typeclasses.

We need a syntax for expressing this in the typeclass (trait) definition. We have this requirement because our design extends Rust's design to include unions of data type not just trait objects for polymorphism.

Rather than open a new Issue (because I am not yet clear what the scope and thus title of this issue should be), let's discuss this further here and not clutter the Function syntax issue with this tangential issue.

A typeclass interface can have type parameters which are used in methods which never require any specialized implementation w.r.t. to said type parameter. In this case, it makes no sense to require a separate implementation for each type that might be instantiated for the said type parameter.

It seems to me that we need to alter the typeclass interface (in Rust it is a trait) syntax to enable annotating such type parameters that need no specialization. Does Rust and if not, then why not?

keean commented

@shelby3 wrote:

It seems to me that we need to alter the typeclass interface (in Rust it is a trait) syntax to enable annotating such type parameters that need no specialization. Does Rust and if not, then why not?

A type-class will only parameterise the types that do need specialised implementation, or that somehow depend on other types. The other types are either fixed types, or are parameters of the functions in the interface.

keean commented

@shelby3 I also want to bring up a unified syntax for modules, type-classes and records. All these three things are in effect type-scheme dictionaries. Consider a record (Haskell syntax):

data R<a> = R {
    f : a -> a
    g : Int
}

Now consider a type-class:

trait T<a> {
    f : a -> a
    g : Int
}

Finally consider a module:

module M<a> {
    f : a -> a
    g : Int
}

Spot the similarity? They are actually all the same thing a type dictionary (from field-name to type). I would very much like to have a single syntax for all three, actually a single construct that could be used in all three ways interchangeable. I think this would really enhance DRY, by allowing the same 'interface' to be used for modules, type-classes and records.

If we could have one keyword for all three what would it be? I quite like struct from 'C', and it is what Rust uses.

The main difference between a record and a type-class is that a record is passed as a first-class value, whereas a type-class is passed implicitly.

@keean wrote:

A type-class will only parameterise the types that do need specialised implementation, or that somehow depend on other types. The other types are either fixed types, or are parameters of the functions in the interface.

Ah yes thinking it through I see the trick is the delayed binding of the monomorphism (or the dynamic binding of the polymorphic trait object) means that there are no dependencies that this for the implementation bears any subclassing relationship to any other implementation of the typeclass. So there is no need to enforce any (subclassing) relationship on any type parameters which are not involved with selecting an implementation of the typeclass. That is somewhat obtuse to grasp for those coming from subclassing languages. Hmmm. I wonder how we will get people to grasp that if they are coming from Java?

So for example I would never write in source code a trait bounded type Array<Int|String> because trait bounds are always generic. So I have no way to even attempt the erroneous assignment of an Array<Int|String> to a reference of Array<Int>. An instance of a trait bound type does not retain any information about the data type implementation that is in use, which occurs upstream at the call-site to a function with a trait bound or assignment site to a trait object. With subclassing the compiler must insure that only subclasses can be assigned to a superclass, because the virtual inheritance polymorphism requires it (i.e. otherwise the required methods might not be available). Whereas, the typeclass doesn't require any subclassing relationship bound to every instance of the type in order to insure that the implemented methods are available. Thus all containers can be invariant with trait objects (and with my unions concept as these also don't bind implementation of methods to the instance and instead bind to the trait bound at the call or assignment site).

I'll need to think about that over and over again, to make it sure I have it drilled into my thinking. It is a paradigmatic shift from the subclassing mindset.

@keean wrote:

I prefer something more like Haskell where there is not implicit self in the traits:

trait Compatible<A, B> {}
impl Compatible<Int, Int> {}
impl Compatible<Float, Int> {}

Perhaps this is relevant?

@keean wrote:

I have written a post on Lambda-the-Ultimate about ZenScript here: http://lambda-the-ultimate.org/node/5376

Could you respond to @naasking at LtU and give him this link:

#2 (comment)

Be good to get his peer review and my LtU account is some state of limbo (ban or something from the past).

@keean wrote:

Shelby3 explains the solution to the expression problem here: #2 (comment) I find a bit difficult to piece it together from the half-a-dozen posts linked though.

Note I have tried to improve the summary in this comment. Also this one.

@keean wrote:

I also want to bring up a unified syntax for modules, type-classes and records. All these three things are in effect type-scheme dictionaries. Consider a record (Haskell syntax):

data R<a> = R {
    f : a -> a
    g : Int
}

Now consider a type-class:

trait T<a> {
    f : a -> a
    g : Int
}

Finally consider a module:

module M<a> {
    f : a -> a
    g : Int
}

Spot the similarity? They are actually all the same thing a type dictionary (from field-name to type). I would very much like to have a single syntax for all three, actually a single construct that could be used in all three ways interchangeable. I think this would really enhance DRY, by allowing the same 'interface' to be used for modules, type-classes and records.

If we could have one keyword for all three what would it be? I quite like struct from "C", and it is what Rust uses.

We can't use the same keyword for all three unless they have the same semantics, or we detect the semantics from context which would be very obfuscating.

Afaics, overloading data as sum, recursive, and "record" types is absolutely necessary. The "record" names are just positional names (e.g. if we implement with ArrayBuffer in JavaScript), but they can also be interpreted as dictionaries if for example the implementation is a JavaScript Object ({ ... }) or Map. That is why I don't understand your distinction of Objects of No Objects.

I don't like struct instead of data because it implies a layout of fields by position, and doesn't even mean dictionary lookup, not to mention it isn't overloadable to sum and recursive types. If you wanted to remove naming from sum and recursive types by putting all naming in a separate struct keyword, then we would lose functionality.

If we try to unify about a dictionary concept, then we lose unification of for example of function syntax in expressions and in methods.

I don't like the : because it is confusing with the : on the Type that follow on the same line. That is the same reason I didn't like : instead of =>. And it is less noisy when we write a method as method( instead of method : (, and even ES6 has acknowledged this as we can now write:

{ method() {} }

Can typeclass declarations contain non-method types? If yes (and I presume for modules yes), then we will need a let x on them, or a fn method( on methods. I'd prefer the let x given this is consistent with expression syntax.

Edit: also A -> A makes it inconsistent with the implementation site. Also we need to decide what syntax we will use for function types A -> A, (A): A, or as TypeScript does (_:A): A. I am not sure if I prefer the 1st or 2nd one, because I think x:A -> A is confusing to mainstream programmers, but I also think x:(A):A is symbol soup confusion due to the repeating :. Also if ever we want to give functions nominal types, then we must choose the 2nd one name(A):A.

The main difference between a record and a type-class is that a record is passed as a first-class value, whereas a type-class is passed implicitly.

A typeclass is a bound and is never passed. There is never any reference with a typeclass type (typeclass objects are objects with a typeclass bound).

keean commented

@shelby3 wrote:

We can't use the same keyword for all three unless they have the same semantics

Actually they do. There is no real difference between a module and a record, if we allow records to have associated types.

As for type classes, The method invocation for a type-class is the same as using a method stored in a record, except the lookup for each type is implicit rather than explicit.

I will write a bit more about Objects or no Objects in the relevant thread.

So lets look at this in the context of a record and define something:

data R<A> = R (
   fn f(:A) : A
   fn g(:A, :A) : A
)

we define a value like this:

let x = R(f = id, g = add)

However if we have fn it makes me want to do

fn f<A>(x:A) : A => x
let x = fn (x:A, y:A) : A = x + y

What do you think?

@keean wrote:

we define a value like this:

let x = R(f = id, g = add)

What do you think?

Since you asked, I will be frank. That is horrible. You've overloaded assignment into typing declaration. 😲 If you want a mainstream language, better myself (and others) design the syntax. IMO that is highly contorted and confusing.

You've been advocating weird Prolog symbols, noisy guards on functions, proliferating : where not necessary, etc.. 🚯

I think you want Haskell. Culture shock has hit us. 💫

keean commented

I am not sure you realise what you are looking at, its the syntax you designed :-)

let x = R(f = id, g = add)

Do we agree with let x = for immutable variables?

R is the constructor for the datatype, and we agreed round brackets () for constructor arguments.

f = id and g = add are assigning the values to the fields in the datatype. We originally had f : id but you said it conflicted with type annotations so I used = instead.

So I think that is entirely your syntax suggestions being used.

@keean wrote:

I am not sure you realise what you are looking at, its the syntax you designed :-)

I thought you were proposing to declare the type of x inside the module, trait, or typeclass since that has been the context. And I thought you were assigning the type of f and g to id and add in a reverse meaning of = or something (I didn't even want to try to figure out what you were meaning because it looked bizarre for a type declaration). You did not state to create an instance and assign to reference x. Granted I should have noticed the R. I have grown weary of studying carefully your comments. So much verbiage and very little accomplished. 130+ comments arguing in the Subtyping Issue without the correct resolution.

keean commented

Its best to think of it as simple record syntax. You use data to declare the record type, and you can then create values of the record which you can pass as first-class arguments to functions etc.

Once we are happy with the data declaration and initialising value of that type, then I will describe how we can promote instances to behave like type class instance/implementations.

The key think to note is that an interface (type-class) declaration, module declaration and a record-type are all types. Whereas a type-class instance/implementation, module implementation and a value of a particular record type are all values.

Could you get to the point and propose your syntax?

keean commented

So declaring a record type:

data R<A> = R (
   fn f(:A) : A
   fn g(:A, :A) : A
)

Assigning a value of this type:

let x = R(f = id, g = add)

We require a couple of extensions to turn this either into a type class or a module. Can we just make sure we are happy with the record syntax, then I will introduce the two missing bits.

@keean wrote:

I have written a post on Lambda-the-Ultimate about ZenScript here: http://lambda-the-ultimate.org/node/5376

They've started a new topic thread ostensibly motivated by ZenScript, to discuss if are correct to discard prematurely (hardcoded) subtyping relationships. Someone replied that sum types and ADT could encode some subtyping relationships.

@keean wrote:

Can we just make sure we are happy with the record syntax

I don't like the fn and the unnecessary : noise. I already proposed a syntax unified with sum and recursive types:

data List<T> = Nil | Cons(head: T, tail: List<T>)

pluggable Name<A<B>, ...>   // optional <B> for higher-kinds¹
  method(B): A<B>           // TODO: add default arguments, named arguments?

But I don't agree with using that head: syntax for typeclasses.

Please show what you have in mind.

keean commented

So you suggest:

data R<A> = R (
   f: (:A) : A
   g: (:A, :A) : A
)

(I will explain why we don't need any additional syntax for type classes once we have settled on the record syntax).

@keean wrote

So you suggest:

I suggested what I suggested, which is syntax specialized separately for data and typeclass.

(I will explain why we don't need any additional syntax for type classes once we have settled on the record syntax).

Please show why you want to unify the syntax and make it uglier for typeclass declaration?

@keean my thinking is that the syntax for data named record fields needs to be unified with sum and recursive types that also have named positional slots. If we remove named slots from sum and recursive types, then we lose some facet of the full functionality. And so we can't make a data type declaration look the same as a typeclass declaration. We need to have specialized syntax for each.

If you have an idea which you think is better, please explain it entirely so I can understand why you think so.

This is the 450th comment already in this ZenScript Issue tracker for all Issues thus far. 👯 💨

keean commented

First let's agree the syntax for data statements:

data Product = Product (Int, Int)   // a product has no type labels, it is a named tuple.
data Sum = I (Int) | S (String)   // a sum type has labelled alternatives
data Record = Record {size : Int, name : String}   // a record has named fields

All three can be combined, for example:

data Location = Point2D(Int, Int) | Address {
    number : Int,
    street : String,
    city : String,
    zip : String
}

The types can be any valid types, so functions, more complex parametrised or union types, and we can have type parameters for the data type, but thats essentially it:

data List<a> = Nil() | Cons(a, a)

I am used to lower case type variables, so that types and type-variables are distinguishable

data Tree<a> = Leaf (a) | Branch (Tree<a>, Tree<a>)

etc...

I've had a rethink:

data List<T> = Nil | Cons{head: T, tail: List<T>}

That is more consistent with how we declare classes in other languages, objects in JavaScript, and struct in C, which was your point.

I was thinking of Scala:

sealed trait List[T]
case class Cons[T](head: T, tail: List[T]) extends List[T]

But that is the exception from many other languages I know, to combine the construction function with the declaration of the record fields.

keean commented

Okay I will edit the above to distinguish records from tuples with () for tuples and {} for records.

For a nullary constructor like Nil we have three options:

Nil
Nil()
Nil{}

Do we force use of the first?

I can't continue right now. Haven't slept yet. Did you reply on LinkedIn?

💤 😪

@keean wrote:

I am used to lower case type variables, so that types and type-variables are distinguishable

All-caps versus not all-caps is how they are distinguished so that we can eliminate the <A,B...> in front of functions. This is the 4th time I've written that point. Also ALLCAPS type parameter names is idiomatic for Java, Scala, Ceylon, Kotlin, etc afaik.

Edit: note all lowercase for type parameters would also be distinguishable, but it would conflict with idiomatic usage in Java, Scala, Ceylon, Kotlin, etc afaik. Also it would make the argument list more difficult to parse visually as the arguments are also likely written in all lowercase, e.g. let f(x: a, y: b)....

@keean wrote:

For a nullary constructor like Nil we have three options:

Nil
Nil()
Nil{}

Do we force use of the first?

Yes because empty record and tagged object are I think the same in every semantic?

Where does the 2nd option come from? Product/Tuple type? How would an empty named tuple be useful?

Also we need to think about sugar for the anonymous structural tuple.

keean commented

The second option is an empty tuple. I agree we only need the first.

Currently out function syntax is ambiguous with the anonymous structural tuple, which might mean we need a keyword like fn.

@shelby3 wrote:

The second option is an empty tuple. I agree we only need the first.

We only need one unnamed singleton of the empty tuple as the Unit type and value (i.e. void in some languages).

Currently out function syntax is ambiguous with the anonymous structural tuple, which might mean we need a keyword like fn.

Not if we choose the forms within function bodies:

():Type => x + y   // the empty tuple is not LL(inf)
x:Type y:Type => x + y 
let f(x: Type, y:Type): Type = x + y

where : Type is optional. Thus the only way to declare a result type is the first and third forms, and the 3rd can't be inline as an expression.

Within top-level of { ... }, we don't need let because there can't be any function calls at the top-level within { ... }.

Note edit.

@shelby3 wrote:

Also ALLCAPS type parameter names is standardidiomatic for Java, Scala, Ceylon, Kotlin, etc afaik.

Edit: note all lowercase for type parameters would also be distinguishable, but it would conflict with idiomatic usage in Java, Scala, Ceylon, Kotlin, etc afaik. Also it would make the argument list more difficult to parse visually as the arguments are also likely written in all lowercase, e.g. let f(x: a, y: b)....

@keean isn't a Product (aka tuple) type also implicitly a Record type, since its members can be accessed also by some default positional names?

For example, Scala uses _0, _1, etc. Better 0-indexed so compatible with array indices (since we may be using arrays to implement tuples in JavaScript).

keean commented

@shelby3 I think Scala is odd in this, I would normally expect to pattern match a type like this:

let (x, y, z) = somethingThatReturnsATriple()

normally deconstructing structural types like a tuple a allowed anywhere a normal variable can get assigned.

@keean I agree with your example usage, but maybe there are times when the positional names will be useful?

@keean I was re-reading your idea to unify the syntax for data, trait (interface or typeclass), and module.

Realizing that typeclasses are just an interface that is either monomorphized or an implicitly passed object instance of the interface, caused me to see that there is nothing that special about typeclasses nor an instance of a class object which implements an interface.

Sum types are a bit weird for an interface or typeclass though. It means the type has to be tagged and the code that accesses the interface has to pattern match on each of the sum cases. But at least it unifies and perhaps there is even some utility to sum type interfaces.

I don't see how they unify with module, unless access restrictions become part of interfaces. Need to think more about that.

Thus data is not the correct unification keyword since typeclass interfaces can be monomorphized (i.e. no data object is ever created).

I don't like struct either because it has a meaning in C which I would like reserve for a binary packed data structure. I don't think most mainstream programmers (at least not autodidact programmers who didn't study computer science at a university) are aware of use of struct for example in ML.

I like interface or trait. My complaint against trait in past has been that Scala conflates them with mixins containing variant implementation. ECMAScript has mixins now.

And for the typeclass instances of interface or trait, I like variant or reified better than instance because the latter implies to me an object.

keean commented

It's not so much a sum type, as a record type. A record is like a JavaScript object, it's a collection of labelled properties. You can also think of a record as a "labelled product type".

You are correct, a type-class is really just a mechanism for resolving interfaces implicitly. Optional implicit parameters (which Scala has) provide a way of unifying runtime interfaces (passed by value) and monomorphisable interfaces (passed implicitly by type).

If we allow polymorphic-recursion then type-classes can be non-monomorphisable, and we have to pass the correct instance at runtime, depending on the type at runtime - this is something I would rather not do, and would suggest parsing as an argument instead.

Regarding modules, they are no different than records/objects, except the can in some languages extend them with additional features. Our records would already be parametrically polymorphic, which allows them to be instantiated with type parameters. If our language supports no other module features, then modules and records are already unified. You can see this in JavaScript where function closures returning an object give you simple modules. In this case as module implementations can be passed as first class values and we avoid the need for a separate module type system like ML has (and we don't want subclassing of modules).

The only other module features we probably want are associated types (or type families), but we want type-classes to have these too, so you could say a record is a module with no associated types.

There is one thing to be careful of here, and that is passing modules by value can lead to typed depending on values (dependent types). I think it is a bad idea to have a little bit of type dependency, other languages explore dependent types, but here they add unexpected complexity. However I think we can use union types here, to avoid dependent types and also avoid weird restrictions on passing modules that will confuse people.

If we consider a module with an associated type, say some container that contains only Ints or a different container that contains only Strings, when we pass a container like this and access the values they will be of type (Int /\ String). This is different from the case with parametric polymorphism because the type in the parametric case is in the signature and we can monomorphise the called function into two versions, one which handles strings and one which handles ints, and know which one to call at compile time. So why not use parametric polymorphism everywhere? Well type parameters are 'input' types and therefore need to be specified or inferred. Associated types are 'output' type parameters, and simply having a specific value of a module determines those types. Further if the output types of a module are functionally dependent on the input types, we can monomorphise the output types wherever we can monomorphise the input types.

It's not so much a sum type, as a record type. A record is like a JavaScript object, it's a collection of labelled properties. You can also think of a record as a "labelled product type".

I know what a record type is, you don't need to waste the verbiage (perhaps you remember me writing numerous times in our discussions about record types). You didn't understand my point. I meant what I wrote. I meant a sum type. I presume you remember what a sum type is. If we unify data and interface/trait, then the sum types of data are unified and exist in the latter also. Please re-read my prior post in that light.

Optional implicit parameters (which Scala has) provide a way of unifying runtime interfaces (passed by value) and monomorphisable interfaces (passed implicitly by type).

I like that summary (in the context of what I had written). We should make sure we frame that and use it in our documentation.

If we allow polymorphic-recursion then type-classes can be non-monomorphisable, and we have to pass the correct instance at runtime, depending on the type at runtime - this is something I would rather not do, and would suggest parsing as an argument instead.

I am not grokking this (I don't know the meaning of the term polymorphic-recursion, although I can guess), especially don't know what you mean in this context by "parsing as an argument" so I have insufficient foundation from which I can easily deduce the meaning of the rest.

I do remember we had discussed higher-ranked typing and the issues of functions which require typeclasses being the argument of another function. Is that what you are referring to? I remember you wanted to force us to create a new record instance for each case and have to do boilerplate to integrate two instances which are otherwise equivalent other than their differing record name. I remember that was a complexity aspect that started to make me want to not attempt to add typeclasses at the first iteration of the language. Too much complexity needs to be carefully considered first.