keean/zenscript

Concurrency

Opened this issue · 750 comments

I have a long comment at the Rust forum (some of the inspiration came from @keean linking me to Sean Parent's video presentation) and another one that I posted to another forum, which delve into the various options for concurrency and compare them. Note the negative aspects of stackless coroutines enumerated in the second linked comment post.

Also note that my code that employs an ES6 generator to implement an emulation of async / await has never been tested, but I did update and correct it recently.
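
For reference, the well-known shape of that emulation (a generic sketch of the pattern, not my actual untested code; runAsync and fetchData are illustrative names):

function runAsync(genFn) {
  return (...args) => new Promise((resolve, reject) => {
    const it = genFn(...args);
    function step(nextFn) {
      let result;
      try { result = nextFn(); } catch (e) { return reject(e); }
      if (result.done) return resolve(result.value);
      // Each yielded Promise resumes the generator with its settled value.
      Promise.resolve(result.value).then(
        value => step(() => it.next(value)),
        err => step(() => it.throw(err))
      );
    }
    step(() => it.next());
  });
}

// Usage: const main = runAsync(function* () { const data = yield fetchData(); return data; });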

@spion wrote a blog on why ES6 generators offer more flexibility than async / await.

Continuation-passing style (CPS) can be employed for stackless coroutines, which lowers overhead (PureScript, for example, supports this with a continuation monad), but they are very restrictive and don't interoperate well with the imperative programming that we do with JavaScript.
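
For illustration, a contrived CPS example in JavaScript, where each function takes an explicit continuation instead of returning a value:

const addCPS = (a, b, k) => k(a + b);

// Direct style would be: console.log(add(1, 2) + 10)
addCPS(1, 2, sum => addCPS(sum, 10, total => console.log(total))); // prints 13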

Concurrency is hard and we are not aiming for maximum performance of context-switches, thus I think stackless coroutines are a PITA and not worth it. For someone who needs that control, let them use another language (e.g. PureScript or C++) and interoperate through our FFI or JavaScript.

I need generators to support custom loaders, my emulation of async / await, and in general to interoperate well with the JavaScript ecosystem, e.g. Node.js.

Thus I think we should support ES6 generators initially. And we can look into coroutines later.

Thoughts?

keean commented

I think generators and co-routines are the way to go. Everything should be eager unless specifically lazy. That one was easy :-)

So are you okay with not doing anything in the language for stackless co-routines? The stackless form can be implemented with a continuation monad library combined with declared purity?

Edit: I have hence realized that monads are employed to enable data race safe sharing of mutable state emulated by copying immutable data, and that this has nothing to do with the choice of the underlying mechanism of the usermode (i.e. not kernel context-switching) cooperative multitasking. Monads enable controlling stateful effects within a pure, referentially transparent, immutable paradigm.

keean commented

Stackless co-routines are cool. It's basically the same as JavaScript's event model. If we are compiling to JavaScript we have to cope with this. If we write an infinite loop, JavaScript requires returning to the event loop to process IO.

Is it best to give programmers a model where they write a linear synchronous program, and it is converted to the event model underneath and runs on a stackless co-routine runtime (transpiling to 'Go' would be interesting), or is it better to give them an asynchronous event based model?

@keean wrote:

Stackless co-routines are cool.

But they have compositional (and perhaps other) trade-offs. They seem to force either low-level tsuris, or monadic tsuris. Remember in general that monads don't compose (Applicatives do compose).

Stackless co-routines are cool. It's basically the same as JavaScript's event model.

I believe that may be incorrect. Afaik JavaScript uses stackful event loops (as in multiple event loops running concurrently, one for each JavaScript instance). Afaik JavaScript does not use a CPS-style jump; rather the script must exit, or control is passed to the event loop via generators and returning a Promise.

Edit: I have hence realized that JavaScript’s callback model unwinds the stack storing the callback context on the heap to return to the event queue scheduler. So there is only ever one stack in use.

If we are compiling to JavaScript we have to cope with this. If we write an infinite loop, JavaScript requires returning to the event loop to process IO.

This is accomplished with generators and Promises (or setTimeout <--- notice camel case), which afaik are stackful.

Is it best to give programmers a model where they write a linear synchronous program, and its is converted to the event model underneath and runs on a stackless co-routine runtime (transpiling to 'Go' would be interesting), or is it better to give them an asynchronous event based model?

Afair, you posed that thought to me before on the Rust forum. Can you find my response? I find it arduous to locate details of our past discussions on Rust's forum.

Generally I have disliked almost everything about Go's design, and I think I had already explained why I didn't like Go's concurrency model, but I forget by now.

keean commented

What do you think about the actor model?

I remember you explained that some aspects of Go, and especially the concurrency model, are applicable to some use cases. And I recall I didn't (at that time) find those use cases compelling for the programming I need to do right now.

I am hoping (pie-in-the-sky right now trying to resolve my health ailment) to build a mobile app platform (headed towards abstracting away the operating system to a large degree); also I need to do a lot of coding in JavaScript for client and server, reusing the same code base on both (i.e. Node.js), so I need high-level abstractions, modularity, extensibility, and composition. If I were doing systems programming, I'd have different priorities.

When I need to drop down to systems level tsuris, I'll probably be hoping for something better than C++, but it seems Rust may have blown that chance with the mandatory lifetimes (or maybe that is exactly the feature needed if people will use it). I might have a different perspective on Go for that use case. We'll see.

As for Actor model, that is more than I want to tackle today. I'll come back to it.

I'm a big fan of the actor model. Just as an aside.

Anyway, I think CPS with tail-call optimization basically gives you the best possibilities, combined with syntax that compiles into a stack machine (ES6 generators work like this, and async/await is just a particular generator).

I really really love JavaScript's concurrency model with setTimeout. Basically, JavaScript doesn't use coroutines; it uses a polling-based event loop. A game loop in JavaScript is written like this:

function loop() {
  console.log("ping!");
  setTimeout(loop, 0);
}

loop();

This means: run loop, then yield to the event loop and ask it to rerun loop as soon as possible. During the execution of the loop, no other code can run (not even UI events; there are no interrupt events in JavaScript). So when you write an infinite loop, you have to write a recursive loop that yields to the event loop with setTimeout instead. Notice that giving it a timeout of 0 basically means an infinite loop (0 means loop as fast as possible), but after each iteration it allows the UI logic to update.

I personally think this is THE best system I've ever seen in a language. It makes sure that you get concurrency, without state issues. For lots of concurrency use cases, you don't actually need threading and JavaScript's model allows you to do it very elegantly.

So yeah, sticking to that same model seems logical to me.

keean commented

We could have a first class actor model:

actor Queue :
    list : Nil
    add : (activity) =>
        list.push(activity)
    remove : (activity) =>
    // etc

The idea is the actor can have state but all the 'methods' must be pure, and we model calls as asynchronous message passing (which can be optimised to a normal function call).

I'm not sure if I agree with the syntax, but yeah, first class actors would be extremely interesting to have. I think actors also combine very elegantly with generators and iterators for a VERY powerful concurrency model.

Combined with a yield to event loop syntax, this would make ZenScript close to the most powerful concurrent language on the market.

I just wonder if we can somehow marry records and actors under the same system, so all actors are records or all records are actors. I'd also like to get delegation into that system. I'm not very familiar yet with the current ideas on records, so I'll have to read up on that and come back to you.

keean commented

Records are data, so would be the messages sent between actors, or in the internal state of actors etc. So actors are not records, actors receive and respond to messages (they act), records are just plain data (although in theory a message could contain an actor address, or even a serialised actor). So we end up with something like actors can contain records, and records can contain actors, but actors are not records.

I'd argue you can view a record as an actor that responds to queries (messages) for data in a specific format, but I get your point. We can always look into this issue later, it's not very critical. It'd also devolve into a discussion about non-deterministic arbiters and such, which is also something I don't want to get into.

In summary: first-class actors that are encouraged and influence library design = good. I envision libraries that offer lots of functionality through monadic abstractions over actors in a way that makes it look like any other language and I smile 😄

@SimonMeskens wrote:

During the execution of the loop, no other code can run (not even UI events; there are no interrupt events in JavaScript). So when you write an infinite loop, you have to write a recursive loop that yields to the event loop with setTimeout instead […]

I personally think this is THE best system I've ever seen in a language. It makes sure that you get concurrency, without state issues.

There’s still the potential for data race unsafety for the mutable data shared between your loop and any other event handlers that run every time your loop does a setTimeout and returns to the runtime’s event dispatching scheduler.

And the callback you provide to setTimeout has to save the state and restart location of your loop's progress. It's a lot of cruft compared to Go's green threads (aka goroutines), which enable what appears to be sequential synchronous code.
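
For example, even a trivial accumulating loop must explicitly thread its state and restart location through every callback (items and done are hypothetical names here):

function step(i, acc) {
  if (i >= items.length) return done(acc);
  // The loop counter and accumulator must be captured in the closure
  // so the event loop can restart us where we left off.
  setTimeout(() => step(i + 1, acc + items[i]), 0);
}

step(0, 0);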


Regarding my comment post about Actors (← click for link to my definition of unbounded nondeterminism), I found the following interesting:

https://youtu.be/nXYGPYnCokE?t=541

He makes a very interesting point that aspects of models which do not preserve bijective ("bidirectional") structural equivalence are thus introducing the aspect of thermodynamics that processes are irreversible. What he really should say is that such a model doesn't require a total order and allows partial orders, which is entirely necessary for modeling asynchrony. A total order is bounded, whereas the set of partial orders is unbounded.

https://youtu.be/nXYGPYnCokE?t=1460

The statement that "forgetting is the essence of thermodynamics consequences" is from the perspective of any relative observer inside the system, but that doesn't mean the information no longer exists from the perspective of a total observer outside the system. The existence of a total observer can't be required in the system without bounded nondeterminism. So unbounded nondeterminism requires forsaking omniscience. As I had mentioned in the past, universal omniscience requires a non-quantifiable speed-of-light (so that any observer can be total in real-time), which would collapse the light cones of special relativity, rendering the past and future indistinguishable, and nothing distinguishable could exist.

More on comparing Actor model to π and rho calculi:

https://youtu.be/oU4piSiYkE8?t=488

The Actor model doesn't have an explicit algebra of channels (channels apparently require additional overhead because they expose a race condition among the concurrent processes which can access the channels), although channels can be created in the Actor model by creating a new child Actor for each channel, where the "child" relationship is maintained in the state internal to the involved actors, in that the child actor forwards all received messages to the parent actor. Thus apparently the π and rho calculi algebraically model this particular configuration of the Actor model. So the distinction is what can be analyzed algebraically in the model, and not an enhancement of the computational power of the model. An analogy is I guess the NAND or NOR gates from which all digital logic can be constructed, yet we don't program in such a low-level model. So the Actor model is fundamental but more low-level than the π and rho calculi.

Please check my logic, facts, and analysis in this comment post.

Go's green threads (much more lightweight than OS threads) are both more and less performant than JavaScript's callbacks (i.e. Promises) depending on the usage scenario.

For example, when a stack of nested functions is waiting on the result of the innermost function, Go's very low cost context switch, which retains the stack, is more performant than unwinding the stack and essentially copying it into a nesting of Promises to accomplish returning from each nested function to the top-level JavaScript event loop.

Vice versa, when parallelizing independent code paths¹ (irrespective of scheduling algorithm) to make multiple blocking operations concurrent, Go needs a goroutine and thus an allocated stack for each concurrent operation; whereas, JavaScript only needs to allocate a Promise for each operation.

An example of this latter case is invoking map over a range (e.g. of an array) with a blocking function as the input, such that we want the invocation of the blocking function for each element of the range to run concurrently (though not necessarily executing simultaneously in parallel, depending on whether hardware thread scheduling is employed). So for JavaScript, the code would invoke the blocking function on each element, saving the returned Promises in an array and assigning continuation code via the then method to save the result for the output of the parallelized map operation; whereas, the Go code would spawn a goroutine with the continuation code for each invocation. The continuation code of both would decrement a counter to track when the parallelized operation was complete. The JavaScript completion code would fire the callback in the Promise returned from the parallelized operation; whereas, for Go the goroutine owning the parallelized operation would sleep awaiting the 0 value of the counter. When more than one hardware thread is employed by the parallelized instances, then the counter decrement must be protected by a mutex (which could, as an alternative to shared write access, be effectively achieved by accumulating all decrements as messages to a single thread which performs the decrement, although that also requires critical-section synchronization for updating the message/channel queue). See the sketch below.
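
A sketch of the JavaScript side of that comparison (parallelMap and the Promise-returning blockingFn are hypothetical names; this is essentially the counter logic described above, which Promise.all standardizes):

function parallelMap(elems, blockingFn) {
  return new Promise((resolve, reject) => {
    const results = new Array(elems.length);
    let pending = elems.length; // the completion counter described above (assumes non-empty input)
    elems.forEach((elem, i) => {
      blockingFn(elem).then(value => {
        results[i] = value;
        if (--pending === 0) resolve(results); // completion fires the outer Promise's callback
      }, reject);
    });
  });
}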

Hardware thread scheduling with JavaScript is afaik currently non-portable (e.g. no Web Workers on Node.js) and less flexible (e.g. shared objects are forced to be thread safe by copying or borrowing [Edit: there's SharedArrayBuffer, but as such it's of limited functionality because SharedArrayBuffer doesn't integrate with GC, Object, exceptions, nor yield generators]). Web Workers implementations are not forced to implement M:N green threading. So optimization of hardware utilization is less flexible on JavaScript. I conjecture that it is hypothetically plausible to combine both Promises and green threading in some optimized way, but that is not currently possible in JavaScript.


¹ Rob Pike states an incorrect definition of concurrency for the general case, because in general there is no way to prove that a nondeterministic composition of code paths is independent, since this would require a total order, which is the antithesis of unbounded nondeterminism, a.k.a. the indeterminism case (where the unbounded entropy of the universe is intertwined in the state machine of the program); i.e. it can't be proven that the state interactions in concurrency are faultless or fault tolerant.

Sean Parent provides a different definition. John Harrop’s definitions of concurrency and parallelism seem to jibe with mine.

And Pike's error is also a weakness of Go's CSP (communicating sequential processes, i.e. channels), in that without bounded nondeterminism it is impossible to guarantee the channel communication won't deadlock, as I wrote before:

as contrasted with Go's CSP that attempts to more tightly synchronize and bounded determinism

Even though concurrency in general can potentially deadlock on any dependent state:

An Actor employs futures (aka Promises) on result values to create a decentralized stack which is claimed to prevent deadlocks on mutual recursion but clearly it won't prevent deadlocks on locking a resource so resources must not be locked while waiting for response to a message. Not to be confused with asynchronous sending of messages, which means not blocking waiting for receiver to acknowledge receipt, which is one of the differences from Go's CSP.

Go is correct that the first use case example from my prior comment can always be handled by the language and appear to the programmer at the source code level to be a synchronous function call. That assumes the programmer intends not to execute any of his source code concurrently while waiting for the result of the said blocking function. The programmer could convey this intent by assigning or casting the result value to the value type of the Promise for languages employing Promises, and by not passing[waiting on?] a channel to the blocking function in a language which employs channels such as Go. In that case, the compiler could infer that it should arrange the necessary concurrency (implemented either via the event loop model or green threads).

However, the only reason the programmer needs access to Promises or channels for this usage scenario is to manually encode parallelism inherent in the algorithm (i.e. independent code paths), which the compiler could in theory infer if it is informed as to which functions are pure and which objects are read-only, borrowed, etc. Given this latter information, the compiler could even infer the encoding of the inherent parallelism in the second use case example from my prior comment. Even the (presumably recursive) algorithm of the loccount tree traversal parallelism in Eric Raymond's Go blog could hypothetically be inferred by the compiler.

So I come back to my interpretation of the generative essence of Sean Parent's thesis, which is that the more completely and accurately we can express our algorithms, then less fragile optimization kludges will creep into our code. In other words, the programmer's job is to convey the algorithm and the compiler's (and runtime's) job is to optimize it. In other words, if the compiler can infer the parallelism, then it is free to optimize the number of threads dynamically (and even perhaps to choose between an event loop or green threads on a case-by-case basis). Compilers and runtimes thus become smarter and programming becomes simpler with more algorithmic clarity (less implementation details and premature optimization clutter).

Thus I am asserting that both JavaScript and Go have some catching up to do with Haskell, which I've read is already quite good at optimizing and inferring parallelism and other optimizations expressed in the algorithm. But Haskell is too obtuse, and as I explain below, 100% purity is too restrictive (as we discussed in the thread on purity, it forces purity everywhere as a total order, requiring monads to model effects as types, yet monads aren't generally composable). There is hopefully a better balance that could be achieved: a marriage of pure functions, declared side-effects (effects as types?), and read-only declarations with an imperative language, and without all the category theory when it isn't necessary.

Sometimes it is necessary to borrow the lifetime, because just declaring an object read-only is insufficient to convey the algorithm. For example, for inferring the parallelism: if we know the input tree object to the loccount example is borrowed, then we don't care if the per-file counting function is pure, because we know it can't modify the input tree object if we aren't providing the said function a borrowed reference to the tree. But we only need to borrow it as read-only, so it is not an exclusive borrow, except exclusive of writing. However, as @keean and I remember from our analysis of Rust, borrowing is a global total order exclusion paradigm. Whereas, if we know that the per-file counting function is exclusive of the input tree object (i.e. in a partial order of exclusion), then we don't need the paralysis and tsuris of a total order on borrowing. The only way I can think of to achieve that partial order is if the said per-file counting function is pure, or we make a read-only copy of the input tree object.

Note a pure per-file counting function could read from the files without making itself impure. A function which writes to a file can also be "pure" (aka referentially transparent) w.r.t. any context which doesn't read from the file, because it doesn't call any function which modifies any state of the program; i.e. that some external context can read from the modified file is irrelevant if the compiler is only analyzing inherent parallelism for a limited context which doesn't read from the file. So it seems perhaps we need not only pure function declarations but also declarations of specifically which side-effects an impure function creates.

Another advantage (in addition to the aforementioned simplicity, clarity, and antifragile flexibility of the source code level expression of the algorithm) of having the compiler infer the parallelism versus the programmer manually encoding the parallelism, is it won't infer it incorrectly due to human error thus creating a bug. One of the main goals of compilers is to do the tedious and complex checking for correctness that humans don't naturally do perfectly. We need humans for the creativity of algorithms, not the pedantic repetitive mundane tasks and premature optimization.

If I understand correctly the stance of those (e.g. @keean in private communications) who would argue against the viability of my prior comment, they argue that such optimization is equivalent to a proof search which is an exponentially hard complexity problem, e.g. optimizing a bubble sort to a parallel merge sort. But afaics, I did not propose allowing the compiler to change the algorithm. I merely proposed for the programmer to express the invariants necessary to infer the parallelism inherent in the algorithm expressed. I've read that one of the Haskell compilers is already apparently very good at this, which is enabled by its strictness w.r.t. such invariants such as referential transparency and side-effects lifted to types (monads).

I agree that compilers which are expected to modify algorithms (e.g. from a bubble sort to a parallel merge sort) would be equivalent to an exponential search which is probably in one of the very intractable computational complexity classes. I am not yet clear if my proposal is also in an intractable computational class.

The argument that claims all (or nearly all) of the interesting parallelizable code can be put in libraries carefully hand-tuned by experts seems to leave a lot of userland code stuck with the premature-optimization noise that hides the underlying algorithm when my proposal is not adopted. It also means that expertly written library code becomes that much less readable in open source. I don't like the Cathedral model of s/w development. I prefer open source, readability, simplicity, clarity, and not premature optimization by a small group of experts who can disappear or totally muck it up such that no one knows how to fix it.

Also, the point of having the compiler detect the parallelism (i.e. the independent code paths) is that it needs to be able to do that anyway, in order to check that the concurrency the programmer would otherwise manually encode does not violate the (lack of) parallelism (i.e. required data race safety) in the algorithm (c.f. Rob Pike's misunderstanding of concurrency versus parallelism). Thus in order not to involve human error and bugs, we need to express the invariants to the compiler so the compiler knows which code paths are independent. Without invariants, the compiler can only safely assume that none of the code paths are independent and thus that there is no parallelizability in the code. The reason concurrency programming is so difficult (i.e. buggy) is because humans can't reason perfectly about all the invariants.

Rust's strategy is to force us to have a total order of (provably safe from deadlocks and livelocks) compile-time “locks” (borrows) on all resources in the program including all allocated objects. But it seems to be uneconomic to force such pedantic tsuris of synchronization on all code, when the highly parallelizable code may only be a small portion of the program. Instead I have proposed that we only declare the invariants we need to, for the cases where we need parallelization.

keean commented

Parallelising an algorithm is equivalent to changing the algorithm. Modern super-scalar out-of-order CPUs already exploit the maximum inherent parallelism in code, and they do so at runtime, where there is more information about scheduling and completion times than at compile time. A compiler cannot beat the CPU at this kind of optimisation, hence the failure of VLIW CPUs in general computing. Again, writing the Itanium compiler to achieve the same results at compile time as an x86-64 achieves at runtime turned out to be a hard problem.

optimizing a bubble sort to a parallel merge sort

That is parallelizing the task of sorting, not inferring the inherent parallelization in the original bubble sort algorithm (if any).

Parallelising an algorithm is equivalent to changing the algorithm.

In the example code below, i and j can both be inherently read in parallel, as long as the compiler knows that the two reads are independent of each other. The algorithm is not changed by running those two reads concurrently. The invariants (which, if declared, ensure the two reads are independent, enabling the inherent parallelization) are part of the expression of the algorithm.

var i = ... // read string from a file
var j = ... // read string from the Internet

return i.concat(j)

Modern super-scalar out-of-order CPUs already exploit the maximum inherent parallelism in code

They can't optimize the inherent parallelizability in the example above, because they don't know that the two reads are independent. Those invariants aren't expressed at the low level of machine code.

You are conflating low and high-level parallelism.

keean commented

I see what you are talking about. There is no guarantee that the file and the network are not connected. Reading the file may change the result returned from the network. Such optimisations are not safe in general.

The programmer needs to express whether the IO operations need to be sequential or parallel. In the above example they are inherently sequential. Parallelising may break the programmer's assumptions.

There needs to be a way for the programmer to say it is safe to parallelise. Something like:

(i, j) = parallel(read file..., read network)
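
In JavaScript terms that might desugar to something like this (readFileAsync and readNetworkAsync standing in for Promise-returning versions of the two reads):

// Both operations are started before either result is awaited.
Promise.all([readFileAsync(path), readNetworkAsync(url)])
    .then(([i, j]) => i.concat(j));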

There is no guarantee that the file and the network are not connected. Reading the file may change the result returned from the network.

If the reading is declared pure, then it can't have side-effects. This should obviously be enforced.

Such optimisations are not safe in general.

I am not proposing it in general. I am proposing it only when the invariants ensure the independence of the code paths.

The programmer needs to express whether the IO operations need to be sequential or parallel.

Disagree. Seems you are missing the entire justification for (thrust of) my proposal. Now you are trusting the programmer to know all the invariants of every possible ramification in the universe, i.e. back to JavaScript and Go's model. And manually encoding the concurrency details instead of the compiler taking care of that detailed tsuris and noise. This causes bugs and makes code less comprehensible (obscures the algorithm). If instead we build up invariants via hierarchies, the same as we do for coding in general (i.e. we call functions which call other functions, i.e. a purity declaration is transitive to functions we as the programmer aren't even aware of), then the compiler does that reasoning for us. We generally don't want programmers to have to do non-local reasoning, as that is not modularity.

Again, I am not yet sure if my proposal is computationally tractable. But it is also not necessary that the compiler infer every parallelization opportunity. The programmer can still fall back to manual encoding when the compiler fails to infer. It can be a compiler technology that improves over time.

I want to make some major innovations in programming language, if possible.

keean commented

The system does not know the invariants. The file that you read could trigger a network service to change the data downloaded from the network, such that changing the order from sequential to parallel will change the results output by the program. In general the programmer is the only thing with knowledge of the invariants, unless they are somehow expressed in the code.

The system does not know the invariants.

We don't refute a potentially correct proposal by refuting one example where the invariants could not be ensured (thus could not be declared and wouldn't be inferred). It is disingenuous (noisy) discussion not to admit there are examples that can be ensured.

It does in some other examples. I had alluded to an example in one of my prior comments, where the compiler knows from the declared invariants that reading a file doesn't modify any objects (stored in RAM) in the code paths.

The file that you read could trigger a network service to change the data downloaded from the network

Not if the invariants ensure that the network resource does not depend on any such network services. My example presumed the invariants were ensured.

Tangentially, you are making the case for why in general concurrency is indeterminate and there are in general no independent code paths, and thus in general concurrency is always buggy (and can't be fixed), because the entropy of the universe is unbounded.

But my proposal is about limiting inference of inherent parallelism to ensured invariants.

In general the programmer is the only thing with knowledge of the invariants

If the invariants can't be ensured, the programmer manually writing the concurrent structures isn't going to make it any less buggy. You've gained nothing except all the negative aspects (I've already enumerated) of not inferring.

If the compiler can’t check the data race safety of the parallelism the programmer explicitly requests, then the same bugs can happen. I guess your point is that it’s better to at least supplement what the compiler can check with the programmer’s explicit intent to have a better chance of avoiding data race safety bugs for that which the compiler can’t check.

Edit: you can make the argument that the programmer can declare that his algorithm doesn't care if the result of j is nondeterministic w.r.t. i and the unbounded entropy of the read operations. Thus your manual encoding becomes useful in that case where the invariants required for deterministic parallelism can't be ensured. I wasn't arguing against still allowing that manual encoding when necessary (to express unbounded nondeterminism or to handle deterministic cases the compiler isn't smart enough to infer).

keean commented

I think there are three scenarios, these must be sequential, these must be parallel, and "don't care". The problem with the above example is it is ambiguous between "must be sequential" and "don't care". Programmers used to writing sequential code will assume sequentiality. We need a way to distinguish don't care, and sequential.

We both know this is not a problem for pure code, as beta reduction order is non-deterministic, but we cannot do IO from pure code (exactly because it makes evaluation order visible to external IO).

keean commented

In general IO assumes sequentiality; consider:

print("Shall we play a game?")
let game = getString()

In general IO assumes sequentiality; consider:

print("Shall we play a game?")
let game = getString()

You are writing to stdout and reading from stdin; thus of course those invariants tell the compiler there is no inherent parallelism. I don't see any inconsistency between my proposal and your example.

If instead you were copying that UI string to a dialog box control and then reading from an input field of that dialog, then you'd have an inherently sequential code path expressed by the dependencies on access to the dialog object resource.

The problem with the above example is it is ambiguous between "must be sequential" and "don't care".

There is no ambiguity because if there are no dependencies, then it doesn't matter which order i and j are instantiated.

I understand what you are thinking is the programmer may have some semantics in mind which are not expressed unambiguously in the code itself. Can you think of an example?

keean commented

It only doesn't matter because you assume the file and network operation are independent. Let's assume they are dependent, so reading the file changes the result from the network service. How do I express that the order matters, even though the operations are on different 'channels'?

keean commented

Here's a more practical example, first write a file to a USB stick, and then display a message that it is safe to eject the USB device. If you arbitrarily parallelise IO then the message will appear whilst the file is still writing, and you will remove the device resulting in a corrupted data file.

Here's a more practical example, first write a file to a USB stick, and then display a message that it is safe to eject the USB device.

Correct programming code would not display that message until testing that the return value from file write operation did not return an error.

You could argue that correct code would issue an exception on error (instead of a return value) which would bypass the code that follows the file write call. Exceptions thus make it ambiguous whether the code following the file write call is sequentially dependent on the successful execution of the write or not. This is a problem of expression due to void functions. And this would need to be dealt with somehow.

Perhaps the most general construct would be to allow void functions to be assigned to an object of Void type. Dependent code paths could condition on if( object exists ). Or probably better is to just assign the function to an object of type Promise (or infer the type desired by calling .then() on the function return value), and attach the dependent code using the then() method.

Thus semantically dependent sequential code paths become explicitly more verbose (where they would not have been) so that independent code paths can be inferred.
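
Applied to keean's USB example, the dependency then becomes explicit in the code (a sketch assuming writeFile returns a Promise and showMessage is a hypothetical UI call):

writeFile("backup.dat", data)
  .then(() => showMessage("It is safe to eject the USB device."))
  .catch(err => showMessage("Write failed: " + err)); // the dependent message never runs on error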

It only doesn't matter because you assume the file and network operation are independent.

I think it is correct that in general we can't ever ensure independence between IO operations external to the memory controlled by the program.

Just because it was not a carefully chosen example of my proposal doesn't mean my proposal is broken.

But for example, if the program has opened two files with exclusive read/write access, then it can ensure that no writes will occur that the program doesn't know about. If the program has opened the network resource with exclusive access, it can be sure the network resource will not be written to while it holds that exclusive access. So in that way my example was plausible.

How do I express that the order matters, even though the operations are on different 'channels'?

We probably want to assume order always matters between IO operations external to the memory controlled by the program, except where we can ensure exclusive access to the IO resources. So your question should I think instead be: how do we express that order doesn't matter for distinct channels? In other words, how do we declare that a pair of channels are known to not interact?

Perhaps we shouldn't even support such a declaration, since it seems such a fragile guarantee. I think in such cases the programmer can just manually encode the concurrency for the presumed parallelism. However, as I stated, the exclusive access case is a valid case, but those invariants are not on the function called but on the holistic context. Need to think about how such invariants could be expressed.

keean commented

Just because it was a poor example of my proposal, doesn't mean my proposal is broken.

I agree, I like the JavaScript approach, combined with Promises, but the stack cost is high. By introducing co-routines/co-functions, the stack cost can be reduced to be similar to Go's green threads, and can be based on an N:M concurrency model, and promises are not needed. The JavaScript implementation would by necessity be single threaded, but it should not be limited to only JavaScript as a backend, and of course we can use 'WebWorkers'.

I am thinking my idea for parallelism can, for the inferred cases, increase the clarity of source code and provide compiler verification of correctness. Those seem like major wins. But I don't know yet how tractable the inference might be.

And hypothetically, the compiler could optimize what it is employing for scheduling tasks, e.g. either green threads or Promises depending on each use case, for I had explained that Promises are more efficient in some use cases, i.e. I think we need both in the same program.

keean commented

A cofunction models a promise; it behaves like a blocking API, and does not unwind the stack back to the main event loop every time. Unwinding is an implementation detail - it should be possible to implement cofunctions on top of promises and green threads.

I agree, I like the JavaScript approach, combined with Promises, but the stack cost is high. By introducing co-routines/co-functions, the stack cost can be reduced to be similar to Go's green threads, and can be based on an N:M concurrency model, and promises are not needed.

My proposal is about making the expression of the parallelism agnostic to task scheduling concepts employed to execute them concurrently. Parallelizable code then looks very similar to any other code, because it isn't littered with premature optimization of embedding a particular scheduling concept in the source code.

The key insight is the separation-of-concerns, i.e. that parallelism is due to independence invariants. The invariants are a separate concern from the non-parallel expression of the algorithm, because in many cases the algorithm doesn't care if it is executed in parallel or not (and the invariants inform us of any inherent parallelism that can be optionally exploited).

The compiler and any runtime will be free to optimize and experiment with different scheduling concepts without the programmer needing to change the source code.

I am proposing to think about the issue from a more high-level conceptualization.

Higher-level programming languages have proven much more productive. I think we need higher-level parallelism now also. It is the next step of the evolution.

keean commented

I don't think this approach can work in general, and even if it can it will lead to a complex compiler and very inconsistent runtimes (hidden magic).

My preference is to make the default for imperative code sequential execution, for pure functional code "don't care", and for there to be some special syntax for explicit parallelism. If co-functions result in not needing promises, because a promise chain looks like ordinary sequential imperative code, then what is needed is the equivalent of 'Promise.all', effectively a way to run an array of commands in parallel and wait for them all to finish. If normal tuples imply left-to-right evaluation (to preserve side effect order in function calls) then all we need is a 'special' tuple for parallel evaluation. Maybe [ ] could be used, or perhaps a digraph like (| |) to indicate parallel evaluation. This could be applied stand-alone or in any function application, so f(|x, y|) would evaluate x and y in parallel instead of left-to-right, as in the sketch below.
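
For example, when targeting JavaScript, f(|x, y|) might desugar to something like this (with the two argument expressions reified as thunks xThunk and yThunk):

// Sequential tuple: f(x, y) evaluates the arguments left-to-right.
// Parallel tuple: start both evaluations, then apply f to the results.
Promise.all([xThunk(), yThunk()]).then(([xv, yv]) => f(xv, yv));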

I don't think this approach can work in general

Because?

I specifically said the compiler didn't need to infer every case. The programmer can manually encode when the compiler is too dumb to (while retaining the original version for if the compiler ever gets smarter).

and even if it can it will lead to a complex compiler and very inconsistent runtimes (hidden magic).

Like Java's Hotspot. Bad improvement to have, isn't it?

... what is needed is the equivalent of 'Promise.all' ...

Parallelism doesn't in general fit into one nice consistent concurrent structure.

keean commented

Hotspot is still slower than 'C', and has poor performance with parallel code (due to large memory requirements). Hotspot has not gotten "ever smarter"; this just has not happened in reality, despite the wishes of language designers. I want a language that works in the real world today, not some fantasy future that may never happen.

There are many complete models of parallelism that are simple, like Communicating Sequential Processes. So parallelism can be fitted into one convenient structure. Many languages only provide 'fork' and 'join'. Above I provided a use case that breaks the optimisation assumption that you made - can you provide a use case where the simple model of parallelism I presented is not enough? We should always try to use the simplest model that provides what we need.

Above I provided a use case that breaks the optimisation assumption that you made

Sorry, that is a mischaracterization of the discussion. Please, that doesn't help us advance our work.

Hotspot is ...

Hotspot doesn't have all the invariants it needs to optimize high-level parallelism. I proposed declaring more invariants. I provided Hotspot as an example of how programmers don't have to do every optimization prematurely.

There are many complete models of parallelism that are simple like Communicating Sequential Processes.

You have a strange conceptualization of "simple". Hard-coding premature optimization of structure is not simple, because the entropy (i.e. surprise) of real systems (even programming projects) is not bounded. The model may be built from simple components, but its expression in code is far from simple.

keean commented

How can I put it differently: The invariants you want will surprise and confuse programmers used to the sequentiality of imperative code, and lead to a lot of bugs. Such assumptions are only safe for pure functions.

Hotspot is an example of how if you try to put too much complexity into the compiler, it becomes a 20 year project for large organisations like Sun and Oracle. It is a white whale, and I don't want to be Ahab.

I think the correct method for dealing with complexity and optimisation is to leave it to the library authors. If you make the low level tools available in the language then experts can use the language to write efficient tools for programmers to use as libraries. This keeps the compiler simple, and the performance predictable.

I am sure you know that every optimisation for one case is a pessimisation for another. There are some sets of numbers for which a bubble sort (less than 10 numbers for example) is faster than quicksort. Optimisations tend to be domain-specific; that is, the optimisations to write a fast matrix library look very different from the optimisations for a monte-carlo simulator, or a graphics library. This is why no high level language has succeeded in beating simple 'C'. The 'C' author is always able to exploit their knowledge of the problem domain to write code superior to that produced by applying optimisations in the compilation of high level code.

I propose this as a general hypothesis - "No optimising compiler for a high level language can ever have optimal program performance in all problem domains" - so therefore this approach is doomed to fail for a general purpose language. In summary, invariants can help compilers produce faster code, but they will never beat a human given precise semantics. What no language has done (that I can think of) is allow the experts to write precise code and the general users to write high level code in the same language. C/C++ is great for the low level detail, Haskell is great for high-level; what we need is a language that bridges the gap.

Regarding CSP, I mean where the model is simple (very few primitives) yet you can build any complex system using it. This is the very definition of expressive power.

The model may be built from simple components, but its expression in code is far from simple.

Yes this is exactly what we want. We do not want a complex sprawling mess of primitives, building a complex system on a complex model is much more prone to error than building a complex system on a simple model.

@shelby3 wrote:

Perhaps the most general construct would be to allow void functions to be assigned to an object of Void type. Dependent code paths could condition on if( object exists ). Or probably better is to just assign the function to an object of type Promise (or infer the type desired by calling .then() on the function return value), and attach the dependent code using the then() method.

I think I like better the ability to turn exceptions into Option types by assignment from try:

let result = try function_that_throws()
if (result != None) // result is the thrown error
else                // non-parallelizable (i.e. synchronous) code

Hotspot is an example of how if you try to put too much complexity into the compiler, it becomes a 20 year project for large organisations like Sun and Oracle. It is a white whale, and I don't want to be Ahab. I think the correct method for dealing with complexity and optimisation is to leave it to the library authors.

While I agree decentralized solutions are more antifragile, I think we are doing too much handwaving and aren't specific enough w.r.t. the differences between what Hotspot is optimizing and what I am proposing.

This is why no high level language has succeeded in beating simple 'C'.

To beat C at which goal? So JavaScript and Python have no advantages over C for any use case?

Low-level coding has its tradeoffs, and that is why it is best not to prematurely optimize, but instead to profile the code, then rewrite only the performance-critical portions in a low-level language.

Parallelization[concurrency] is not just the difference between faster and slower execution; it is often also the difference between responsive and unresponsive, due to allowing independent tasks to respond independently. The compiler + runtime could in theory optimize the tradeoffs (even dynamically at runtime). Why should every programmer have to reinvent such optimization if we can build it in as standard?

We do not want a complex sprawling mess of primitives

Aren't you building a strawman alternative? Did I propose such an alternative?

A proliferation of library variants for each premature optimization case is also a form of a complex sprawling "mess".

@keean you want to build systems inductively from small substructures up in source code to more complex superstructures. I am referring to the dual of building systems co-inductively from small superstructures in source code down to more complex substructures in compiled implementation.

I am not saying either is inherently better than the other. There are tradeoffs.

keean commented

I agree with the principle of what you are saying, and I am glad you mentioned 'antifragile'. I think a co-inductive approach would work too. For example I want to create a matrix library, and I do so using garbage collection and a simple sequential approach. I can come back and optimise memory usage and parallelise the library later. As such it is the abstraction of the interface for the library that is important, not necessarily any invariants inside the library.

If you have a great algorithm for parallelising code then maybe there is something worth looking at? Hand waving about magical compilers that parallelise code using unicorns is not going to get us anywhere :-)

Agreed that abstraction is the co-inductive approach.

If we can express an algorithm in its simplest abstraction and then prove that other variants of the algorithm are equivalent, then we have the analogue of compile-time type checking. I am only proposing to abstract the parallelism invariants of the algorithm and prove those are equivalent for variants of the algorithm. Even that is useful without an automated way to produce the variants, because we can express our abstraction succinctly and then verify that all implementations of the abstraction are correct.

Then as you say, it is more useful if we can automate optimum variants of the abstracted invariants. I was thinking about the example of walking a tree graph, doing some work at each node, and then collecting the results into a duplicate tree graph. I haven't written this down in code, but I suppose a recursive (hopefully tail-recursive) approach is the most elegant expression of the algorithm. If the parallelism invariants are such that each node in the tree can be computed independently of the others, it seems there should be a deterministic translation to a parallelized variant of that same algorithm. But I haven't actually tried to code this yet. Any thoughts?

f(node, output, callback) =>
   output.data = callback(node.data)
   if (node.left)
      output.left = new Node
      f(node.left, output.left, callback)
   if (node.right)
      output.right = new Node
      f(node.right, output.right, callback)

So the compiler can deduce that when only one of node.left or node.right is not null, then the function is tail recursive, and make that optimization instead of manually encoding a while loop into the function.

Also the compiler can deduce that the two branches of the tree can be run in parallel if callback is pure and the reference to node is read-only (which the compiler can see is not written in the function). Strictly speaking we'd like to know we have an exclusive reference to output so it won't be written by external code while our function is running (such as if callback is a blocking function), but that is really an invariant at the discretion of the caller as it would be the case even if we didn't parallelize f.
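
For concreteness, a sketch in JavaScript of the parallelised variant the compiler might derive when those invariants hold (assuming callback is pure and returns a Promise; Node is the output node constructor from the example):

async function fParallel(node, output, callback) {
  output.data = await callback(node.data);
  const branches = [];
  if (node.left) {
    output.left = new Node();
    branches.push(fParallel(node.left, output.left, callback));
  }
  if (node.right) {
    output.right = new Node();
    branches.push(fParallel(node.right, output.right, callback));
  }
  // The two subtrees are independent, so they may proceed concurrently.
  await Promise.all(branches);
}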

keean commented

So traversal over data structures is generalised by iterators and coordinate structures. A binary tree has a bifurcating iterator which looks like this: "left_successor, right_successor". Using this iterator we can build three traversal algorithms, "in-order, preorder, postorder", and these can be generalised into a single higher order traversal algorithm where the order is determined by a function parameter.

If we limit ourselves to linearisable data structures (so order of node access is irrelevant to the algorithm) we end up with a generalised 'fold' algorithm.

This all leads to two simple algorithms that we can use and parallelise: "map" and "reduce". (You may have heard Google talking about their map/reduce architecture.) "Reduce" is a fold, and map is the standard map of functional programming.

So "map" takes a container and applies a function to every element independently resulting in a new container of the same shape. Reduce calculates some property uniformly over the whole of a container, although it can be sensitive to the traversal order, many reductions are not, however the classic "fold" is ordered, we would have to impose invariants on reduction to make it not depend on traversal order (reduction functions must be binary, commutative, associate for example), given that reducing '+' over a collection would sum the elements and could be parallelise into a binary tree of add operations.

As you can see, 'map'able functions are about the only things that are trivial to parallelise, and even 'reductions' become difficult to enforce the invariants for in a programming language, because we must prove a binary operator to be both associative and commutative. If we rely on the programmer to use it correctly, then reducing '-' over a collection is obviously going to produce random results (because the compiler will change the order depending on available resources).

As you can see, 'map'able functions are about the only things that are trivial to parallelise

Not true. There are others that are trivial just from the compiler observing dependencies.

Perhaps what you mean is that most parallelism opportunities will not be trivial? If so, do you have any basis to think that?

I am only aiming to have the compiler handle trivial cases to start with. I am mostly aiming to eliminate (and have the compiler infer) the explicit repetitive yield and Promises + generators boilerplate, which isn't even a case of parallelism (rather just concurrency and not blocking JavaScript's event loop). So minimally I am just wanting to give each function a type which indicates whether it can and should be executed asynchronously. Goroutines do this implicitly, because blocking operations automatically cooperatively multitask across other goroutines, but an analog to Promise.all doesn't exist.

even 'reductions' become difficult to enforce the invariants in a programming language, because we must prove a binary operator to be both associative and commutative

fold is implicitly sequential unless we prove those additional invariants. I don't see a problem.
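
To make the reassociation concrete, a sketch of a reduction reshaped into a binary tree (assuming a non-empty input and an op that is associative and commutative, which is what licenses the reordering):

function treeReduce(xs, op) {
  if (xs.length === 1) return xs[0];
  const mid = xs.length >> 1;
  // The two halves are independent, so they could be evaluated in parallel.
  return op(treeReduce(xs.slice(0, mid), op), treeReduce(xs.slice(mid), op));
}

treeReduce([1, 2, 3, 4], (a, b) => a + b); // 10, same as a left fold
treeReduce([1, 2, 3, 4], (a, b) => a - b); // 0, NOT the left fold's -8: '-' is order-dependent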

keean commented

The compiler optimisation passes happen after type inference and before output generation. These passes usually consist of matching part of the AST and rewriting it into a new AST (tree rewriting). Effectively an optimisation pass gets the program AST as its input, and will have to do the necessary checks to ensure the semantic meaning of the program is not changed by the optimisation before doing the tree rewrite.

So it is easy to add these kind of passes to the compiler, the harder task is to understand under what conditions any given optimisation is valid, and write the code to check those conditions before rewriting the AST.

It would be great to see a sample tree-rewrite and associated checks for parallelising something like a "for(;;)" loop.

As I said, my initial goals are simply to abstract away trivial boilerplate which the compiler can detect because the function is returning a Promise and the value type of the Promise is being assigned. So no need to write yield and use generators or async/await. The compiler can insert that noisy boilerplate. Also I want the compiler to infer some simple cases of running two non-blocking operations in parallel. The more complex cases I can leave to better algorithmic thinkers than myself as a future improvement. I can always hand-code any other parallelism I need which the compiler can't yet automatically infer.

I am trying to abstract away dependence on Promises, so that for example my code is generalized to work with optimization of using green threads for more efficiency in those cases. And also so my code is less noisy. And trying to abstract away JavaScript's warts and annoying idiosyncrasies.

Also please be aware my immediate goals are that anything I settle on for the time being must be something I can implement in a transpiler in about a month or so. I want to K.I.S.S. so I can use this in production code immediately if possible. That may not be realistic, but I am taking a shot at it. But my patience before abandoning it if it is taking too long is very short.

keean commented

I don't see how you can abstract away the yields; they are doing something semantically different. Yield is effectively sending a response to the sender of the initial message and waiting for the next message in the protocol. If you abstract it away you totally change the semantics. Consider:

f(x) =>
   f(yield(x + 1))

If you call this as f(1) you get back (g, 2) where g is the continuation and 2 is the result. Yes you could inline the yield as you would inline a function with the usual time/space tradeoffs (and remembering that cache locality has a big performance impact so too much inlining and unrolling can be bad).
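
In ES6 terms (rendering the tail recursion as a loop; the generator object plays the role of g):

function* f(x) {
  while (true) x = yield (x + 1);
}

const g = f(1);          // g is the continuation
console.log(g.next());   // { value: 2, done: false }, i.e. the result 2
console.log(g.next(5));  // resumes with yield evaluating to 5, yields { value: 6, done: false }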

As you can see, at this point yield is about flow control, not parallelism, there is only one thread so far, and we always wait for operations to complete.

What is missing is a way to hand off a task to another thread or external hardware, and continue to do other things until the result is returned. We want this to look like a normal blocking call, for example:

let file = readFile("xyz")

Inside readFile we want to save a continuation and send a message to another thread. However sending a message can look like a normal function call which yields the result, except threads cannot ever be referentially transparent (because their state can be changed by yet other threads we cannot see).

If we think about low level primitives then continuations are one option (but probably too powerful, like goto). You also need a way to spawn threads, and send messages between threads. You also need to be aware of thread overhead, and that messaging threads takes 100s of clock cycles, so in many cases doing things sequentially is faster and more efficient. When running things in parallel threads you need a way to manage state update.

Parallelising code by dependency analysis (which is like a dataflow language) is different from asynchronicity, where a background thread does some processing, like reading data from a disk, where one thread can pause whilst other things run. This code needs to be re-entrant and capable of being run in different thread contexts without overwriting its own state. There are really three different problems here that need to be handled in different ways. Pure code without state can be used in several contexts at once without problems. Dealing with state and multiple contexts is probably the hardest problem for a programmer.

You also need a way to spawn threads, and send messages between threads. You also need to be aware of thread overhead, and that messaging threads takes 100s of clock cycles, so in many cases doing things sequentially is faster and more efficient.

You must be referring to OS threads. But messaging between, for example, green threads could presumably be much more performant.

As I said, my initial goals are simply to abstract away trivial boilerplate which the compiler can detect because the function is returning a Promise and the value type of the Promise is being assigned. So no need to write yield and use generators or async/await.

That is what I wrote before.

f(x) =>
   f(yield(x + 1))

x + 1 isn't a Promise. I am referring to the case where I want to write:

let file: File = readFile("xyz")

Note that if the type of file can be inferred from its use, then perhaps we don't need to annotate the : File. Alternatively, I can have the compiler always assume we want to wait for the value of the Promise, and instead force the programmer to annotate the type when he wants the Promise itself.

Instead of:

let file = yield readFile("xyz")

Or:

let file = await readFile("xyz")
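
Under the alternative where the compiler always assumes we want the value, both behaviours remain expressible (a sketch of the proposed surface syntax, nothing implemented):

let file = readFile("xyz")                    // implicit wait: file gets the Promise's value type
let pending: Promise<File> = readFile("xyz")  // annotation opts out: keep the Promise itself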

The following I already stated:

Parallelising code by dependency analysis (which is like a dataflow language) is different from asynchronicity, where a background thread does some processing, like reading data from a disk, where one thread can pause whilst other things run.

As:

which isn't even a case of parallelism (rather just concurrency and not blocking JavaScript's event loop)

Regarding the following, I already stated earlier in this thread, when I provided Robert Harper's definition of parallelism, that in general concurrency (concurrent code paths) can't be proven to be independent, and that it opens our program up to the (unbounded entropy) non-determinism of the environment. Thus the implication is that our assumptions about our state need to be such that out-of-order execution doesn't matter:

This code needs to be re-entrant and capable of being run in different thread contexts without overwriting its own state...Dealing with state and multiple contexts is probably the hardest problem for a programmer.

keean commented

Sure, but we need both. We need the UI to remain responsive whilst a long calculation takes place in the background. In fact I would say that, as far as writing good software goes, this is more important than parallelising loops, which is after all just a performance optimisation.

keean commented

Promises enable lazy computation to be embedded into an eager language. I would use a type constructor to create a promise as in:

let x = readFile("xyz")

This is probably best implemented with "readFile" as a promise; however the tricky thing is whether the next statement waits for readFile to complete or not. Note that often we want both things: to have independent statements follow it, but also to have some statements that do depend on it completing.

let x = readFile("xyz")
... do something else
x.then((file) => 
   ... do something when x is read.
)

As you can see, we often need statements that both do and do not depend on x finishing. This has to do with the type of the value returned from "readFile". In Haskell the value returned is "lazy", and that behaves as you want it to (when you try to read 'x' it blocks until 'x' is available, but other code not dependent on 'x' can still execute). Haskell is notorious for having difficult time/space complexity because of laziness. Promises offer a solution to this where all types are still eager, and as such present a better solution than mixed lazy/eager evaluation with some kind of distinction between lazy and strict in the type system.

Sure, but we need both

What is your point? I stated the compiler will create the boilerplate to enable the asynchrony. Perhaps my explanation has not been so clear (lack of examples).

Note often we want both things

That is where the parallelism invariants inform the compiler which parts can be done in parallel. Any cases the compiler misses, can be optionally hard-coded (or improve the compiler).

That has been my point all along.

keean commented

In my experience, having used both methods, lazy variables (with optional strictness annotations) and promises, promises are the better way to do this. Time/space performance of code is easier for the programmer to understand and manage with promises. I think Haskell's lazy-by-default approach was a mistake, and eager by default with laziness annotations would make more sense. To give an example of where it all starts to go wrong, consider a loop that adds one to every element in some arbitrary collection. In Haskell the output will replace each input value in the collection with a 'thunk' in the output collection, where a thunk is a suspended computation. So it is only when you try to read the value that it gets computed, or maybe yet another thunk is created. This results in the collection using a lot of memory, which slows things down, and the copying to create the thunks is in this case more work than simply incrementing the integer in an eager way.
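
In JavaScript terms the difference would look something like this (an illustrative sketch):

let out = xs.map((x) => x + 1)        // eager: each element is computed immediately

// lazy (Haskell-like): each element becomes a thunk, a suspended computation
// that allocates memory now and only does the addition when forced later
let outLazy = xs.map((x) => () => x + 1)
let first = outLazy[0]()              // forcing the first thunk finally computes it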

I have no idea how that relates to my proposal. In my proposal, the compiler can use Promises to implement that concurrency and even parallelism. The compiler might use Go-like greenthreads where it is more efficient.

keean commented

I am not talking about how the compiler implements things, and yes, Go-like green-threads would be a good idea for that. I am talking about the syntax and semantics exposed to the programmer. In general, making everything 'lazy' where possible (which is what you are suggesting, I think) will make performance worse in most cases. Certain things like stream processing will be improved, but nearly everything else is made worse. This is why in Haskell the programmer ends up having to add a lot of strictness annotations to recover the performance.

keean commented

I think this is interesting, and may help explain the limitations of the default lazy, optimise strictness approach of Haskell: https://wiki.haskell.org/Performance/Strictness

In general, making everything 'lazy' where possible (which is what you are suggesting, I think)

I don't know why you think this. I proposed the compiler would automate the boilerplate w.r.t. to functions which return Promises. Those are already "thunked" by definition of what a Promise is.

I also proposed that some cases of successive instances of the above can run in parallel and the compiler will detect this based on invariants. I also proposed that some inherent parallelism could possibly be exploited by the compiler+runtime in the future, but by mentioning Hotspot, I intended that any such optimizing would be profiled at runtime and if sequential code is faster then it would be utilized.

I think your point may be that any interesting non-trivial case of parallelism will become too complex for the compiler+runtime to automatically fine tune. You may be correct about that, and I agree we need to also offer primitives the programmer can manually control. My goal is to automate what can be automated.

keean commented

Adding the instrumentation for runtime profiling will make it slower, and never able to achieve the performance of 'C'.

My problem with this approach is that the default is auto-magic, according to rules too complex for the programmer to understand. As such the fine tuning is based on an uncertain starting point that may even change between compiler versions. This leads to very unpredictable performance. One program I write might be fast, and another slow, for no reason that I as the programmer could work out. Also a previously fast implementation may become slow in a future compiler version. As the magic happens in the compiler, it will be very hard to debug this slowness, as the source code does not actually contain any information about why it is slow (the performance is due to the interaction between the source code and the hidden optimisation rules in the compiler). I am not saying this approach is impossible, I am saying it leads to uncertainty, and frustration for the programmer using the language. It's one of the (if not the main) reasons I don't program as much in Haskell as I used to.

@keean wrote:

so therefore this approach is doomed to fail for a general purpose language. In summary invariants can help compilers produce faster code, but it will never beat a human given precise semantics. What no language has done (that I can think of) is allow the experts to write precise code and the general users to write high level code in the same language. C/C++ is great for the low level detail, Haskell is great for high-level, what we need is a language that bridges the gap.

I had already partially responded to this:

This is why no high level language has succeeded in beating simple 'C'.

To beat C at which goal? So JavaScript and Python have no advantages over C for any use case?

Low-level coding has its tradeoffs and that is why it is not best to do premature optimization and instead profile the code, then rewrite only the critical performance portions in a low-level language.

Also I wrote:

I am proposing to think about the issue from a more high-level conceptualization.

Higher-level programming languages were much more productive. I think we need higher-level parallelism now also. It is the next step of the evolution.

I didn't entirely address your point though. You are claiming that if we build a language that is low-level empowered, then expert programmers can create libraries and DSLs which the rest of the programmers in this language will use, implying that most users will experience the simplicity of expression and productivity benefits of a high-level language coupled with the performance optimizations of a low-level language.

You provided the example of a map function which could be optimized by experts for the sequential and parallelized cases. But the user of the map would still have to decide when to employ each variant, which could involve significant complexity of coordinating dynamic computational load across the available number of cores and hyperthreads. If you respond that a library would be offered which handles this complexity, then that is analogous to what I proposed to have the compiler handle.

To the extent that the parallelism can't be abstracted away, the user is going to have to write code that is aware of explicit primitives. Whether this is done with a library or by the compiler probably just boils down to whether the abstraction is, respectively, a complex mechanism captured in types or something more elegant. Conceptually both paradigms are abstraction.

I think your point distills down to this: you want to build a language with a type and macro system that enables creation of such abstractions orthogonally to modifying the language. In other words, you want to create the inductive building blocks. But the problem is that designing such fundamentals so that they work elegantly and have all the necessary degrees-of-freedom is a multi-year trial-and-error architectural process. We expended several months last year, and even then it is very likely that what you were coding would have been a dead-end that you would only have realized once you had already expended a year of effort. Scala is a good example of several years and many man-years thrown down a rat hole. What actually happens is complexity explodes, e.g. Scala. It is usually better to instead have a narrower, tightly focused goal that meets a real-world use case.

There is research being discussed over at Lambda-the-Ultimate (which I've glanced at over the past years) where guys have literally expended decades on various pet-project rat holes. Creating the “perfect language” is exceedingly difficult at best, and probably one of those delusions of nirvana that will never be attained.

Rather, I'm thinking that incremental advances on the existing mainstream language ecosystems, driven by a near-term real-world use-case, are more likely to produce something that other programmers genuinely need than some multi-year pie-in-the-sky experimentation.

Although some of my frustration and limited patience (also limited cognitive energy) last year was due to being very ill with disseminated tuberculosis (and not knowing what illness I had), it is also true that I got frustrated (but expressed a mea culpa awareness) that we expended months and weren't able to home in on priorities wherein we could deliver a production-capable advance within that time frame. It was as if we had no opportunity-cost consideration. Nor were we realistically assessing our available manpower resources. I am not claiming it was entirely wasted effort. I am actually re-reading it all now and trying to distill it down.

I think we are coming at this from different priorities. Your goals are very ambitious in the first iteration: you want to create the type system you've been thinking about for a long time (Prolog provable unification, typeclasses, etc.) and you want the building blocks to more perfectly express Ivan Stefanov's body of work. Whereas I wanted to code in JavaScript, but with typing plus a few other improvements. I wanted to make incremental advances on the JavaScript ecosystem. So let's say over time my incremental mode of advance might be headed towards what you wanted, if that turns out to be the optimum for rapid coding and open source, because my goal is not absolute maximum performance if it is costly in terms of complexity that makes code obtuse. So that is another reason I want to proceed incrementally rather than try to leap to some pie-in-the-sky which ends up being another Scala rat hole.

Also failing with incremental advance wastes less effort down a wrong direction. Of course, if we can discern in advance why a particular direction is not wise, then that is best.

@shelby3 wrote:

Lucid seems to best represent or summarize my goals for the language.

but I think now emphasizing clarity (fighting obtuseness and clutter) is the #1 most important feature of a programming language in the open source era.

@shelby3 wrote:

optimum for rapid coding and open source, because my goal is not absolute maximum performance if it is costly in terms of complexity that makes code obtuse.

I think I have made it clear that performance is not the highest priority, although I don't want to ignore performance as Python does. Java is a good example of a language not as performant as C, but much more important when comparing the number of serious projects written entirely in Java versus C. Java is an immensely popular language. By your logic, GC should never be offered, because afair it requires double the RAM to lose only about 33% performance.

@shelby3 wrote:

For me this new value is in the type system and type classes

Disagree. I believe it is the compositional unification of “everything is an expression” which I proposed (and which you originally didn't like). See below. The typing is actually stepping into a hornet's nest that can quickly sink into the mud (see also), so I am wary (as I indicated before) and want to go slow on how much typing is added. I am also trying to innovate on declarative concurrency.

keean commented

What actually happens is complexity explodes, e.g. Scala. It is usually better to instead have a narrower, tightly focused goal that meets a real-world use case.

So this is the reason why you don't want an overly complex compiler. Keep the semantics simple, and let the programmer optimise where necessary.

Your goals are very ambitious in the first iteration

I don't think so, because I already have the type system mostly worked out. Progress has slowed because I have other projects that I need to work on, but these have involved learning Swift and doing some TypeScript, so they are helpful in understanding other approaches to these problems (and reminding me of why I wanted to solve these things in a certain way).

I already have the parser and type inference for the basic language of functions and operators working, along with code generation into JavaScript. If I can find time to add simple function import (assigning types to external JavaScript functions) you could already write simple functional programs.

Keep the semantics simple

Easier said than done, when you also need to keep degrees-of-freedom maximized.

So this is the reason why you don't want an overly complex compiler.

The word ‘overly’ can be very subjective. Why even have a compiler then? Why have higher-level languages? Let's code in binary machine code, which is the maximal degree-of-freedom semantics.

You probably have grossly underestimated (via the confirmation bias of creators) the complexity blackhole you will open by trying to formulate the maximal degree-of-freedom semantics for a perfect low-level language that can be programmed at a high-level by consumers of the libraries. Thus you are not comparing apples-to-apples, but rather whimsically maggots-to-fairies.

Even if it all makes sense from an algebraic point-of-view, the result can still fail for the programmers in terms of explosion in verbosity of types and/or obtuse unreadable type inference cases, etc.

Your goals are very ambitious in the first iteration

I don't think so, because I already have the type system mostly worked out.

Don't let me discourage you from attempting whatever you are interested in, but what you can't see perfectly is how your design will interplay with the real world needs and preferences of human beings. This is why iterating with smaller advances from an existing popular language is “iterate often, fail early”, to paraphrase Linus Torvalds and the open source principle.

If you can succeed in making a leap all the way to a totally new paradigm and hit all the sweet spots well, then that is an amazing accomplishment.

you could already write simple functional programs

That is not useful as a production coding system. Not to diminish the accomplishment for what it is. In other words, it doesn't meet my needs and timeline, although I can't speak for others.

Adding the instrumentation for runtime profiling will make it slower, and never able to achieve the performance of 'C'.

I think I have made it clear that performance is not the highest priority (but scaling is a priority), although I don't want to ignore performance as Python does. Java is a good example of a language not as performant as C, but much more important when comparing the number of serious projects written entirely in Java versus C. Java is an immensely popular language. By your logic, GC should never be offered, because afair it requires double the RAM to lose only about 33% performance.

My problem with this approach is that the default is auto-magic, according to rules too complex for the programmer to understand. As such the fine tuning is based on an uncertain starting point that may even change between compiler versions. This leads to very unpredictable performance. One program I write might be fast, and another slow, for no reason that I as the programmer could work out. Also a previously fast implementation may become slow in a future compiler version. As the magic happens in the compiler, it will be very hard to debug this slowness, as the source code does not actually contain any information about why it is slow (the performance is due to the interaction between the source code and the hidden optimisation rules in the compiler). I am not saying this approach is impossible, I am saying it leads to uncertainty, and frustration for the programmer using the language. It's one of the (if not the main) reasons I don't program as much in Haskell as I used to.

You certainly trust your C compiler to do certain optimizations. You also have the option to not turn on the most aggressive GCC optimizations which are sometimes unstable and cause problems.

I proposed initially to only eliminate very straightforward boilerplate with abstractions which are deterministic and reliable. There is simply no reason to do those in a library, and it wouldn't be as elegant. I said going beyond that would be open to future research. Heck, maybe you'd even select your optimization compiler AST plugin as a directive in the source code, so you have complete control and the complexity is not in the compiler.

I also proposed retaining the primitives to code manually those cases that need the control. So you could always override the compiler in the source code when you feel you need to.

Really I don't see the problem.

The point is that I don't have to wait for the perfect language just to get some very trivial abstraction I need this month for my work with TypeScript. And besides, you couldn't do this trivial abstraction as elegantly in a library in any case. I bite off the low hanging fruit first, before I reach for more complicated scaffolding.

keean commented

I still think that a sequence of commands should be executed in order, or at least any IO operations should remain in the same order. I think other than that things can be parallelised, but as this is an optimisation it does not need to be in the first version of the language.

So what can and cannot be optimised is not what we should be focusing on. We should be looking at what the semantics of the language are. For example what are the semantics of IO? If we are using channels for IO then it is the order of operations on channels that must be preserved. This means that in the AST channel operations must be kept in the same relative ordering, but we can perform any tree transformation on the AST as long as the ordering of IO operations is preserved.

So you think everything should be dependently ordered. So go back to the 1980s. I need to write modern programs. We have an asynchronous Internet now. Even I can write to more than one file (i.e. I/O) at the same time.

I think you don't understand that I don't want to clutter my source code with async / await (or its even more verbose generators equivalent) boilerplate and also boilerplate for parallelizing writes to independent files. And I don't want to hardcode Promises when in the future greenthreads might be more performant for instances of those which unwind all the way to JavaScript's event loop. So it is a win-win to abstract it.

keean commented

Not everything need be dependently ordered, but any sequence of operations needs to happen in the order I say. Otherwise we will get deadlocks and livelocks in the code.

Personally I would define a sequence as any commands on separate lines like:

let a = input()
print(a)

clearly the print comes after the input (even if they are on different channels, we may be reading a file and outputting to the network for example).

But then what about the evaluation order of function arguments, what does:

f(x++, a[x])

do? If we allow parallel execution then this function will output inconsistent results. In general we want function arguments to be evaluated left-to-right. Forking a thread (even a green one) would result in very poor performance anyway, because the overhead of sending the message to the other thread and waking it up is huge compared to just doing it sequentially.
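
Concretely, expanding the example (a worked illustration):

let x = 0
let a = [10, 20]
f(x++, a[x])   // left-to-right evaluation gives f(0, a[1]), i.e. f(0, 20);
               // evaluating the arguments in parallel (or right-to-left)
               // could instead observe a[0] and yield f(0, 10)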

Having everything sequential also makes debugging easier (program flow proceeds line-by-line, statement-by-statement).

So my preference is for sequential by default, and parallel where I allow it. Of course pure functions can be parallelised, and so can the pure parts of impure functions, providing the side-effects remain in the same order. So in the above example "x++", as a side-effect producing command, would form a barrier that prevents any access to 'x' from the right of this command being moved to the left, and vice-versa. Of course the parallel sections have to be long enough to make the messaging overhead worth it as well.

Promises can be implemented on top of green-threads as well (in fact threads of some kind are necessary; it's just that JavaScript does not necessarily expose this). The difference is green-threads have more than one thread in the master event loop. In JavaScript a single thread handles all the events in the master event loop. This has the nice property of serialising state access, so the JS programmer does not need to worry about multiple simultaneous updates from different threads. With multiple threads you have to worry about simultaneous state update, but promises themselves work the same, whether the threads are green or heavy threads.

Not everything need be dependently ordered, but any sequence of operations needs to happen in the order I say.

I have never disagreed with that premise.

But then what about the evaluation order of function arguments

Some (many or most?) languages make no guarantees about the order of evaluation of function arguments. I am glad you mentioned that, because I was going to bring it up when I saw your strawman argument that line order historically dictates order of evaluation.

Perhaps an example will help you to see my point. If I manually encode:

let [a, b] = await_parallelized( f(reference), g(reference) )

That is equivalent to my proposal:

// The independence of f and g (or that we don't care about their ordering) is a declared invariant
let a = f(read_only_reference),
    b = g(read_only_reference)

Note the read-only criterion is specified by the argument types of f and g.

Now, which one is easier to read and more abstracted, so that the compiler is free to optimize it with Promises or green threads and run it sequentially or in parallel?

And the problem with the former (your preference) is that the invariants are not stated, so it might have a human error bug and shouldn't be parallelized. By being explicit with invariants, the compiler is checking for human error and ensuring the program is correct, i.e. compile-time/static checks.
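
For instance, the compiler would then be free to lower the declared-invariant form to something like the following (a hypothetical lowering, assuming f and g return Promises), or to green threads, without the source code changing:

let [a, b] = await Promise.all([f(read_only_reference), g(read_only_reference)])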

So in the above example "a++" as a side-effect producing command would form a barrier

Obviously the compiler is going to detect when the ordering of execution is dependent and not parallelize it. Those are invariants. And it will catch cases that a human will miss.

Promises can be implemented on top of green-threads as well

But that won't remove the manual boilerplate gobbledygook of generators (or async keywords) all the way up the call stack that is necessary to integrate Promises with JavaScript's event loop.

So this is the reason why you don't want an overly complex compiler.

The word ‘overly’ can be very subjective. Why even have a compiler then. Why have higher-level languages. Let's code in binary machine code, which is the maximal degree-of-freedom semantics.

You probably have grossly underestimated (via the confirmation bias of creators) the complexity blackhole you will open by trying to formulate the maximal degree-of-freedom semantics for a perfect low-level language that can be programmed at a high-level by consumers of the libraries. Thus you are not comparing apples-to-apples, but rather whimsically maggots-to-fairies.

Even if it all makes sense from an algebraic point-of-view, the result can still fail for the programmers in terms of explosion in verbosity of types and/or obtuse unreadable type inference cases, etc..

http://www.cs.ox.ac.uk/ralf.hinze/WG2.8/31/slides/martin.pdf#page=26

http://tratt.net/laurie/research/pubs/html/tratt__dynamically_typed_languages/#x1-140003.1

http://tratt.net/laurie/blog/entries/another_non_argument_in_type_systems.html

http://theorangeduck.com/page/defence-unitype

https://www.quora.com/What-are-the-arguments-against-Robert-Harpers-view-of-untyped-languages-as-unityped-languages

https://www.cambridge.org/core/books/practical-foundations-for-programming-languages/dynamic-typing/D5BCAF33F715F883B835381C2EC43D20

keean commented
let a = f(read_only_reference),
    b = g(read_only_reference)

So this is a problem if another thread is modifying the referenced data. On one day with one compiler you might get a deadlock, and on another day with a different compiler you may not get a deadlock. An intermittent compiler dependent synchronisation bug really is the nastiest kind of bug to find. Most people give up and live with it. It's like an annoying rattle in a car that disappears every time you take it to the garage to look at it.

I have no problem with your semantics for pure code, after all the order of beta-reduction is non-deterministic.

I think side effect order needs to be preserved, for the sanity of the programmer. I can just imagine all the complaints when a new compiler version breaks lots of production code because it changes the optimisations and hence the evaluation order.

You say people can add annotations to force a sequential order, and I say they will forget. Just like with semi-colons in 'C' code, people always forget them, but in this case the compiler will catch the mistake. With ordering semantics the compiler cannot catch the mistake. It would be like JavaScript, where semi-colons are optional, but if you forgot one the program would still work, but might break on a different CPU or different compiler. This would seem to increase the chance of a bug getting into production code that nobody knows about.

Regarding syntax, I would suggest different operators. So if ',' is the sequence operator, '|' could be the parallel operator, allowing:

let a = f(read_only_reference)
   | b = g(read_only_reference)

So this is a problem if another thread is modifying the referenced data.

Irrelevant. You continue to construct disingenuous strawmen which entirely side-step the topic of discussion.

Surely you see that if the block of code for the comparative examples doesn't have an exclusive lock on the referenced data, then either example's encoding of the parallelism can have ordering indeterminism. And more saliently, even a sequential encoding can have ordering indeterminism w.r.t. changes made in an external thread unbeknownst to the block of code at hand. Thus you should have been able to deduce that that invariant was totally irrelevant to the topic we are discussing.

(Tangentially, you know I am providing these examples in the context of transpiling to JavaScript, so there is only one thread that can have write access to any data if we pass only read-only references to blocking functions which might run in another thread, which is what I showed in my example.)

An intermittent compiler dependent synchronisation bug really is the nastiest kind of bug to find.

Yet surely you can see that any such bug isn't "compiler dependent synchronisation", is it?

keean commented

If you had said:

let a = f(exclusive_read_only_reference),
    b = g(exclusive_read_only_reference)

I would have agreed with you. But the references in your example do not appear to be exclusive, and therefore a write reference to the same data could exist in another thread. Perhaps the solution to that is to not allow any data sharing between threads, although that leads to a 'C'-like scenario where shared data needs to be copied on fork (or made copy-on-write).

I think side effect order needs to be preserved, for the sanity of the programmer. I can just imagine all the complaints when a new compiler version breaks lots of production code because it changes the optimisations and hence the evaluation order.

You continue to write disingenuous strawmen. How many times do I have to repeat myself that the compiler will not parallelize order-dependent code paths?

But the references in your example do not appear to be exclusive

You still haven't understood the irrelevance of your remarks. It makes no difference whatsoever to the determinism whether we encode the execution as sequential, manually parallel, or compiler generated parallel, if the nondeterminism is external to the ordering invariants of that code block.

Global variables and multi-threading is chaos.

With ordering semantics the compiler cannot catch the mistake.

I am repeating myself again. If the program isn't checking for an error before doing the next sequential action that depends on the successful completion of the prior action, that is a bug.

The problem of your previously acknowledged concern arises because of using exceptions to generate errors, so the programmer can make an assumption that if the execution reaches the next action that appears successively in the source code, then the first completed without exception. Implicit parallelism would break this implicit assumption, only in the case of exceptions but not when all errors are return values. So the simple solution I already alluded to, is that the compiler shouldn't parallelize code within a try block or after any call to a function that can throw an exception. To get parallelized code, the programmer would save the exception as I alluded to, then rethrow it after the parallelizable block of code.
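
The pattern would look something like this (a sketch in current JavaScript; the function names are placeholders):

let err = null
try {
  actionThatMayThrow()     // the sequential action preceding the block
} catch (e) {
  err = e                  // save the exception instead of unwinding
}
// ... parallelizable block: must not assume the action above succeeded ...
if (err) throw err         // rethrow after the parallelizable block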

keean commented

Nearly every function can throw an exception (out-of-memory, divide-by-zero etc). I think where we differ is that you think there are lots of opportunities for this kind of optimisation in the average program, and I think there are not, because nearly every function falls into one of the classes that cannot be parallelised, unless it has been explicitly written to be parallel in the first place.

For the average program there is little performance to be gained here...

You should probably read this:

http://simonmar.github.io/bib/papers/multicore-ghc.pdf

With ordering semantics the compiler cannot catch the mistake.

I am repeating myself again. If the program isn't checking for an error before doing the next sequential action that depends on the successful completion of the prior action, that is a bug.

The problem of your previously acknowledged concern arises because of using exceptions to generate errors, so the programmer can make an assumption that if the execution reaches the next action that appears successively in the source code, then the first completed without exception. Implicit parallelism would break this implicit assumption, only in the case of exceptions but not when all errors are return values. So the simple solution I already alluded to, is that the compiler shouldn't parallelize code within a try block or after any call to a function that can throw an exception. To get parallelized code, the programmer would save the exception as I alluded to, then rethrow it after the parallelizable block of code.

Yet I don't think that is necessary. I can't think of any cases where your concern can occur. The example you provided upthread was the program displaying a message to remove the USB dongle before a file system action is completed. But the return from a file system function call doesn't guarantee that the file system has flushed its buffers. The program is going to need to call a function which checks the status of the volume, and the return value will need to be checked before displaying that message.

keean commented

Quote from the paper:

Completely implicit parallelism is still a distant goal; one recent attempt at this in the context of Haskell can be found in Harris and Singh (2007)

So it appears to be an open and unsolved problem... not the kind of thing to take on lightly; clearly academics have put decades into this with only "attempts" so far (which suggests no success)?

keean commented

But the return from a file system function call doesn't guarantee that the file system has flushed its buffers. The program is going to need to call a function which checks the status of the volume and the return value will need to be checked before displaying that message.

I think you make a good point there, but the type system would need to stop you ignoring return values, as they become important for sequencing. It would require complete control over the functions in the language, and foreign function imports would be difficult.

I do think any solution to this is going to involve the syntax of the language, and I could see the following working:

let _ = print(x) in let _ = print(y) in let _ = print(z) in ()

as a way to make sure the print statements always execute in the same order, but it looks like a lot of boilerplate to me, as print must return something, so we can inspect that return value before the next print (but we ignore it because we don't care what it is). Things get worse if we want to inspect the value:

if (print(x) == success)
   if (print(y) == success) 
      if (print(z) == success)
         return ()
()

For the average program there is little performance to be gained here...

You should probably read this:

http://simonmar.github.io/bib/papers/multicore-ghc.pdf

I already stated performance is not the main motivation:

Parallelization is not just the difference between faster and slower execution; it is often also the difference between responsive and unresponsive, due to allowing independent tasks to respond independently.

In other words, what you are focused on in that paper is not what I want the implicit parallelism to accomplish. I already mentioned numerous times that my first goal is simply handling promisified code less verbosely (less boilerplate) and with more abstraction.

keean commented

In other words, what you are focused on in that paper is not what I want the implicit parallelism to accomplish. I already mentioned numerous times that my first goal is simply handling promisified code less verbosely (less boilerplate).

A promise chain is just a normal function sequence when you use co-routines. Note it is a sequence so you want one thing to follow the other, not in parallel.

So in a co-routine, the 'line terminator' is 'then' from a promise chain, and a normal function or operator is a promise (or not; it doesn't matter if it blocks or returns immediately). This is the way promises are used most of the time, so it would seem best to give this the least syntax boilerplate. We then need a way of running two functions in parallel like Promise.all. I would suggest a special operator for this.

The program is going to need to call a function which checks the status of the volume and the return value will need to be checked before displaying that message.

Actually that just transfers the same issue to needing to call the volume status function sequentially after the return of the file modification function(s). But then we'd need it to return a Promise, because polling it in a loop would be inefficient.

But even that wouldn't be accurate, because the return from the file modification function doesn't even ensure the buffers are dirty yet, as the modification could be queued.

Thus what is actually needed is that the file modification functions input a Future or return a Promise to signal when all buffers are flushed. The function must be told to perform this extra check.

Asynchrony forces us to never again trust that the return from a function means anything at all. It is time to kill that wrong-headed archaic implicit assumption. The function must callback (via a Promise or Future) to indicate the timing of a specific state requested by the caller.
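
For example (a hypothetical API; the function names and the option are illustrative only):

// writeFile is asked to signal the flush; the returned Promise resolves
// only once the buffers are confirmed flushed (hypothetical API)
writeFile("xyz", data, { signalFlushed: true })
  .then(() => display("You may now remove the USB dongle"))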

keean commented

Regarding performance, asynchronous IO and multiple green threads will give you the responsiveness you want. Implicit parallelisation will not achieve this. For example:

let a = f(exclusive_read_only_reference),
    b = g(exclusive_read_only_reference)

parallelising this might halve the execution time, but if the UI is blocked on this operation it will still be unresponsive, although for less time. This is not a good solution for responsiveness. The way to get a responsive application is to have all the UI controlled from a single thread, that hands off any work to background threads that run in parallel. This application architecture is very common, and nearly all Android, iOS, Windows and Mac applications follow this model. The key facility here is message passing so the UI thread can send a message to a background thread asking it to do the work, and then receive the result back as a separate message at a later time. With co-routines/co-functions you can avoid having to return to any event loop.

Edit: even JavaScript supports this now with web-workers.
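
For example (standard browser worker APIs; updateUI and doBigCalculation are placeholders):

// UI thread: hand the work off and stay responsive
let worker = new Worker("work.js")
worker.onmessage = (e) => updateUI(e.data)   // receive the result later
worker.postMessage({ task: "big-calculation" })

// work.js (the background thread)
onmessage = (e) => postMessage(doBigCalculation(e.data))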

keean commented

Asynchrony forces us to never again trust that the return from a function means anything at all. It is time to kill that wrong-headed archaic implicit assumption.

Maybe... but with co-routines/co-functions we can recover this property, so code looks like simple linear code, and all the asynchronicity is hidden. An asynchronous function looks like a standard blocking function.

I do think any solution to this is going to involve the syntax of the language, and I could see the following working:

let _ = print(x) in let _ = print(y) in let _ = print(z) in ()

as a way to make sure the print statements always execute in the same order

That isn't necessary, because print takes the implicit reference to stdout, thus they can't be parallelized because they share a dependency on the state of the same stream.

keean commented

You would not want to parallelise them anyway because of the threading overhead. You really need a block of 100+ machine-code instructions, or a loop that iterates a few hundred times before it is even worth thinking about parallelising.

Edit: although if print were a complicated function, then maybe...

but the type system would need to stop you ignoring return values, as they become important for sequencing

I can't think of an example to support your claim, can you? The cases that I can think of that shouldn't be parallelized wouldn't be anyway, because there are dependencies that would disallow it.

keean commented

I think tracking stream identity is not as easy as you think.

let instr = open(filename1)
let outstr = open(filename2)

Are instr and outstr the same stream?

Edit: The only safe thing to do (because the system is incomplete) is to assume all streams are aliases, unless we can specifically prove they are not aliases.

Another example:

f(x, y) =>
   x.print("1")
   y.print("2")

Here it depends on the arguments the function is called with whether it is parallelisable, and if this is a library function, we have to compile it with no knowledge of what arguments will be passed until runtime.

Regarding performance, asynchronous IO and multiple green threads will give you the responsiveness you want. Implicit parallelisation will not achieve this. For example:

let a = f(exclusive_read_only_reference),
    b = g(exclusive_read_only_reference)

parallelising this might halve the execution time, but if the UI is blocked on this operation it will still be unresponsive, although for less time. This is not a good solution for responsiveness. The way to get a responsive application is to have all the UI controlled from a single thread, that hands off any work to background threads that run in parallel. This application architecture is very common, and nearly all Android, iOS, Windows and Mac applications follow this model. The key facility here is message passing so the UI thread can send a message to a background thread asking it to do the work, and then receive the result back as a separate message at a later time. With co-routines/co-functions you can avoid having to return to any event loop.

Of course I know about that paradigm. Anyone (including myself) who has programmed Android knows it.

Any implicit parallelization can run f and g in other threads if it deems it should (assuming that code block of my example was in the UI thread, which I don't know why you assumed).

Edit:

Edit: even JavaScript supports this now with web-workers.

Yup I had actually thought of that days ago. I might have mentioned it upthread.

Edit: The only safe thing to do (because the system is incomplete) is to assume all streams are aliases, unless we can specifically prove they are not aliases.

We already discussed upthread that interaction with the universe is unbounded nondeterminism, so no implicit parallelism exists. We would have to somehow prove that interactions with external resources are independent in order to declare that parallelism exists, or declare that we don't care about the ordering of the interaction of those resources. IMO, this is repeating what I already wrote upthread. I don't mind repeating it this one more time. Hope this is the last. You keep returning to that same point about external unbounded nondeterminism.

keean commented

Any implicit parallelization can run f and g in other threads if it so deems it should (assuming that code block of my example was in the UI thread which I don't know why you assumed that).

Co-routines/co-functions are ideal for this because I can write:

my_event_handler(event) =>
    let x = do_big_async_process(event.target)
    alert(x)

Here the event handler hands off the processing to a background task (a co-function) and then blocks, but this does not block the thread, which returns to the thread pool to run other UI or background processes whilst we are waiting. When the async process completes, the return message resumes this co-function from where it left off, and pops up the alert.

Co-routines/co-functions are ideal for this because I can write:

Afaics, how we implement it behind the scenes is not important. What is important is that what you just wrote is how I want the source code to appear, without any explicit parallelism or asynchrony. The compiler decides which blocking code can be run simultaneously based on the dependencies. I really don't know why you keep pushing co-functions.

keean commented

The interesting thing about a promise is that it separates starting an asynchronous action from waiting from completion (regular JS):

let x = new Promise((succ, fail) => ...) // this starts the action
x.then((y) => ...) // this waits for completion

You can see normal function calls lack the semantics for this, they only have one execution point.

f = (x) => x // declare the function
f(1) // execute it (and wait for the result, because we have no other semantics to wait for completion)

Of course we can do this with functions/modules:

f = (x) => 
    start_big_async_task
    return wait_for_big_async_task
g = f(1) // starts task
g() // waits for task.

keean commented

I really don't know why you keep pushing co-functions.

Because without co-functions there is no way for the thread to carry on processing other stuff whilst we wait for completion. Think about why there are callbacks in JavaScript, and why Promises exist (to make callbacks easier to program with). JavaScript needs callbacks because it does not have co-routines or co-functions.

The interesting thing about a promise is that it separates starting an asynchronous action from waiting from completion

All I care about is that I get the result and do something with it. The only reason to care about that separation-of-concerns is when I need to do something else between (i.e. in parallel with) the start and the result. But my entire thesis herein has been that given the high-level invariants, the compiler can decide what gets parallelized and thus I do not need to care. Thus I don't need to care whether Promises or green threads or coroutines are employed by the compiler.

For cases of parallelism that the compiler can't implicitly deduce or where I want tighter control over performance, I could manually encode it.

I really don't know why you keep pushing co-functions.

Because without co-functions

Afaics, that is not addressing my point. See above.

keean commented

All I care about is the semantics. If I have a sequence of instructions, I want to be sure that the compiler will never change the observable order of effects from that code.

The simplest implementation for this is to make everything simply sequential and parallelise nothing. This would be my suggestion to start with, as you want to get to a working testable compiler as fast as possible.

You are now free to add whatever optimisation passes you like, providing the observable order of effects is not changed. Any optimisation (including parallelisation) that does change the observed order of effects is broken.

@shelby3 wrote:

So this is a problem if another thread is modifying the referenced data.

...

(Tangentially you know I am providing these examples in the context of transpiling to JavaScript, so there is only one thread that can have write access any data if we pass only read-only references to blocking functions which might run in another thread, which is what I showed in my example)

I had actually been pondering this issue days ago...

Multi-threaded access to shared data is a “can of worms” chaos. But afaics JavaScript doesn’t actually avoid this, because even if we only pass read-only references or copies (aka messages) to blocking functions (which run in other threads), our single-threaded code gets run in chaotic order by the Promise callbacks and thus afaics is in effect multi-threaded!

This is why Rust claims we need a total order on borrowing in order to write correct programs in the era of significant asynchrony.

Are there any other paradigms? This seems quite depressing because Rust's total order on borrowing is a major pita and infects the type system in what appears to be deleterious ways.

You had suggested copying (pass-by-value) instead of pass-by-reference? Seems that is the only way out of the quagmire. You even argued that copying is more performant in many cases due to (afair) cache locality.

Seems we need something like a borrow checker, but with an emphasis on copying when we need to convert from a writable reference to a transitive never-written, read-only reference. So then we avoid the tsuris of Rust's attempt to track borrowing in a total order. We form partial orders by copying to a transitive never-written, read-only reference. In other words, we need immutability for correct parallelism. Mutability should be reserved for high performance code that is never parallelized.
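
A minimal sketch of that copy-then-freeze conversion (Object.freeze is shallow, so this only illustrates one level of the idea):

// copy the writable state, then freeze the copy into a read-only snapshot
let snapshot = Object.freeze({ ...mutableState })
g(snapshot)   // safe to hand to asynchronous code: nothing can write through it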

keean commented

The simplest solution is no shared data, you send everything you want via message passing, and 'C' would have to copy shared state. So how can we get rid of shared state in the stack?

I think threads and state are bound together (and this is what Smalltalk did): we only allow state to be private in objects, and each object runs in its own 'green' thread. So if each object has private state, and there is at most one thread per object, then inside the object we can have complete access to that state without worrying. We also solve aliasing, because the state is always a private member; if you pass it as an argument to a function you would be passing a copy of it.
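
A rough sketch of that idea in JavaScript, serialising all messages to an object through a promise queue (the shape is illustrative, not a committed design):

// each "object" owns its private state; messages are handled strictly one at a time
makeActor = (state, handlers) => {
  let queue = Promise.resolve()              // serialises all message handling
  return (msg) => {
    queue = queue.then(() => handlers[msg.type](state, msg))
    return queue                             // a Promise for this message's result
  }
}

let counter = makeActor({ n: 0 }, {
  inc: (s) => ++s.n,
  get: (s) => s.n,
})
counter({ type: "inc" })
counter({ type: "get" }).then((n) => console.log(n))  // logs 1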

The simplest solution is no shared data [between threads], you send everything you want via message passing

That doesn't solve the problem. You apparently haven't fully understood my post and how the timing of the arrival of messages will in effect multi-thread the callbacks waiting on messages even if isolated in their own thread. The only way to solve it is to have no shared state between code paths of asynchronous callbacks (even when they are in the same thread!).

keean commented

This relates to my other point about having state private in an object, and where method calls to any instance of an object are serialized.

Co-functions ensure the asynchronous result is returned to the same environment that issued it (we don't actually care if it is the same thread, as long as only one thread is running in any object).

Edit:

The only way to solve it is to have no shared state between code paths of asynchronous callbacks.

This is the same as I was saying:

The simplest solution is no shared data

No shared data implies no shared state between threads. If it is the same thread (as in JavaScript) you don't have a problem. So the implication of my statement would be that the async callback has to run in the same thread as the state prior to the call for which this is the callback. As JavaScript has only one thread (ignoring web-workers) it complies.

A further clarification is that it does not really matter which actual thread runs the code, as we would likely have a thread pool, and the async callback would just use the current thread it was on, or pull a new thread from the pool. What is important is what stack and state that thread has, and that information is contained in the continuation we save when making an async call. So when resuming a suspended computation we can pull any thread from the pool, load the stack and data-segment pointers, and we're done.