zesterer/tao

Concatenative languages - Quark

dumblob opened this issue ยท 8 comments

I'm reading your personal web site and seeing all your endeavours with languages and I like your attitude you expressed e.g. in Why didn't you use an existing game engine?.

Have you also already explored concatenative languages in general (your atto seems also concatenative)? If not, I'd suggest taking a look at Quark (there is also an original Ruby implementation in few tens of lines).

And sorry for being off topic - I didn't want to spam you privately ๐Ÿ˜‰.

I've briefly explored concatenative languages, yes. I find the concept very intriguing. I've briefly played with integrating some concatenative capabilities into Tao, although I believe that currying and function composition net a considerable number of the benefits that concatenative features bring. If you have ideas about this, I'd love to hear them! As it happens, I'm currently working on a functional language backend for a new rework of Tao's compiler.

I've read through the intro page on Quark and it sounds very interesting. Perhaps I'll play around with it tonight. I like the similarities it has with LISP. I know that on the esolang Discord languages like this have been discussed a lot. I actually came up with a stack-driven esolang in a similar vein: https://esolangs.org/wiki/Hanoifuck.

As an aside, you might be interested in my new-compiler branch of Atto. It's a complete overhaul of the language with support for first-class functions and variable arity (example: https://github.com/zesterer/atto/blob/new-compiler/atto/quicksort.at). In keeping with the original, higher-level concepts like list literals are actually part of the language's prelude, not built-in syntax (https://github.com/zesterer/atto/blob/new-compiler/atto/core.at#L93-L99). It probably comes a lot closer to the original in being a concatenative language.

I've briefly played with integrating some concatenative capabilities into Tao, although I believe that currying and function composition net a considerable number of the benefits that concatenative features bring. If you have ideas about this, I'd love to hear them!

Yes, currying (and in general function composition) is very powerful and allows for quite clear semantics when reading the code and is a firm base for (automated) correctness checking.

This will be very subjective, but I fell in love with the concatenative concept (not so much with most concatenative languages I've tried so far ๐Ÿ˜ข). It's for the sole reason of me finding it truly "unconstrained" with regards to composition. After writing something bigger in functional style (at best in a functional language), the more SLOC I wrote the more it felt tedious to manage all the pure surgically separated and utterly long "serialized branches" (code paths) where making a change requires mentally "synchronizing" these branches at deeply buried places which in practise feels like jumping randomly in a homogeneous infinite number of symbols on a tape (as it kind of lacks enough intuitively & quickly mentally recognizable points of interest leading to partial quasi-solutions like minimap getting widespread).

I made an observation that after trying to write something in concatenative langs I felt like I don't want to go back to the verbose and tedious world of all other languages (kudos to Red/Spry/Rebol and partially V as the only exceptions in this regard - but they have their own difficulties...).

For me functional style solves the "coding mess" by introducing overly narrow play field always leading to correct, but tedious and verbose stuff. Concatenative with pattern matching (or similar "cherry picking" mechanism) like in Quark feels much more fluent (and as a side effect is less verbose).

So I'm not sure how could Tao be adjusted to target the verbosity and "surgical linearity" coming up from being functional. But let's contemplate a bit.

I for myself have noticed there are three things greatly affecting general usability of any language (did I already tell you I'm into languages because of my obsession for practicality and pragmatism?):

  1. Ahead-of-execution self-correctness checking ability. Boils down to how much of general computing ability the language offers to the user at "compile time" - at best seamlessly interleaved & interoperating ("first class part of the language itself") with "runtime" code. I.e. not just in the form of a static analyzer/sanitizer/static_type_checker/parallelism_collisions_checker/... like covscan/asan/shellcheck/mypy/... (this is the reason I proposed "compile time asserts for Quark in henrystanley/Quark#2 ).

    This includes also the typing system in the language. If not without fully automated inference and if not extremely fine-grained (from a fully dynamic "any" type up to structural & value range & value black/white list & bit match & arbitrary condition...), then it's guaranteed to be a problem in the future. Something like combination of [gradual(https://en.wikipedia.org/wiki/Gradual_typing )/latent/refinement/substructural/... typing seems like a step in the right direction.

  2. Withdrawing the computed results from the current operation (or from a coupled set of operations) - often drastically reduced to just returning from a function - shall support arbitrary number of arbitrary values and make the withdrawn results as easy to use as possible (and allow to use them differently at each place of use). For functions, which themselves are highly constrained in this regard, there is a poor alternative - namely returning multiple values (or "boxing" them into a temporary struct and then using the given fields of the struct separately for different purposes). For functions the most "advanced" use of this limited functionality seems the Option value in Rust or even better Optionals in V.

    This principle is actually deeply rooted in the tangible world - each time you get a package from a logistics company, it's different despite containing same or similar stuff. Each time you want to work with this obtained package differently (at least a bit).

    Concatenative languages do not have any such limitation (though in practice many unfortunately try to constrain combinators to behavior resembling functions ๐Ÿ˜‰).

  3. Adjustability on-the-go. I.e. the ability to easily (it won't make you angry), safely (as opposite to error-prone) and quickly (minutes at most) overcome any language limitations in any (incl. highly complex) scenarios (imagine "quasi-safe" pointer hacks in C or hacking the compiler itself in case of V or (re)defining combinators in concatenative langs).

How does Tao approach these 3 dimensions?

How does Tao support returning multiple values? At least an arbitrary order of them (or even some simple pattern matching) specified at the place of use (not the place of function definition as that would serve only as the "default" order) would be useful for practical use.

Is Tao's typing concept easily extensible and arbitrarily fine-grained (incl. fully dynamic)?

I actually came up with a stack-driven esolang in a similar vein: https://esolangs.org/wiki/Hanoifuck.

Pretty cool! Stacks in general are a very powerful concept if used carefully. How did you get the idea?

Perhaps I'll play around with it tonight.

Curious what you found out ๐Ÿ˜‰. Any observations & thoughts?

It probably comes a lot closer to the original in being a concatenative language.

Yes. I actually took a look at Atto before I started this thread and noticed a huge PR zesterer/atto#4 which seems to be exactly what you linked above.

I find Atto a nearly-concatenative language with pretty typical structure and properties ๐Ÿ˜‰. The first thing I noticed is there are quite many "helper symbols/keywords" which could be omitted to get a more lightweight feeling. Second thing is it feels pretty strict with the prefix notation as well as with no higher level way to shuffle data around (yep, this "shuffling data around" implemented through pattern matching is what makes Quark IMHO so superior among concatenative languages despite the pattern matching being still rather basic).

Any plans with Atto?

How does Tao approach these 3 dimensions?

Let me try to address these in turn:

Ahead-of-execution self-correctness checking ability.

Tao has static type-checking and hindley-milner type inference. In practical terms, this means that the type of any value must be known ahead of time and gets checked against other elements of the program. Type inference means that it's rarely necessary to manually annotate a value with a type hint because the compiler is allows to figure it out based on context.

For example:

# The type of `five` is inferred to be `Num`
let five = 5 in
# The type of `x` is inferred to be `Num` from the addition, and so `add_five` is inferred to be `Num -> Num`
let add_five = |x| x + five in
# The type of `add_ten` is inferred to be `Num -> Num` also as a result of the product of each call to `add_five` inside it.
let add_ten = |x| x:add_five:add_five

(In the example above, arg:f is an infix function call. x:add_five:add_five is therefore equivalent to add_five(add_five(x)) in C)

As you can see, despite never needing to explicitly annotate types, the compiler is still able to infer the type of every expression based on the information it has available to it.

Type-checking also occurs at the same time as type inference and ensures that every Tao program is well-formed and that the types of all values are appropriately compatible with one-another.

In the near future I hope to expand the language with subtyping.

Withdrawing the computed results from the current operation (or from a coupled set of operations) - often drastically reduced to just returning from a function - shall support arbitrary number of arbitrary values and make the withdrawn results as easy to use as possible

Tao functions represent strict mappings between two types such as Num -> Bool. Functions always take exactly one value and return exactly one value. Multiple-parameter functions can be achieved through currying, as is traditional in the functional language world.

Tao does have support for tuples and lists though, so multi-parameter, multi-return functions can be easily achieved. For example:

# `sum_and_product` is inferred to have type `Num -> Num -> (Num, Num)`
def sum_and_product = |x, y| (x + y, x * y)

(The |x, y| <body> notation is just syntax sugar for 2 nested single-parameter functions like |x| |y| <body>)

I can't currently envisage a way in which this might be expanded to fully encompass the behaviours that concatenative languages permit, although I'm open to suggestions. Currently, it's necessary to deconstruct the results of multi-return functions using a let pattern:

let (sum, product) = sum_and_product(3, 4) in
...

Adjustability on-the-go. I.e. the ability to easily (it won't make you angry), safely (as opposite to error-prone) and quickly (minutes at most) overcome any language limitations

Tao is designed to provide predictable, deterministic, safe semantics that are easy to reason about. For this reason it doesn't support the sort of low-level pointer-juggling operations you speak of (not least because the future optimising backend I'm working on for it would make reasoning about program structure fairly difficult).

I'm not inherently against things like AST-level macros but I generally consider them a hack that makes it unnecessarily difficult to reason about program semantics. If Tao does support things like meta-programming, it will do so in a way that takes into account the type system and makes it trivial to understand the behaviour of a piece of code.

Is Tao's typing concept easily extensible and arbitrarily fine-grained (incl. fully dynamic)?

Extensible in the sense that Tao supports type parameters, custom data types (including recursive ones), and with plans to expand this out into a trait/typeclass system to permit interface-driven programming. I don't plan to add support for dynamic typing though (see this blog post for some conclusions that I also reach when using dynamic type systems).

You can emulate 'dynamic' typing with lists/maps and enums though. Encoding something like arbitrary JSON data in Tao's type system would be trivial to do.

Pretty cool! Stacks in general are a very powerful concept if used carefully. How did you get the idea?

I enjoy stack-driven languages and I'm also the creator of an optimising Brainfuck compiler. It occurred to me that marrying both together should be perfectly possible.

Curious what you found out wink. Any observations & thoughts?

I enjoyed the simplicitly of the pattern-matching. It feels like a more rugged language than other concatenative languages like Forth, although unfortunately I'm not personally familiar enough with concatenative languages to properly appreciate the syntax.

The first thing I noticed is there are quite many "helper symbols/keywords" which could be omitted to get a more lightweight feeling.

Interestingly, Atto has almost no inherent keywords. Even basic operators like +,*,-,/ are actually defined in the core library. Heck, even simple syntax like list literal syntax ([1, 2, 3, 4]) is defined in core!

Second thing is it feels pretty strict with the prefix notation as well as with no higher level way to shuffle data around

Atto is first and foremost an esoteric language and as such it was designed to be incredibly simple to implement (hence the self-hosting interpreter). As a result, I've not really made any attempt to make it more expressive than what is minimally required to reach that goal.

Any plans with Atto?

The new-compiler branch is an attempt to move beyond the aforementioned limitations, but I've largely lost interest in Atto as a project. I wouldn't be opposed to reviewing and accepting pull requests or discussing it further though, should you be interested.

MOTIVATIONAL INTERMEZZO

What I mean with practical is only one thing: a programmer just wants to get things done - i.e. as fast as possible and with as low friction as possible. First and foremost she needs to completely forget about formal correctness checking for a while and mimic exclusively the risk as is in real world (without taking any future into consideration because that's the job of the programming language and technology - to make arbitrary future changes super easy - unlike we were all told years ago at schools to "think about future"). Majority of risks in reality are being accepted with no action, some lowered (e.g. by paying insurance) and only a marginal amount mitigated.

E.g. a manager tells her "JS APIs won't cause any big financial losses in our business if they won't work properly". If a reviewer (and later the manager) finds out, that she wrote some boilerplate code on top of the needed minimum (which took her considerable amount of time) and that she was the person who advocated for the technology (e.g. the programming language she used), she'll be considered a total greenhorn junior dev unable to cope with even the most trivial stuff from reality.

END OF RANT ๐Ÿ˜‰

TLDR; I'm looking for a language (maybe a concatenative one or maybe not) with very fast compilation times (seconds at most), readable short syntax (imagine V but not necessarily C-like), dependent types-like checks support instead of types (imagine Dependent Types and Certificates for parallel correctness but with unrestricted form of dependent types, less verbose, and more modern with higher-level concepts), with pattern matching syntax instead of variables (imagine pattern matching from Quark but more advanced to accommodate for full support of dependent types), and conforming to the 3 major "acceptance criteria" I outlined in my comment above.

Some "random" observations:

  1. Defaults in languages shall rely on and closely model the state of the art capabilities of current HW (incl. virtual HW). It's easy to relase new set of defaults (default types, default structs, default composites, default constants, default stdlib APIs, default combinatorss, default functions, etc.) every year and pass it as argument to the compiler ideally separately for each designated block of code (for simplicity a designated block of code could be a text file). This way backwards compatibility can be maintained while allowing new modules to be written with new defaults and thus use the state of the art HW capabilities.

    Most languages (IMHO for no reason) abstract even over many basic concepts destroying performance and unnecessarily complicating everything by such layer of "indirect overlapping mapping". I'd envision a more "bare" form (though not exactly assembly as assembly doesn't provide compile-time safety through dependent types nor the minimal safety higher-level concatenative or pseudo code languages provide).

  2. I think nearly any language (especially untyped or symbolic ones) can be enhanced with the following constructs:

    1. assert <expr> (evaluated in runtime - i.e. the most generic way to check for types and values in arbitrary fashion)

    2. sassert <expr> (assert evaluated in compile time; shorthand for static assert)

      Note, the compiler checker should try to pin point as accurately as possible which parts are not satisfiable. The checker shall be also instrumentable to allow being told give me another hint as the programmer thinks this particular part shall stay as is (this is more about heuristics and maybe AI-based suggestions).

    3. claim (evaluated in compile time; adds a fact to the stack of facts; newer facts shadow older facts in case of interference; this could be later extended to push it e.g. to the global - i.e. the very first - lexical scope instead of to the local one)

      This construct effectively forces the compiler to believe anything (whether it's about arbitrary values or arbitrary types or both) in the given lexical block (be it explicitly delimited or not or partially both) and its lexical children.

      Note that a runtime claim would be more or less equivalent to plain value assignment. Nice symmetry, right?

    The workflow would be to write sasserts and if the compiler would scream, try to fix it by adding few claims right in front of sassert and if it won't help, simply resign and split sassert into sassert and assert.

    The language could also allow to label or somehow reference parts of already written sassert/assert/claim expressions in other sassert/assert/claim expressions. Again explicitly or implicitly (or combination thereof) remains to be decided. This should significantly lower boilerplate (remember, to keep up with time, every year or so there could be new default prearranged set of those as mentioned above) and encourage intuitive composition.

    A nice property of this approach is that it's highly contained (unlike scattered like in most languages nowadays). It doesn't disrupt the flow (it can be edited quickly on one place) yet it's short, terse, readable, simple and fully generic.

    Btw. this allows for nice development of the language itself if the compiler would distinguish between "development compilation" (the default) and "production compilation" (where some warnings from "development compilation" would turn hard errors or vice versa be silenced if it was just a suggestion). Because claims and asserts are easily identifiable, the compiler, after getting better at sassert jobs, could automatically warn the programmer (only in "development compilation") she should switch to sassert. Moreover, the first release of the language could simply change all sasserts to asserts (and report this in non-production compilation) which would guarantee application correctness (not so much the operation though ๐Ÿ˜‰) with no effort.

  3. Approach from (2) would allow to make the language arbitrary-looking, truly minimal and non-disruptive yet readable and easily understandable even to newcomers with let's say only imperative background. Such language would be also much closer to symbolic computation than mainstream languages nowadays yet as performant as highly optimized assembly - all this thanks to defaults (remember, just one argument to the compiler).

  4. Atto and Quark seem like interesting building blocks for a potential language reaching the wishes from TLDR.

  5. I've spent my whole life so far in non-assembler languages, but after discovering concatenative languages (which require mental model much closer to bare metal & assembly while keeping everything even higher level than e.g. functional languages) I realized languages should not abstract over assembly principles, but rather provide them raw with additional safety checking (both compile- & run-time) and add high level functionality rather than mathematically (or otherwise) abstract over (i.e. wrap or slurp) everything and oppress the concepts behind modern assembly.

  6. I agree with the article you linked. It's actually something I'm not concerned with at all as I'm strongly convinced that static typing is fully equivalent (incl. expressiveness etc.) to dynamic typing when it comes to describing the real world around us.

    I can't deny though that the article left out a very (IMHO most) important fact. Namely that a language is not about "how to store the algorithm" (the time a programmer spends with "programming" on the final source is exactly 0 by definition), but about "how to make the endeavor of constant editing until a satisfying algorithm is found as short as possible while maintaining high-quality outcome" (i.e. the programmer's path to accommodate the algorithm to a changed JSON; not the final code handling the new JSON structure as described in the article).

    Sure, the final outcome is also important, but I'm one of those believing, that if the path is good, then the outcome is also good (I'm not saying perfect, but nor Haskell can guarantee that ๐Ÿ˜‰). Basically a good path just sets the lower bound.

    Static typing and dynamic typing represent only two extreme ends of one dimension (sometimes it's desirable to not care about types because the potential harm is too small to address it as described in the rant above and vice versa sometimes you want to care about types a lot but the API or worse the language itself doesn't support that - solving this requires to have excellent support for "Adjustibility on-the-go" as outlined in my previous comment).

    In practice this static-dynamic typing dimension is IMHO quite unimportant (proof: there are about as many huge industry applications in statically typed languages as in dynamically typed languages) and addresses only a small part of what the programmer wants to express. It seems more of a holy war than anything else.

    What's IMHO important is the workflow how does one work with languages and their ecosystems. From design (architecture etc.) through mockups, then prototypes, then MVP, then first release candidate, then first release, then deployment, then operations (incl. number and severity of issues, quick fixes, overhauls, etc.), then upgrade to new release, then at some point migration away from it, then staged shut down, then final cleaning, then archiving (late audit, investigation, etc.).

    Whoa, so much there and nobody speaks of it when designing a language ๐Ÿ˜ข. What I see is that static-dynamic typing is irrelevant here. But what matters is how much is such language successful in all the parts of the overall workflow. I could make a table here with parts as columns and some chosen well known languages (incl. Haskell, Scala, C, JS, V, Python, etc.). Most would presumably fail quite a lot in more than one of these columns. That's bad and that's what I'm aiming at with any new language ๐Ÿ˜‰.

  7. There is often the need to abstract over the very same data (value, instance) in the same place of use (lexically) in different ways - imagine overlapping types/APIs/ad-hoc_stuff/etc. That's why I'm not advocating upfront definition of "types", but rather assert/sassert/claim together with pattern matching right at the place of use of the data.

    E.g. temporary representation of probability outside of 0..1 to get over precission issues of the underlying IEEE754.

  8. Partial approaches (which just scratch the surface) trying to solve certain issues of current languages include occurence typing, type predicates, refinement types, behavioral subtyping etc. But none of them comes close enough to dependent types.

  9. Theorists would say that what I envision may be viewed as having two Prolog instances (one running at compile time and the other running in runtime) accompanying a simple system programming language, but all sharing the current values and computational progress.

I'll react to Tao-specific and Atto-specific points from your comment separately. But feel free to not wait for it and write what you think ๐Ÿ˜‰.

Hmmm. I understand and empathise with a lot of your points, but I'm not sure I agree with several of them.

without taking any future into consideration because that's the job of the programming language and technology

I don't really think this is true. Programmers often have a particular relationship with a programming language, and it's one in which they, the developer, are sentient and wish to progress forward. It is the job of the language to interpret their wishes in a formal and consistent manner while pointing out inconsistencies in their ideas.

This is the almost universal story of the progression of type systems: becoming ever more capable of interpreting abstract concepts (user-define data types, generics, typeclass constraints, higher-ranked types, etc.) in order to correctly ground human thought in sound, provable logic. They aren't there to push the user forwards: rather, they're there as a brake mechanism on the mistake-driven tendencies of the human mind.

Most languages (IMHO for no reason) abstract even over many basic concepts destroying performance and unnecessarily complicating everything by such layer of "indirect overlapping mapping". I'd envision a more "bare" form ...

This is a misnomer from the past.

In the 1980s one might drop out of C and jump into assembly in order to squeeze a few extra cycles out of the hardware. That hasn't been the case for about 20 years, however. Modern compilers are far more capable of making sensible choices about optimisation and representation than even the most well-versed human assembly programmers and are commonly capable of leaps of logic that are simply missed by programmers that are not working on a problem for many, many hours. As an example, take Rust's iterator API:

let answer = (0..1000)
    .filter(|x| x % 2 == 0)
    .map(|y| (0..x)
        .map(move |x| x * y))
    .flatten()
    .sum();

This simple section of code represents some rather complicated logic but represents it in a manner that is, in my view, as reasonably intuitive and simple as one could formally make it.

You might argue that it has terrible performance characteristics: indeed, we're creating closures, passing variables between scopes, chaining method calls, and doing lots of bizarre and slow things.

You'd be wrong. With optimisations enabled, the Rust compiler will flatten every aspect of this out into extremely efficient code, parallelised using SIMD, that is inevitably much more efficient than that which would be written by even the most experienced assembly programmers.

You might then change your argument a little: perhaps this is simply a product of the codegen backend? Surely writing out the logic as something 'close to assembly, but not quite' would allow the compiler to do similar.

This, too, would be wrong. Rust's iterator API actually statically allows the compiler to prove more invariants about the code by laying it out in such a way that makes the data dependencies clearer, thereby enabling more optimisations. If you were to write equivalent code using some good old for loops and if expressions, you're highly unlikely to be able to come up with anything that's as well optimised as this, simply because you're a human being and you're spending most of your time thinking about ordering, off-by-one errors, and the like.

There's another advantage here too: everything is type-checked. It's very difficult to write iterator code and get it wrong. It's much easier to get tangled loop logic wrong, however. This fact significantly reduces the mental overhead associated with writing code like this and frees up the programmer to concentrate on more useful elements: business logic and architecture.

All of this points towards a rather important thing that's more true today than it ever has been and is likely to become more true in the future: abstraction allows the compiler to generate faster code.

If you can define your code without reference to side-effects and curious but irrelevant artifacts like exactly how one constructs loops, it frees up the compiler to make much better decisions about these details when turning your code into an executable. Languages that permit this sort of abstract, declarative styles (i.e: functional languages) are starting to accelerate past tradition imperative languages in performance and this trend is likely to continue as compilers get better at applying high-level optimisations to code.

The workflow would be to write sasserts and if the compiler would scream, try to fix it by adding few claims right in front of sassert and if it won't help, simply resign and split sassert into sassert and assert.

I don't inherently disagree with this. Being able to prove compile-time invariants is extremely useful. That said, dependently-typed languages have a long way to go before this can be made generally useful. In the meantime I think constraining data ranges Ada-style and taking a 'parse, don't validate' approach to program construction is much more likely to bear fruit in the space of maintaining program invariants.

Rust, again, takes this to heart: Its String type verifies, as an invariant, that it always contains valid UTF-8 bytes. This allows both for the compiler to perform more optimisations upon the code under this assumption, and for the programmer to use String without fear of needing to deal with invariants being invalidated under their feet. String does this by verifying any potentially non-UTF-8 data upon insertion or creation (in practice, this doesn't need to be done very often).

Btw. this allows for nice development of the language itself if the compiler would distinguish between "development compilation" (the default) and "production compilation" (where some warnings from "development compilation" would turn hard errors or vice versa be silenced if it was just a suggestion).

This sounds a lot like you'd enjoy Zig. It does a lot of development checks that then get disabled in release builds. I don't personally sit well with this methodology, however: correctness is an important principle to me and I like languages that constrain their semantics, as much as possible, to sets of programs that are safe and correct.

I realized languages should not abstract over assembly principles, but rather provide them raw with additional safety checking (both compile- & run-time) and add high level functionality rather than mathematically (or otherwise) abstract over (i.e. wrap or slurp) everything and oppress the concepts behind modern assembly.

I think, as mentioned earlier, that you might be getting the wrong end of the stick from the article you linked. It's not making the case for a language based upon portable assembly: instead, it's making the case for alternative CPU designs that take advantage of languages that don't fit so cleanly on top of C's programming model and that making the assumption that 'all languages look like C' is a poor assumption.

Namely that a language is not about "how to store the algorithm" (the time a programmer spends with "programming" on the final source is exactly 0 by definition), but about "how to make the endeavor of constant editing until a satisfying algorithm is found as short as possible while maintaining high-quality outcome" (i.e. the programmer's path to accommodate the algorithm to a changed JSON; not the final code handling the new JSON structure as described in the article).

I'm not convinced about this one. Code is generally read (and executed) far more times than it is written. It's important that a language constrains the side-effects of a piece of code in a manner that makes following its logic easy for an external observer. Writing code is expensive in the abstract, but getting it wrong or maintaining it over a long period of time is arguably much more expensive. If we want to build languages that can stand the test of time, they must have semantics that prioritise the long-term stability and legibility of a codebase, not the short-term off-the-cuff thoughts of a tired programmer. Being able to optimise for both at the same time is, obviously, the most most desirable outcome though.

There is often the need to abstract over the very same data (value, instance) in the same place of use (lexically) in different ways - imagine overlapping types/APIs/ad-hoc_stuff/etc. That's why I'm not advocating upfront definition of "types", but rather assert/sassert/claim together with pattern matching right at the place of use of the data.

This is another way of saying 'validate, don't parse' and it turns out to be a very bad idea for the long-term health of a large codebase. It may suit the needs of a programming cranking out some 300-line demonstration, but it simply does not scale well.

Refinement types, as a point of interest, share similarities with the runtime assertions that you mention but deviate in one important (and incredibly useful) aspect that allows them to scale well across a codebase: once a refinement upon their representation is known, that refinement becomes associated with the value's type and hence no longer requires verifying later on in the program. This is crucially important. It means that if an earlier validation gets removed (or its scope widened) you get a type error at the point of use since the refinement is no longer valid. This massively reduces the mental burden of using them.

Very interesting observations! I'm glad you are raising them as I can then better convey my thoughts.

without taking any future into consideration because that's the job of the programming language and technology

I don't really think this is true. Programmers often have a particular relationship with a programming language, and it's one in which they, the developer, are sentient and wish to progress forward. It is the job of the language to interpret their wishes in a formal and consistent manner while pointing out inconsistencies in their ideas.

I agree. I meant before any technology (incl. language) gets even chosen for the task.

Most languages (IMHO for no reason) abstract even over many basic concepts destroying performance and unnecessarily complicating everything by such layer of "indirect overlapping mapping". I'd envision a more "bare" form ...

This is a misnomer from the past.
...

This is a huge misunderstanding and I'm sorry for not articulating what I mean clearly enough.

I'm missing really just abstractions over purely HW-related stuff, not arbitrary abstractions in code. E.g. I don't know of any simple (HW-focused) abstraction over transactional memory nor over all the CPU caches despite the models HW implements are really simple. But languages do not offer anything to my best knowledge (and no, intrinsics - at least those I know of - seem not trying to mimic the HW model, but are just syntactic sugar over pure ASM). Even SIMD you mentioned are offered in too simplified forms if any (no, compiler's automagical vectorization doesn't count here at all) and the HW SIMD model is only available in a horrible and totally unsafe form as assembly.

To prove I meant really just HW-related abstractions, feel free to read my more than a year old wrap up of parallelism incl. why I consider SIMD parallelization to be actually a solved issue (hint: exactly because there are nice abstractions - many inspired by functional programming as you wrote - e.g. the state of the art libraries https://github.com/mratsim/weave and https://github.com/zero-functional/zero-functional and https://github.com/mratsim/Arraymancer for Nim or https://github.com/rayon-rs/rayon for Rust).

One could also say I'm looking for HW abstractions which are higher-level and much safer than assembly but lower-level than e.g. Linux/BSD/Windows APIs offer to user applications. Something like exo kernels or typical type-1 hypervisors with paravirtualization API/ABI offer - but built-in in a language (or as part of minimal standard library in case the language has free form syntax).

In the meantime I think constraining data ranges Ada-style and taking a 'parse, don't validate' approach to program construction is much more likely to bear fruit in the space of maintaining program invariants.

Here I can't entirely agree, because I have strong experience, that the most severe bugs in terms of true losses & harm (not to be confused with frequent or common or expected bugs) are those inexpressible with Ada-style type constraints capabilities (though Ada is much closer to truly dependently typed language than any other I know of).

Of course, this doesn't imply, that expressing only those invariants causing losses would automagically eliminate the need for expressing Ada-style invariants. Albeit in practice I'd bet expressing invariants causing losses would guide one to express also other invariants thus covering both cases.

Notice also that I'm proposing claim exactly to overcome the limitation of deep analysis of the code in compile time (which is undecidable unless compile time is the runtime).

This sounds a lot like you'd enjoy Zig. It does a lot of development checks that then get disabled in release builds.

Actually many languages do that as it turns out very useful as development environment is always different from deployment environment and thus e.g. the setup is different, optimization needs are different, development HW is different, mocking is different, inputs are different, etc.

I'm not the biggest fan of Zig especially due to its enormous unnecessary complexity and verbosity (it implements nearly any and every lower-level abstraction and idea from the whole IT computing era), but that's subjective. I know my coworker likes it a lot and I'm supporting him wherewer I can to allow widespread use of Zig as I feel that's the language which should replace C (it won't, but anyway ๐Ÿ˜‰).

I don't personally sit well with this methodology,

That's interesting. Could you elaborate more why you "don't sit well with this methodology"?

however: correctness is an important principle to me and I like languages that constrain their semantics, as much as possible, to sets of programs that are safe and correct.

Yep, that's how I'd envision default behavior of a good language. On top of this I require the language to offer means to change the default behavior (that's the goal (3) from my previous comment above) - ideally with some sort of controlled shouting from the compiler (controlled e.g. by shouting especially in the "development" mode as discussed above and less shouting in "production" mode).

I think, as mentioned earlier, that you might be getting the wrong end of the stick from the article you linked. It's not making the case for a language based upon portable assembly: instead, it's making the case for alternative CPU designs that take advantage of languages that don't fit so cleanly on top of C's programming model and that making the assumption that 'all languages look like C' is a poor assumption.

Sure, it's not about making "portable assembly", but about using (and creating I'd say) languages which actually embrace what modern HW offers. Which in the context of the article (i.e. C ecosystem and use cases) is another way of saying what I called "more bare" above.

I'm not convinced about this one. Code is generally read (and executed) far more times than it is written. It's important that a language constrains the side-effects of a piece of code in a manner that makes following its logic easy for an external observer. Writing code is expensive in the abstract, but getting it wrong or maintaining it over a long period of time is arguably much more expensive. If we want to build languages that can stand the test of time, they must have semantics that prioritise the long-term stability and legibility of a codebase, not the short-term off-the-cuff thoughts of a tired programmer. Being able to optimise for both at the same time is, obviously, the most most desirable outcome though.

I actually thought you might bring up this "read more times than written" observation (which on its own is totally correct and valid) to justify focusing solely on the resulting code in programming language design (which is false). There is a clear reason why this is not at all applicable to practical programming workflow as outlined in my previous post above. Unfortunately language designers seem to completely dismiss this and fall into this "read more times than written" trap ๐Ÿ˜ข.

It's because editing the code is the only inherently serial work which one person has to do to capture the algorithm. Everything else depends strictly on this "editing stage" being finished (i.e. the result is e.g. compilable without errors). Any reading (reviews, audits, learning, refreshing someones mind, etc.) happens fully in parallel to other activities (like compilation, pre-production test runs etc.) - but still only after the "editing stage" got finished. Not speaking about the fact, that any reading takes an order of magnitude less time than editing.

So any argument that language shall make the resulting code a priority over the editing workflow is a plain fallacy and IMHO is one of the major reasons why IT industry is known to have the highest risk of any endeavor among all existing industries.

This is another way of saying 'validate, don't parse' and it turns out to be a very bad idea for the long-term health of a large codebase. It may suit the needs of a programming cranking out some 300-line demonstration, but it simply does not scale well.

Hmmmm, I actually think the approach I've described does exactly parse, don't validate (and not validate, don't parse). The semantics of combinators and pattern matching (even as implemented in Quark) is not to tap on the stripe of symbols (i.e. read some part of memory, leave the memory as is, act upon the read stuff and then store the result somewhere else), but exactly the opposite. Namely to pop items out of memory first and then act upon them and "return" the results in place of the recently popped items.

Which is exactly the concept "consume less structured, produce more structured" as described in the article. I called it ad-hoc looking, because it feels so, but it's not. IMHO it's even enforcing this parse, don't validate principle unlike actually Haskell or any other non-concatenative language I know.

I'd go even further and say, that validate, don't parse exists merely because lambda-calculus-based functional programming without enforcing linear type system everywhere (which seems to account pretty much for more than 99% of all functional languages) emerged ๐Ÿ˜‰.

Actually Clean as a functional language which does support enforcement of linear typing seems to be surprisingly close to what concatenative languages offer in practice (though with more boilerplate) - I'm mentioning this to support claims like Joy is in fact closer to FFP than any of the languages mentioned so far from Manfred von Thun's papers (see much more on http://www.kevinalbrecht.com/code/joy-mirror/index.html and e.g. https://github.com/Wodan58/joy1/tree/master/lib - it's very informative and for some kind of groundbreaking stuff - it at least opened my eyes some time ago).

Refinement types, as a point of interest, share similarities with the runtime assertions that you mention but deviate in one important (and incredibly useful) aspect that allows them to scale well across a codebase: once a refinement upon their representation is known, that refinement becomes associated with the value's type and hence no longer requires verifying later on in the program. This is crucially important. It means that if an earlier validation gets removed (or its scope widened) you get a type error at the point of use since the refinement is no longer valid. This massively reduces the mental burden of using them.

Hmmmm, I'm not sure I follow here. Let's see a simple example (assume a pure language with immutable values):

fn f( v: Array<string> | boolean | Event ){
  if( Array.isArray( v ) ){
    return v
  } else if( typeof v === "boolean" ){
    return v
  // v instanceof Event
  } else {
    return v
  }
}
x = true
y = f( x )
f( y )

So based on what you wrote the call f( y ) would be a type error (because it's guaranteed in compile time that y can't be anything else than boolean)? That's actually not my understanding of refinement typing.

If it wouldn't produce any type error, then if we made the scope broader:

fn f( v: Array<string> | boolean | Event ){
  if( ( Array.isArray( v ) ) || ( typeof v === "boolean" ) ){
    return v
  // v instanceof Event
  } else {
    return v
  }
}
x = true
y = f( x )
f( y )

Then by the same logic we'd still not get any type error. So I still do not follow what you meant.

Anyway I'm certain the language with assert sassert and claim satisfying other TLDR criteria is a strict superset of refinement typing.

Thoughts?


I actually didn't plan to lean on concatenative languages in this thread, but it somehow forces me to. That makes me wonder what else I'm putting too low emphasis on with regards to programming languages. Thanks again for your valuable inputs and discussion!

@zesterer speaking of advanced types you might be interested in the language Formality (see the broader scope in https://github.com/Soonad/Whitepaper ).

@zesterer speaking of advanced types you might be interested in the language Formality (see the broader scope in https://github.com/Soonad/Whitepaper ).

Formality seems very nice. I believe it shares a lot of ideas with the language I'm writing. I'm not too keen on some aspects of the syntax, but that's simply a matter of personal preference.