zilch-lang/nstar

Rework N*

Closed this issue · 2 comments

N* is expected to be a low-level abstract assembly language, i.e. it should be able to run on most architectures without major modifications (mainly related to calling conventions) to the code.
Because of this, it is mandatory to abstract away some specific details.

At the moment, N* is simple enough to allow a subset of C to compile to it. However, things get really complicated when we consider more complex features, e.g. higher order functions.

Currently (speaking as of the first version), N* disallows manipulating code addresses (basically function entry points). Higher order functions (or pointer to instructions, as used in C) cannot be expressed if this limit exists.
The branch-checker (which is supposed to check whether jumps are safe etc) currently works under the assumption that this is impossible to manipulate code addresses, therefore cannot work if we allow jumping to un-trusted code via a register (which is in fact useful when working with lambdas, higher order functions in general, making a JIT compiler, etc).

Looking back to FunTAL, they achieve some sort of “jump safety” using continuations.
While this works, I'd like to be a little less restrictive, and allow jumping back to an address on the top of the stack (which in fact serves the same purpose, just in a different way).
There are also some design problems that I'd like to fix by re-designing the language itself.


This second version of N* should:

  • allow jumping/pushing/popping code addresses (e.g. found in registers, from registers to other registers, etc)
  • get rid of the branch-checker which is basically meaningless if we allow manipulating code addresses
  • have some syntactical changes
  • allow handling some labels as "inner" labels, allowing for type-safe fallthroughs
    This won't be considered because it is not needed. We just need some notion of scope for instructions to "live in".

The new syntax will be inspired from FunTAL (at least a bit), while still being simple enough to be used and understood by a human-being (N* will be used in the Zilch project as a compiler back-end, but it has to be able to be used as a standalone programming language).
It could look like:

section data {
  x: u64 = 0xDEADBEEF
}

section code {
  main: ∀(σ: Ts).{| (∀().{ %r0: u64 | σ })∷σ } = [
    mov 0, %r0
    ret
  ]
}

This example is of course just a draft, and will most likely be changed in the future.


EDIT: #50 (comment)

  • Allowing indexing the stack using integer numbers (where 0 maps to the entry on top of the stack, 1 the second entry in the stack, etc)
  • Extending the stack index feature to also work for continuations: → 0 means that the continuation lies on top of the stack when escaping from the current scope
  • Those = [ i... ] are ugly when defining code objects (“functions”); using the same syntax as FunTAL (I ::= i; I | ret | other terminal instructions... where i is a simple instruction) seems to be a much better choice
  • Maybe extend the indexing feature to arrays, therefore introducing arrays in N*. This will most probably be hard to use (e.g. using VLAs without some sort of variable) but can be worth having

FunTAL (the TAL part) uses continuations to "return" values from functions, using the end continuation to exit the program.
This is a good idea, because it makes jumps safe (no need to compute the new context after a call, which cannot be always determined unless we can't manipulate code addresses).

The stack seems to also be completely infered in FunTAL (which doesn't change much, mainly only one forall bind).

Considering those aspects, I propose this syntax for the example above:

section data {
  x: u64 = 0xDEADBEEF
}

section code {
  main: ∀(ϵ).{%r5: ∀().{%r0: u64 | σ → ϵ} | σ → %r5} = [
  #       ^ we can also add `σ` here
    mov 0, %r0
    ret
  ]
}

∀(ϵ).{%r5: ∀().{%r0: u64 | σ → ϵ} | σ → %r5} means:

  • the type is generic in the continuation ϵ of the function pointer stored in %r5 (functions are automatically boxed)
  • it works with any stack, but also doesn't know anything about the stack
  • to be able to jump to this function, we need to have a code address in %r5 (how can this be extended to work with the stack?)
  • to jump to the continuation (using ret), we have to set %r0 to an unsigned 64 bit integer

And we also no longer need our branch-checker! 🎉 (changing the return continuation does not prevent the type-checker from working, compared to before where we needed to compute the return context after a call)

Taking a deeper look into FunTAL (which is a very good resource, but a bit hard to understand), there are a few things that I would also like to add to N*, namely:

  • Allowing indexing the stack using integer numbers (where 0 maps to the entry on top of the stack, 1 the second entry in the stack, etc)
  • Extending the stack index feature to also work for continuations: → 0 means that the continuation lies on top of the stack when escaping from the current scope
  • Those = [ i... ] are ugly when defining code objects (“functions”); using the same syntax as FunTAL (I ::= i; I | ret | other terminal instructions... where i is a simple instruction) seems to be a much better choice
  • Maybe extend the indexing feature to arrays, therefore introducing arrays in N*. This will most probably be hard to use (e.g. using VLAs without some sort of variable) but can be worth having