Core language
joelburget opened this issue · 1 comment
I think it's important to have a core language that serves as a good default target.
My initial attempt looks like this:
type core =
(* first four constructors correspond to regular term constructors *)
| Operator of string * core_scope list
| Var of string
| Sequence of core list
| Primitive of primitive
(* plus, core-specific ctors *)
| Lambda of sort list * core_scope
| CoreApp of core * core list
| Case of core * core_scope list
(** A metavariable refers to a term captured directly from the left-hand-side *)
| Metavar of string
(** Meaning is very similar to a metavar in that it refers to a capture from
the left-hand-side. However, a meaning variable is interpreted. *)
| Meaning of string
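The supporting types (core_scope, primitive, sort) aren't pinned down above. Here's a minimal sketch of how they might look, with one tiny term built from the constructors; the Scope, Sort, PrimInt, and PrimString shapes are my assumptions, not settled design:

```ocaml
(* Assumed supporting definitions -- none of these are settled yet. *)
type sort = Sort of string

type primitive =
  | PrimInt of int
  | PrimString of string

type core =
  (* first four constructors correspond to regular term constructors *)
  | Operator of string * core_scope list
  | Var of string
  | Sequence of core list
  | Primitive of primitive
  (* plus, core-specific ctors *)
  | Lambda of sort list * core_scope
  | CoreApp of core * core list
  | Case of core * core_scope list
  | Metavar of string
  | Meaning of string

(* Assumption: a scope wraps a list of binders (or patterns, in Case
   branches) around a body. *)
and core_scope = Scope of core list * core

(* The boolean identity function as a core term. *)
let id_bool = Lambda ([Sort "bool"], Scope ([Var "x"], Var "x"))
```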
This issue is here to braindump some thoughts and references.
- GHC Core:
  - Understanding GHC Core
  - the CoreSyn type
- Implementing functional languages: a tutorial: "The principal content of the book is a series of implementations of a small functional language called the Core language. The Core language is designed to be as small as possible, so that it is easy to implement, but still rich enough to allow modern non-strict functional languages to be translated into it without losing efficiency."
- GRIN
- Transformation and Analysis of Functional Programs (Neil Mitchell's thesis)
- A-Normalization: Why and How
The first question to answer is: what exactly is this core language for? Its purpose is to unambiguously define the semantics of a language (via translation to core). It's nice if we can also do other things, like step through it with a debugger, but that's secondary.
Two important concerns, fairly unique to this project, are inclusion of terms from other languages and computational primitives.
By "terms from other languages" I mean that denotational semantics (in general / in LVCA) is about translating from language A to language B. When using core, this is specialized to a translation from A to Core(B), where Core(B) is core terms with terms from B embedded. As an example, a case expression in Core(bool):
case(
true();
[ true(). false(), false(). true() ]
)
Some of the syntax is up for debate, but the point is that this is the equivalent of the OCaml match true with true -> false | false -> true, except that booleans are not built into core at all; they come from the language embedded in core.
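Spelled out as a value of the core type, the example might look like the following. This assumes a scope in a Case pairs a pattern with a branch body, which is my reading, not a settled decision (the type is repeated in cut-down form so the snippet compiles on its own):

```ocaml
(* Cut-down version of the assumed core type. *)
type core =
  | Operator of string * core_scope list
  | Case of core * core_scope list
(* Assumption: patterns (here fully concrete terms) plus a branch body. *)
and core_scope = Scope of core list * core

(* true() and false() come from the embedded bool language, not from core. *)
let tru = Operator ("true", [])
let fls = Operator ("false", [])

(* case(true(); [ true(). false(), false(). true() ]) *)
let negate_true = Case (tru, [Scope ([tru], fls); Scope ([fls], tru)])
```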
The other concern I mentioned above is computational primitives, by which I mean primitives that are expected to actually do something. For example, you might have a primitive #not, in which case you could write the above example as #not(true()). Here #not is not built into the specification of core; it's provided by the runtime environment. (I'm using a hash to denote primitives, but that's just a convention I think is nice.)
With primitives we're now dealing with "core plus": core extended with a set of primitives. So the example #not(true()) is a term in Core(bool()){#not}. The syntax is completely undecided, but the idea is that this term can be evaluated in any environment that provides the #not primitive. I think this is really cool. You could easily find the set of primitives your language relies on. It would even be possible to translate to a different set of primitives, e.g. Core(bool()){#not} -> Core(bool()){#nand}.
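If primitives are identified by the naming convention alone, finding the set a term relies on is a simple traversal. A sketch, under the assumption that primitives appear as Operators whose names start with '#' (the type is a cut-down repeat so the snippet stands alone):

```ocaml
(* Cut-down assumed core type, repeated so this compiles on its own. *)
type core =
  | Operator of string * core_scope list
  | Var of string
  | CoreApp of core * core list
and core_scope = Scope of core list * core

module StrSet = Set.Make (String)

(* Collect every operator name starting with '#'. *)
let rec primitives : core -> StrSet.t = function
  | Operator (name, scopes) ->
    let here =
      if String.length name > 0 && name.[0] = '#'
      then StrSet.singleton name
      else StrSet.empty
    in
    List.fold_left
      (fun acc (Scope (pats, body)) ->
        List.fold_left
          (fun acc t -> StrSet.union acc (primitives t))
          (StrSet.union acc (primitives body))
          pats)
      here scopes
  | Var _ -> StrSet.empty
  | CoreApp (f, args) ->
    List.fold_left
      (fun acc t -> StrSet.union acc (primitives t))
      (primitives f) args

(* #not(true()) depends on exactly one primitive: #not. *)
let deps = primitives (Operator ("#not", [Scope ([], Operator ("true", []))]))
```

A translation between primitive sets (the Core(bool()){#not} -> Core(bool()){#nand} idea) would be a similar traversal that rewrites the matching Operator nodes.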
Should functions be applied to multiple arguments simultaneously? GHC Core only allows a single argument; Mitchell's thesis and GRIN allow multiple.
Should let-bindings be added as terms? All three I just mentioned have this. By the way, here's a great answer about why Gallina (Coq's core language) needs let-binding. As far as I can tell, that's not a concern for us, since we don't plan for our core to be dependently typed.
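For reference, since our core isn't dependently typed, a non-recursive let can always be desugared into an immediate application, so a dedicated constructor would be a convenience rather than a necessity. A sketch (the Scope and Sort shapes are my assumptions; the type is a cut-down repeat so the snippet compiles alone):

```ocaml
(* Cut-down assumed core type, repeated so this stands alone. *)
type sort = Sort of string

type core =
  | Var of string
  | Lambda of sort list * core_scope
  | CoreApp of core * core list
and core_scope = Scope of core list * core

(* let x : s = e in body  ==>  (fun (x : s) -> body) e *)
let desugar_let s x e body =
  CoreApp (Lambda ([s], Scope ([Var x], body)), [e])

(* let x : bool = y in x *)
let example = desugar_let (Sort "bool") "x" (Var "y") (Var "x")
```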