- P0001
- Java…tokenizer, thing
- Status: Some unit tests that pass
- P0002
- C# even-more-simplified interpreter
- Status: Works; used to implement parts of MapTranslator35
- P0003
- Test cases
- P0004
- Some compiler written in Janet, or something
- Status: ???
- P0005
- A minimal lisp, implemented in Java
- Status: incomplete
- P0006
- Tiny single-class VM challenge
- Status: Works, but minimally useful
- A forthlike VM using String[] to store the program
- P0007
- Demonstration of dynamically generated/loaded class files
- Status: Concept proved to work.
- P0008
- ‘Java Command Runner 36’
- Exists both separately and as a subproject of TScript34 because I can’t make up my mind which it should be. Currently the standalone project is ahead, and has a cleaner history than the p0008/master branch in this repository, so probably use that.
- P0009
- Single-class Java implementation of TScript34.2 interpreter
- Program encoded as int[] with a table of Object[] constants
- Stack is Object[], with Object[] { SPECIAL_MARKER, tag, … } used to denote non-literal values (i.e. values represented in an indirect way, e.g. ‘value of expression X’, ’
- Status: Concept proved, has some unit tests.
- May be expanded upon to provide more operations, though I would still like to keep the core very small, so need to come up with some extension mechanism.
- P0010
- Java classes to formalize schemas for ‘indirect value’ / ‘thunk’ / ‘meaning-tagged value’ objects
- P0011
- Hopefully-more-minimal-than-P0005 lisp-in-Java, because I still want one, and P0005 got bogged down with continuation stuff.
- P0012
- Proof-of-concept Maven library and thing to use it
- P0013
- CPS-based interpreter demo
- P0014
- TEF parser java library
- P0015
- RDF parser java library
- P0016
- An implementation of Lox
- P0017
- A design for a hybrid command/functional language
- P0018
- Java library to help publish Maven packages
- P0019
- A TScript34.2 interpreter based on P0009
- P0020
- Process[Like] management experiment in Java
- 1.3.6.1.4.1.44868.261.34
- This project
- 1.3.6.1.4.1.44868.261.34.n
- Sub-project
t
- 1.3.6.1.4.1.44868.261.34.10.t
- Indirect value representation
t
; see ./P0010/src/main/java/net/nuke24/tscript34/p0010/IndirectValueTags.java for the list of tags!
I should probably specify it, huh? With test cases and stuff.
Well, maybe for now it’s enough to say that the language is intended to be compatible with PostScript. i.e. a program that is valid in either PostScript or in TScript34 should mean the same thing and have the same result in either. That said, some PostScript programs might not be valid TScript34 programs, and vice-versa.
Languages aside from the ‘syntaxless’ TS34.2 (which is line-based) and PostScript clones (though they may extend the syntax to support line comments) may share common tokenization rules.
#!/shebang/line # line comment #SPECIAL-DIRECTIVE foo-bar:baz/quux#quuux # Bareword, includinbg some punctuation characters; # Note that '#' only starts a line comment when precedded # by whitespace or the beginning of a line [abc 123] # Square braces are self-delimiting: `[` `abc` `123` `]` (asd 123) # So are parentheses {asd 123} # And so are curly braces, except in TCL-like languages, # where they act like nestable double quotes. # Single and double quotes follow the same tokenization rules 'quoted symbol\n' # Single quotes mean 'treat as a symbol' # (except in Lispy languages, where 'foo means (quote foo) "quoted string\n" # Double quotes mean literal string ‹hello \ ‹there›› # Nestable symbol quoting without escapes «hello \ «there»» # Nestable literal quoting without escapes
‹›
and «»
are called ’guillaments’.
The single and double regular and nestable quotes are the same characters with the semantics as defined by the TOGVM-PHP language and SchemaSchema. Other unicode quotes might allow nesting with escape sequences, or other permutaions of nestable/escapable/supporting interpolations or not (see https://github.com/TOGoS/TOGVM-Spec/blob/master/test-vectors/tokens/quotes.txt).
However, that seems to lead to some ambiguity: at which level are the escapes decoded? The answer is probably: at the outermost quotation, since that is the most straightforward. But that might seem surprising and/or not the most useful interpretation to someone writing with them. Therefore I am punting by simply disallowing them, for now. The following quote characters should be reserved; i.e. recoignized but unsupported (for now):
`backticks` ‘nestable single quotes’ “nestable double quotes” 「Japanese single quote」 『Japanese double quote』 〈Japanese angle quote〉 《Japanese double-angle quote》 【Whatever this is】 〔This other one〕 〖More crazy unicode quotes〗 〘Yet more of them!〙 〚Holy crap, so many weird quote characters〛
(the last few were simply copied from https://en.wikipedia.org/wiki/CJK_Symbols_and_Punctuation for completeness; I have never thought about using them or what they would mean)
A collection of projects, some entirely experimental, that are vaguely related in that they share the goal of defining minimal, cross-platform programming language interpreters, VMs, or compilers.
Some of the sub-projects attempt to define or implement a small PostScript-based language specification.
The goal is to have a very easy-to-implement cross-platform core that can bootstrap nicer languages (e.g. scheme, more fleshed-out PostScript, etc).
Being a concatenative stack-based language means very little ‘parsing’ is needed; tokens are tokenized and fed directly to the interpreter.
Feel free to implement higher-level languages using TScript34. Actually that’s kind of its purpose.
PostScript seems like a more elegant language than Forth, with ‘{ procedures }’ as first-class objects, somewhat more conventional operation names, symmetrical string syntax ‘(foo)’ instead of ‘” foo”’, and fewer assumptions that it is running very close to the metal.
Because I want them to be ‘stackless’.
(See https://kyju.org/blog/piccolo-a-stackless-lua-interpreter/)
Basically I was bitten by the continuation-passing-style bug long long ago, and find the idea of a parsing function taking a whole thread hostage distasteful.
Why?
- A non-IO function blocking on I/O breaks the single-responsibility principle; now callers need to know not only your functional API, but also the blocking behavior of I/O streams
- Not relying on any given I/O system makes processing functions more generally useful
The ‘Danducer’ pattern breaks all stream-processing routines down into state machines that ‘never block’ (except to do computation), but only handle input by returning output, an updated version of themselves, and whether they are waiting for more input.
Might be slightly less ‘minimal’ than what I’m going for, here, though admittedly I haven’t tried it.
TODO: Read https://www.javaadvent.com/2022/12/webassembly-for-the-java-geek.html
It is compelling.
The Uxn/Varvara ecosystem is a personal computing stack based on a small virtual machine that lies at the heart of our software, and that allows us to run the same application on a variety of systems.
Sounds very similar to what I’m going for, so why not!
He’s the author of PuTTY. He talks about what I call ‘the reader-writer problem’ and how coroutines solve it in ’use cases’
Some guy on HN seems to be after something similar
I’ve been working on something centered around extensibility, or metaprogramming, coming from a strictly imperative angle, with the belief that anything else (functional, relational/logic based, whatever) can be built on top of that.
A few guiding principles are:
- simplicity above all, with as few fundamental elements as possible
- the parser is a separate issue, just write your own syntax to avoid the most divisive bikeshed element of PL design, or pick the C like or ALGOL like one out of the box. You very likely want your own syntax anyway as you write extensions.
- every language element, from modules down to function calls, are first class, ie have an (implementing) type, can be stored in variables and used in expressions, be introspected and evaluated/deployed.
- runs at compile time, compiles at run time (code generation/partial evaluation/dynamic code)
- generates C, Java, Python and various bytecodes to maximise interoperability, code availability and deployability
- has no standard runtime or standard library of its own, is entirely parasitic on other environments
Even if it ends up being completely useless, it’s a really interesting exercise in design.
Candy - a functional language with assertions in place of types
Seems similar to what I was thinking w.r.t. a scheme-like where you could define constraints like so:
(define (divide a b)
(assert (is-number a))
(assert (is-number b))
(assert (is-nonzero b))
(...logic to do the division here))
That said, not sure if it follow the other principles I have in mind about there being no types distinct from behavior. The README indicates there are some ‘predefined types’, such as int, text, list, struct. Can I define my own thing that ‘looks like’ a list?
(My current thinking is that lists should just be values that can be ~car~red and ~cdr~ed and ~cons~ed.)
Rye - A mostly-pure, low-syntax homoiconic scriping langyuage
Relevant to my thought that we can “just use monads” for I/O:
These are not the only language features that can be used to model effects, and other features also fall into one of these buckets. For example, monads are also statically typed and lexically scoped. However, a major objection to monads is that they model effects in a specifically layered way, so that for example there is a distinction between an IO<Result<T, E>> and a Result<IO<T>, E>. Coroutines on the other hand are order-independent: all coroutine that yield Pending and Exception have the same type, there is no distinction of order. The same is true of effect handlers.
A Forthlike language with some interesting ideas.
$foo
defines (in a lexical scope delimited by..parens, I think)
the name foo
to mean the thing on top of the stack.
~’foo~ quotes the symbol, and
^foo
puts the thing identified by foo
onto the stack instead of executing it.
~’foo pop~ and ~’foo push~ store and load the value defined by foo
, respectively.