A Forth dialect that focuses on portability and simplicity.
The Diatom assembler (dasm
) is a simplistic assembler that
translates DiatomVM instructions (.dasm
) to their respective
opcodes (.dopc
).
Comments are started with (
and ended with )
. Please keep in
mind that both parantheses need to be surrounded by spaces or
beginning/end of line:
( This is a comment )
( The next function duplicates
a given number. )
:duplicate
dup
+
ret
The assembler supports basic labels and references (backwards & forwards references) that can be used to reference to specific memory locations in your program.
Labels are created with :<name>
and can be referenced with
@<name>
. The reference inserts the memory address of the label
as literal value:
( Adds the address of the label to the stack. )
:add-own-address ( address = 125 )
const
@add-own-address
ret
would resolve to
const
125
red
Upon start the DiatomVM will begin executing instructions from the first memory address. To skip directly to the entry point of your program you can use a forward reference to a label:
( Immediately jump to the start label )
const
-1
cjmp
@start
( Other code that shouldn't be executed right away )
...
( Entry point of my program )
:start
...
The assembler has a couple of built-in macros that are expanded as the first step in the assembly process. Most of them are designed to easy the implementation of Forth-like languages but can also be used in other contexts.
-
Literals
Literal numbers are split into consecutive bytes by the assembler. The number of bytes depends on the configured word size (default = 32 bit = 4 bytes).
const 500
will be expanded to
const 0 0 1 244
-
.const
The
.const
macro defines a constant value that cannot be changed at runtime. The generated instructions will be appended to a Forth-like dictionary but the code can also be called directly.Syntax:
.const <name> <numeric-value | label> .end
.const max-char 255 .end
will be expanded to
:max-char 0 8 109 97 120 45 99 104 97 114 :_dictmax-char const 0 0 0 255 ret
Calling
max-char
will put the value255
on the data stack of the virtual machine:call @_dictmax-char
-
.var
Variables can be defined with the
.var
macro. In contrast to constants, a variables value can be changed at runtime.Syntax:
.var <name> <numeric-value | label> .end
.var base 10 .end
will be expanded to
:base 0 4 98 97 115 101 :_dictbase const @_varbase ret :_varbase 0 0 0 10
Calling
base
will put the address of the variable's memory location on the data stack. This address can then be used to read (@
) or update (!
) the variable's value. -
.codeword
Codewords are very specific to Forth-like languages. Codewords are essentially functions (a.k.a. words) consisting of assembly instructions that are registered in the Forth dictionary.
Syntax:
.codeword <name> <instructions> .end
The codeword macro also supports a shorthand for calling other words from the dictionary with
!<word-name>
..const 1k 1000 .end .codeword add-1k !1k + .end
will be expanded to
( 1k const ) :1k 0 2 49 107 :_dict1k const 0 0 3 232 ret ( add-1k word ) :add-1k @1k 6 97 100 100 45 49 107 :_dictadd-1k call @_dict1k + ret
Name | Length [Word] | Comment |
---|---|---|
previous | 1 | Link to the previous word (or NULL if first word). |
len | 1 | Length of the name. |
name | n | Name of the word (1 char / word). |
code-word | 1 | Pointer to the first word to execute. |
Diatom code that needs to be executed on every startup to properly initialize the memory space. The following tasks need to be executed:
- Init dictionary pointer
dp
- Add native words to dictionary (add, dup, @, !, etc.)
- Execute main word (default = REPL but overwritable)
Hosted on top of some other programming language (e.g. C or Javascript) it is pretty unlikely to meet the performance of the host language. The only thing one can do is trying to minimize the call overhead of words as much as possible because this could still be cheaper than constructing a complete stack frame for a host language function call.
Apart from this the biggest performance improvement can probably be achieved not on a instruction level but for whole applications by keeping them as simple and lean as possible. The assumption is, that e.g. a lean Diatom application can still be faster (by doing much less) then a comparable Java application following "best practices". Here any performance would of course not be gained by just using another language but by applying different design principles that focus on the core functionality.
To make the implementation as portable as possible the native Forth words are not implemented in the host language directly but in a virtual stack machine that is suited for running stack-based languages.
This way, only the virtual machine primitives need to be implemented in the host language and nothing else. The drawback is, that it takes longer to get to the first running implementation because additional custom software (e.g. assembler) is needed for bootstrapping.
- Exception support?
- Tail-recursive looping
- Integer overflow trapping vs. saturation?