c4: a portable Forth system inspired by MachineForth and Tachyon

In C4, a program is a sequence of WORD-CODEs.
A WORD-CODE is a 16-bit unsigned number (0..65535).
Primitives are assigned numbers sequentially from 0 to BYE.
If a WORD-CODE is less than or equal to BYE, it is a primitive.
If the top 3 bits are set ($Exxx), it is a 13-bit unsigned literal (0-$1FFF).
If it is greater than BYE, it is the code address of a word to execute.

CELLs in C4

A CELL is either 32-bits or 64-bits, depending on the target system.

Linux 32-bit (-m32): a CELL is 32-bits.
Linux 64-bit (-m64): a CELL is 64-bits.
Windows 32-bit (x86): a CELL is 32-bits.
Windows 64-bit (x64): a CELL is 64-bits.
Development boards: a CELL is 32-bits.

C4 memory areas

C4 provides three memory areas:

The code area can store up to $DFFF 16-bit WORD-CODEs, 16-bit index. (see code-sz).
- NOTE: This is because word-code values $Exxx are considered to be literals (0 .. $1FFF).
- NOTE: CODE slots 0-25 (0 @c .. 25 @c) are reserved for C4 system values.
- NOTE: CODE slots 25-75 (25 @c .. 75 @c) are unused by C4.
- NOTE: These are free for the application to use as desired.
- NOTE: Use @c and !c to get and set values in the code area.
The vars area can store up to CELL bytes, CELL index (see vars-sz).
The dict area can store up to 65536 bytes, 16-bit index (see dict-sz).
here is an offset into the code area.
vhere is an offset into the vars area.
last is an offset into the dict area.
Use ->code, ->vars, and ->dict to turn an offset into an address.

WORD	STACK	DESCRIPTION
(sp)	(--N)	Offset of the parameter stack pointer
(rsp)	(--N)	Offset of the return stack pointer
(here)	(--N)	Offset of the HERE variable
(last)	(--N)	Offset of the LAST variable
base	(--N)	Offset of the BASE variable
state	(--N)	Offset of the STATE variable
(lex)	(--N)	Offset of the LEX (the current lexicon)
(lsp)	(--N)	Offset of the loop stack pointer
(tsp)	(--N)	Offset of the third stack pointer
(reg-base)	(--N)	Offset of the register base
(frame-sz)	(--N)	Offset of the frame-sz value

C4 Strings

Strings in C4 are both counted and NULL terminated.
C4 also supports NULL-terminated strings with no count byte (ztype).

Format specifiers in `ftype` and `."`

Similar to the printf() function in C, C4 supports formatted output using '%'.
For example : ascii dup dup dup ." char %c, decimal #%d, binary: %%%b, hex: $%x%n" ;.

Format	Stack	Description
%b	(N--)	Print TOS in base 2.
%c	(N--)	EMIT TOS.
%d	(N--)	Print TOS in base 10.
%e	(--)	EMIT `escape` (#27).
%i	(N--)	Print TOS in the current base.
%n	(--)	Print CR/LF (13/10).
%q	(--)	EMIT `"` (#34).
%s	(A--)	Print TOS as a string (uncounted, formatted).
%S	(A--)	Print TOS as a string (uncounted, unformatted).
%x	(N--)	Print TOS in base 16.
%[x]	(--)	EMIT [x].

Registers

C4 includes an array of "registers" (pre-defined cells).
The number of registers is configurable (see reg-sz). Default is 255 on PCs.
There is a reg-base that can be used to provide a "stack frame" if desired.
You can create words to reference them frame-sz at a time in a pseudo "stack frame".
Or you can leave reg-base at 0, and reference them individually.
For example: 123 42 reg-s.
Use frame-sz to control stack frame size. Default is 5.
The default bootstrap file creates 5 "registers" for stack frame (see a>, s>, d>, x>, and y>).
C4 provides 8 words to manage C4 registers. They are:

WORD	STACK	DESCRIPTION
`+regs`	(--)	Create new frame; add `frame-sz` to `reg-base`.
`-regs`	(--)	Destroy frame; subtract `frame-sz` from `reg-base`.
`reg-r`	(R--N)	Push register (R + `reg-base`).
`reg-s`	(N R--)	Set register (R + `reg-base`) to N.
`reg-i`	(R--)	Increment register (R + `reg-base`).
`reg-d`	(R--)	Decrement register (R + `reg-base`).
`reg-ri`	(R--N)	Push register (R + `reg-base`), then increment it.
`reg-rd`	(R--N)	Push register (R + `reg-base`), then decrement it.

The Third Stack

C4 includes a third stack, with same ops as the return stack. (>t, t@, t>).
The size of the third stack is configurable (see tstk-sz).
This third stack can be used for any purpose. Words are:

WORD	STACK	DESCRIPTION
`>t`	(N--)	Move N to the third stack.
`t@`	(--N)	Copy TOS from the third stack.
`t>`	(--N)	Move N from the third stack.

C4 WORD-CODE primitives

To add custom primitives, fill in macro USER_PRIMS with X() macros in file c4.cpp.

Stack effect notation conventions:

TERM	DESCRIPTION
SZ/NM/MD	String, uncounted, NULL terminated
SC/D/S	String, counted, NULL terminated
A	Address
C	Number, 8-bits
W	Number, 16-bits
N/X/Y	Number, CELL sized
F	Flag: 0 mean0 false, <>0 means true
R	Register number
FH	File handle: 0 means no file
I	For loop index counter

The primitives:

WORD	STACK	DESCRIPTION
(lit1)	(--W)	W: WORD-CODE for LIT1 primitive
(lit2)	(--W)	W: WORD-CODE for LIT2 primitive
(jmp)	(--W)	W: WORD-CODE for JMP primitive
(jmpz)	(--W)	W: WORD-CODE for JMPZ primitive
(jmpnz)	(--W)	W: WORD-CODE for JMPNZ primitive
(njmpz)	(--W)	W: WORD-CODE for NJMPZ primitive
(njmpnz)	(--W)	W: WORD-CODE for NJMPNZ primitive
(exit)	(--W)	W: WORD-CODE for EXIT primitive
exit	(--)	EXIT word
dup	(X--X X)	Duplicate TOS (Top-Of-Stack)
swap	(X Y--Y X)	Swap TOS and NOS (Next-On-Stack)
drop	(N--)	Drop TOS
over	(N X--N X N)	Push NOS
@	(A--N)	N: the CELL at absolute address A
c@	(A--C)	C: the CHAR at absolute address A
w@	(A--W)	W: the WORD at absolute address A
!	(N A--)	Store CELL N to absolute address A
c!	(C A--)	Store CHAR C to absolute address A
w!	(W A--)	Store WORD W to absolute address A
+	(X Y--N)	N: X + Y
-	(X Y--N)	N: X - Y
*	(X Y--N)	N: X * Y
/	(X Y--N)	N: X / Y (integer division)
/mod	(X Y--M Q)	M: X modulo Y, Q: quotient of X / Y
1+	(X--Y)	Increment TOS
1-	(X--Y)	Decrement TOS
<	(X Y--F)	F: 1 if (X < Y), else 0
=	(X Y--F)	F: 1 if (X = Y), else 0
>	(X Y--F)	F: 1 if (X > Y), else 0
0=	(N--F)	F: 1 if (N=0), else 0
and	(X Y--N)	N: X AND Y
or	(X Y--N)	N: X OR Y
xor	(X Y--N)	N: X XOR Y
com	(X--Y)	Y: X with all bits flipped (complement)
for	(N--)	Begin FOR loop with bounds 0 and N.
i	(--I)	N: Current FOR loop index.
next	(--)	Increment I. If I < N, start loop again, else exit.
unloop	(--)	Unwind the loop stack. NOTE: this does NOT exit the loop.
+regs	(--)	Increment `reg-base` by `frame-sz`
-regs	(--)	Decrement `reg-base` by `frame-sz`
reg-r	(R--N)	Push register (`reg-base`+R)
reg-s	(N R--)	Set register (`reg-base`+R) to N
reg-i	(R--)	Increment register (`reg-base`+R)
reg-d	(R--)	Decrement register (`reg-base`+R)
reg-ri	(R--N)	Push register (`reg-base`+R), then increment it
reg-rd	(R--N)	Push register (`reg-base`+R), then decrement it
>r	(N--R:N)	Move TOS to the return stack
r@	(--N)	N: return stack TOS
r!	(N--)	Set return stack TOS to N
r>	(R:N--N)	Move return TOS to the stack
rdrop	(R:N--)	Drop return stack TOS
>t	(N--T:N)	Move TOS to the third stack
t@	(--N)	N: third stack TOS
t!	(N--)	Set third stack TOS to N
t>	(T:N--N)	Move third TOS to the stack
emit	(C--)	Output char C
:	(--)	Create a new word, set STATE=1
;	(--)	Compile EXIT, set STATE=0
immediate	(--)	Mark the last created word as IMMEDIATE
addword	(--)	-COMPILE: Add the next word to the dictionary
	(--A)	-RUN: A: current VHERE address
timer	(--N)	N: Current time
see X	(--)	Output the definition of word X
count	(SC--A N)	A,N: address and count of chars in string SC
type	(A N--)	Print string at A (counted, unformatted)
ztype	(SZ--)	Print string at SZ (uncounted, unformatted)
ftype	(SZ--)	Print string at SZ (uncounted, formatted)
s-cpy	(D S--D)	Copy string S to D, counted
s-eq	(D S--F)	F: 1 if string S is equal to D (case sensitive)
s-eqi	(D S--F)	F: 1 if string S is equal to D (NOT case sensitive)
z"	(--)	-COMPILE: Create uncounted string SZ to next `"`
	(--SZ)	-RUN: push address SZ of string
s"	(--)	-COMPILE: Create counted string SC to next `"`
	(--SC)	-RUN: push address SC of string
."	(--)	-COMPILE: execute `z"`, compile `ftype`
	(--)	-RUN: `ftype` on string
rand	(--N)	N: a pseudo-random number (uses XOR shift)
fopen	(NM MD--FH)	NM: File Name, MD: Mode, FH: File Handle (0 if error/not found)
		NOTE: NM and MD are uncounted, use `z"`
fclose	(FH--)	FH: File Handle to close
fdelete	(NM--)	NM: File Name to delere
fread	(A N FH--X)	A: Buffer, N: Size, FH: File Handle, X: num chars read
fwrite	(A N FH--X)	A: Buffer, N: Size, FH: File Handle, X: num chars written
fgets	(A N FH--X)	A: Buffer, N: Size, X: num chars read (0 if EOF/Error)
include X	(--)	Load file named X (X: next word)
load	(N--)	N: Block number to load (file named "block-NNN.c4")
loaded?	(W A--)	Stops current load if A <> 0 (see `find`)
to-string	(N--SC)	Convert N to string SC in the current BASE
.s	(--)	Display the stack
@c	(N--W)	Fetch unsigned 16-bit W from CODE slot N
!c	(W N--)	Store unsigned 16-bit W to CODE slot N
find	(--W A)	W: Execution Token, A: Dict Entry address (0 0 if not found)
system	(SC--)	PC ONLY: SC: String to send to `system()`
bye	(--)	PC ONLY: Exit C4

C4 default words

Default words are defined in sys_load(), in file sys-load.cpp. To add default words, add more calls to outer().

CCurl/c4