______ _ _
| ____| | | | |
___ | |__ ___ _ __| |_| |__
/ _ \ | __/ _ \| '__| __| '_ \
| __/ | | | (_) | | | |_| | | |
\___| |_| \___/|_| \__|_| |_|
_____ ______ ___ ___ ___ ___ ___ _______ ________
|\ _ \ _ \|\ \|\ \ |\ \ / /|\ \ |\ ___ \ |\ __ \
\ \ \\\__\ \ \ \ \\\ \ \ \ \/ / \ \ \ \ \ __/|\ \ \|\ \
\ \ \\|__| \ \ \ \\\ \ \ \ / / \ \ \ \ \ \_|/_\ \ \\\ \
\ \ \ \ \ \ \ \\\ \ / \/ \ \ \____\ \ \_|\ \ \ \\\ \
\ \__\ \ \__\ \_______\/ /\ \ \ \_______\ \_______\ \_____ \
\|__| \|__|\|_______/__/ /\ __\ \|_______|\|_______|\|___| \__\
|__|/ \|__| \|__|
This project encompasses an assembler for a SUBLEQ CPU variant called MUXLEQ, a virtual machine built upon that assembler capable of running Forth, and a cross-compiler designed to target the Forth VM, based on the eForth family of the Forth programming language. The system is self-hosted, allowing users to create new and modified systems seamlessly.
SUBLEQ is an esoteric and impractical single-instruction CPU, yet it achieves Turing-completeness, demonstrating that even with such a limited instruction set, it can perform any computable task given sufficient memory and time. Porting a Forth implementation to SUBLEQ serves as a testament to its versatility —- if you can adapt Forth to run on SUBLEQ, you can port it to virtually any platform. There is a saying about Forth: "Forth is Sudoku for programmers." This aptly captures an intricate and satisfying relationship with the language and the project, serving as an experimental platform for demonstrating execution of high-level programming languages with minimal effort.
This setup requires the installation of a C compiler, Gforth, and GNU Make.
- macOS:
brew install gforth
- Ubuntu Linux / Debian:
sudo apt-get install gforth build-essential
Certainly! Here's the proofread and refined version of your text, optimized for inclusion in a user manual:
To run eForth on MUXLEQ, simply type:
$ make run
Below is an example session demonstrating basic usage:
words
21 21 + . cr
: hello ." Hello, World!" cr ;
hello
bye
This allows you to operate eForth within the system. For a list of available
commands, enter words
and press Enter. In Forth, the term "word" refers to
a function. It is called a "word" because Forth functions are typically named
using space-delimited characters, often forming a single, descriptive term.
Words are organized into vocabularies, which collectively make up the dictionary
in Forth. Numbers are input in Reverse Polish Notation (RPN); for instance,
inputting:
2 2 + . cr
will display 4
.
To define a new function, use the following format:
: hello cr ." Hello, World!" ;
Remember that spaces are critical in eForth syntax. After defining a function,
simply type hello
to execute it.
The system is self-hosting, meaning it can generate new eForth images using
the current eForth image and source code. While Gforth is used to compile the
image from muxleq.fth
, the Forth system's self-hosting capability also allows
for building new images after modifying any Forth source files. To initiate
self-hosting and validation, run make bootstrap
.
The MUXLEQ architecture is an enhancement of the classic SUBLEQ one-instruction set computer (OISC), offering an additional instruction that improves performance and reduces program size. The MUXLEQ system retains simplicity, making it nearly as straightforward to implement in hardware as SUBLEQ. Existing SUBLEQ programs typically run on MUXLEQ without alterations.
Below is the pseudo code for the MUXLEQ variant:
while pc >= 0:
a = m[pc + 0]
b = m[pc + 1]
c = m[pc + 2]
pc += 3
if a == -1:
m[b] = get_byte() # Input a byte to memory at address b
elif b == -1:
put_byte(m[a]) # Output the byte stored at memory address a
elif c != -1 and c < 0:
m[b] = (m[a] & ~m[c]) | (m[b] & m[c]) # Multiplex operation
else:
r = m[b] - m[a] # SUBLEQ subtraction
if r <= 0:
pc = c # Branch if the result is less than or equal to zero
m[b] = r
Removing the elif c != -1 and c < 0:
clause effectively reverts MUXLEQ to a
typical SUBLEQ machine, as this conditional handles the multiplexing logic unique
to MUXLEQ.
SUBLEQ machines belong to a class called OISC, which uses a single instruction to perform any computable task, albeit inefficiently. SUBLEQ originated from the "URISC" concept introduced in the 1988 paper URISC: The Ultimate Reduced Instruction Set Computer. The intent was to provide a simple platform for computer engineering students to design their own instruction sets and microcode. SUBLEQ falls under arithmetic-based OISCs, in contrast to other types like bit-manipulation or Transport Triggered Architectures (MOVE-based). Despite its simplicity, SUBLEQ could be made more efficient with the addition of just one extra instruction, such as NAND or a Right Shift.
A single SUBLEQ instruction is structured as follows:
SUBLEQ a, b, c
Since SUBLEQ is the only available instruction, it is often abbreviated simply as:
a b c
The three operands a
, b
, and c
are stored in three consecutive memory
locations. Each operand represents an address in memory. The SUBLEQ instruction
performs the following operations in pseudo-code:
r = m[b] - m[a]
if r <= 0:
pc = c
m[b] = r
There are three notable exceptions to the standard operation:
- Halting Execution: If the address
c
is negative or refers to an invalid memory location outside the addressable range, the program halts. - Input Operation: If the address
a
is-1
, a byte is read from the input and stored at addressb
. - Output Operation: If the address
b
is negative, a byte from addressa
is sent to the output.
The SUBLEQ specification does not dictate how numbers are represented. Key considerations include:
- Bit Length: Numbers can be 8-bit, 16-bit, 32-bit, 64-bit, or even arbitrary precision.
- Negative Numbers: Typically implemented using two's complement, but other methods like sign-magnitude can also be used.
The simplicity of the MUXLEQ design allows for further optimizations by packing additional functionality into the instruction set. Some possible variants include:
- Bit Reversal: The multiplexed value could have its bits reversed during the
operation. This functionality could be integrated into the
mux
instruction, allowing for efficient bit manipulation. - Right Shift: Incorporating a right shift operation, even by just one position, would significantly enhance the arithmetic capabilities of the system.
- Comparison, the result of comparing
m[a]
andm[b]
could be stored inm[a]
, useful comparisons would beis zero?
, signed or unsigned less than or greater than. Any of those five would be useful. - The paper "Subleq: An Area-Efficient Two-Instruction-Set Computer"
introduces bit-reversal to enhance hardware efficiency. When the c operand is
negative, the operation
r = reverse(reverse(m[b]) - reverse(m[a]))
is performed, wherereverse
flips the bits of a value. This modification turns the "less than or equal to zero" branch into a branch on evenness, allowing an efficient right shift with minimal additional hardware resources.
Although the current MUXLEQ variant already demonstrates considerable improvements over SUBLEQ, there are a few missing features, such as the "self-interpreter" feature found in some extended SUBLEQ variants. With further effort, additional performance gains and capabilities could be implemented, further closing the gap between minimalistic architecture and more conventional designs.
The image is a variant of Forth known as "eForth." This implementation of eForth differs from standard ANS Forth implementations, notably lacking constructs like the "do...loop" and its variants. The concept behind eForth was to develop a system that required only a small set of primitives, around 30, written in assembly to create a highly portable and reasonably efficient Forth. Bill Muench originally developed eForth, with later enhancements made by Dr. Chen-Hanson Ting.
muxleq.fth
functions as both a cross-compiler and an eForth interpreter,
specifically designed for MUXLEQ. Written entirely in Forth, muxleq.fth
has
been verified for compatibility with Gforth and can also be executed using
a pre-generated eForth image running on a MUXLEQ machine.
The cross-compilation process functions as outlined below:
- Assembler: A specialized assembler for the MUXLEQ architecture enables low-level machine code generation tailored to the instruction set.
- Virtual machine: Leveraging the MUXLEQ assembler, a virtual machine is constructed. This VM is capable of supporting higher-level programming constructs, facilitating the seamless execution of Forth code within the MUXLEQ environment.
- Forth word definitions: These definitions are instrumental in building a full-fledged Forth interpreter, allowing for the creation, compilation, and execution of Forth programs.
- Forth image: The finalized Forth image, encapsulating the interpreter and its environment, is output to the standard output stream. This image initializes the VM with the necessary configurations and word definitions to operate effectively.
"Meta-compilation" in Forth refers to a process similar to cross-compilation, though the term carries a distinct meaning in the Forth community compared to its use in broader computer science. This difference stems from Forth's evolution within the microcomputer scene, which was separate from the academic environment of the 1980s and earlier. The term "meta-compilation" may have been somewhat mistranslated. While most modern programs employ unit testing frameworks, here, a meta-compilation system serves as an extensive testing mechanism. If the system compiles image "A," which can then compile another image "B," and "B" matches "A" byte-for-byte, this gives reasonable confidence that the image is correct, or at least correct enough for self-compilation.
This process is performed with the following commands:
$ gforth muxleq.fth > stage0.dec
$ sed 's/$/,/' stage0.dec > stage0.c
$ cc -o muxleq muxleq.c
$ ./muxleq < muxleq.fth > stage1.dec
$ diff -w stage0.dec stage1.dec
The stage0.dec
image was initially generated using Gforth to create the first
functional eForth for the MUXLEQ machine. Once the eForth image is ready, it
serves as the meta-compiler, capable of compiling itself and generating
stage1.dec
. The image generated by Gforth should be identical to the one
produced by MUXLEQ eForth when using the same muxleq.fth
file. If the two
images are identical, the bootstrapping process is considered complete. Although
the Gforth interpreter is no longer required, it is retained because it is
significantly faster than using MUXLEQ eForth to compile a new image.
It is noteworthy that approximately half of the memory allocated is dedicated to
the virtual machine, which facilitates the writing and execution of Forth code.
The BLOCK
word-set within this implementation does not interact directly with
mass storage. Instead, it maps blocks to memory, enabling efficient memory
management and access.
This package is released under the Public Domain and was initially written by Richard James Howe.