hoc: A C repository from phillbush

                                  hoc:
                         High-Order Calculator

This is an implementation of a stage-6 hoc(1) with added features.
Hoc (High-Order Calculator) is an interpreter for a simple language
used for floating point arithmetic with C-like syntax.

§ FILES

• README:       This file.
• Makefile:     The makefile.
• code.[hc]:    Routines for executing the machine instructions.
• error.[hc]:   Routines for error printing.
• main.c:       The main routine.
• lex.l:        The lexical analyzer.
• gramm.y:      The grammar.


§ USAGE

	$ hoc [file [arguments ...]]

Hoc reads the file given as argument and interpret it.  If no file is
given, or if file is “-”, hoc interprets the standard input; in this
case, I recommend running hoc with rlwrap(1), for a better interactive
shell with history support.


§ TODO

• Add a facility to execute system commands from within hoc (and assign
  their return value to variables) (exercise 8-7).
• Add interrupt handling, so that a runaway computation can be stopped
  without losing the state of variables already computed (exercise 8-16).
• Add arrays to hoc.  Pass they by reference to function and procedures.
  Return a pointer to them (exercise 8-20).
• Add string concatenation.
• Add option -e to read code from command-line.
• Read environment variables by a getenv() built-in function.


§ FEATURES

Exercise 8-2 (modulus and unary +).
This version of hoc(1) supports the modulus operator (%) and the unary
plus operator (+).

Exercise 8-3 (printed value).
The previously printed value can be accessed using the special variable
`.` (period).

Exercise 8-4 (semicolon).
This version of hoc(1) supports both semicolon and newlines as statement
separators.  (See the `term` production rule on gramm.y).

Exercise 8-5 (prohibit assignment to constants).
In this version of hoc(1), constants such as pi are implemented as
nullary built-in functions.  So, instead of using `PI`, you must use
`pi()`.  Since built-in functions cannot be assigned, this prohibit
assignment to constants.

Exercise 8-6 (n-ary functions).
This version of hoc(1) supports built-in functions with variable arity.
It also has the built-in function rand(), which returns a floating point
random variable uniformly distributed on the interval (0,1).

Exercise 8-8 (table of built-in functions).
This version of hoc(1) uses a table instead of a set of essentially
identical functions for implementing the built-in functions.

Exercise 8-10 (dynamic machine).
The sizes of `stack` and `prog` are dynamic, so hoc never runs out of
space as memory can be obtained by calling malloc(3).

Exercise 8-12 (debug).
By compiling with -DDEBUG=1, the machine instructions hoc generates are
printed in a readable form for debugging.

Exercise 8-13 (assignment operators, short-circuit).
This version of hoc(1) supports the assignment operators of C, such as
+=, *=, etc, and the increment and decrement operators ++ and --.   It
also supports short-circuit evaluation of the && and || operators,  so
they guarantee left-to-right evaluation and early termination, as in C.

Exercise 8-14 (for loops).
This version of hoc(1) supports for loops, and break and continue
statements.  Each element of the for loop (pre-loop; condition;
post-loop) can be omitted, as in C.  An omitted condition implies
in an infinite loop.

Exercise 8-15 (comments).
This version of hoc(1) supports comments beginnign with “#”.

Exercise 8-18 (named formal parameters).
This version of hoc(1) supports named formal parameters in subroutines
as an alternative to $1, etc.

Exercise 8-19 (local variables).
This version of hoc(1) supports local variables by the same inelegant
way that awk(1) does.

Exercise 8-21 (string handling).
This version of hoc(1) supports generalized string handling, so that
variables can hold strings instead of numbers.  (I have to add a
concatenation operand though).  It also has facilities for output
formatting (the printf statement).

Lex.
This version of hoc(1) uses lex(1) for implementing the lexical analyzer.

Expression list.
This version of hoc(1) supports list of expressions separated by comma,
for example in the for condition `for (i = 0, j = 1; i < 3; i++, j++)`.
The value of a expression list is that of the rightmost expression.

print, printf and sprintf().
The `print` statement in this version of hoc actually works like
awk(1)'s `print` statement (separates each argument with a space,
and prints a newline at the end).  As awk(1), this version of hoc(1)
also has a `printf` statement and a `sprintf()` built-in function.

getline
The `getline VAR` expression reads a line into the variable VAR.

do-while.
This version of hoc(1) supports do-while statements.

Access command-line arguments.
This version of hoc(1) can access command-line arguments with the '$'
operator.  $1 is the first command-line argument, $2 is the second,
and so on.  $0 is the name of the input file.

Escape line break.
This version of hoc(1) can escape new lines by ending a line with a
backslash (\).


§ NON-FEATURES

Exercise 8-11 will not be implemented.  It requires using a switch on
the type operation instead of calling functions from a function pointer.
Using function pointer is more elegant and easier to maintain than using
switch cases.

Exercise 8-17 will not be implemented.
If you want editing features use a shell wrapper such as rlwrap(1).

Reading from multiple files will not be implemented.
Use cat(1).


§ HOW IT WORKS

Hoc compiles input into a stack machine.   As input is parsed, code
is generated for a simple computer instead of immediately computing
answers. Once the end of a statement is reached, the generated code
is executed (interpreted) to compute the desired result. The simple
computer is a ‘stack machine’: when an operand is encountered, it's
pushed onto a stack  (more precisely,  code is generated to push it
onto a stack).  Stack machines usually result in simple interpreters
(it's just an array containing operators and operands).  The operators
are the machine instructions; each is a function call with its
arguments, if any, following the instruction.

Before parsing and execution begins, the machine is initialized and the
symbol table is populated by init().  Calls to the function install()
installs both keywords and built-in functions in the symbol table.

The main loop reverts the stack machine to its initial state and parses
the input, one statement at a time.  While the input is parsed, code is
generated for later execution by calls to the function `code()`,  which
simply puts an `Inst` data  (see bellow) into the next free spot in the
program memory (pointed by `prog.progp`).   Once a statement is parsed,
the generated code is printed if DEBUG is set and then executed.

For example, to handle the assignment `x = 2 * y`, the following code
is generated.  When this code is executed, the expression is evaluated
and the result is stored in x.  The final `pop` clears the value off
the stack because it is not needed any longer.

	OPERATION: constpush    Push a constant onto stack
	VALUE:     2            … the constant 2

	OPERATION: varpush      Push symbol table pointer onto stack
	SYMBOL:    y            … for the variable y
	OPERATION: eval         Evaluate (replace pointer by value)

	OPERATION: mul          Multiply top two items; product replaces them

	OPERATION: varpush      Push symbol table pointer onto stack
	SYMBOL:    x            … for the variable x
	OPERATION: assign       Store value in variable, pop pointer

	OPERATION: pop          Clear top value from stack
	OPERATION: STOP         End of instruction sequence

The machine itself is a list of entries of data of type `Inst`.   An
`Inst` is a union of stuff that can go into the memory; such as pointer
to routines like `mul` that perform an arithmetic operation; or pointer
to a entry in the symbol table; or a floating point number (for constant
values); or a pointer to another entry in the machine (used by control
flow statements), a string, etc.

Execution of the machine is simple.  Each cycle calls `execute()`,
which executes the function pointed to by the instruction pointed
to by the program counter `prog.pc`, and makes `prog.pc` point to
the next instruction so it's ready for the next instruction.   An
NULL instruction terminates the execution. Some instructions also
increment `prog.pc` to step over any arguments that follows the
instruction.

The code generated for while and if needs particular study.  When the
keyword `while` is encountered, the operation whilecode() is generated,
and its position in the machine is returned as the value of the while
production.  At the same time, however, the two following positions in
the machine are also reserved, to be filled in later.   The next code
generated is the expression that makes up the condition part of the
`while`.  After the whole `while` statement has been recognized, the
two extra positions reserved after the `whilecode` instruction are
filled with pointers to the locations of the loop body and the statement
that follows the loop.  Code generated for `for` and `if` works similar.

You can compile with -DDEBUG=1 for hoc to print the generated machine
code after it is generated.

Strings used as values are allocated in two linked-lists of strings.
`autostrings` contains strings that appears during evaluation as the
result of an expression.  Strings in `autostrings` are freed after each
execution.  `finalstrings` contain strings that are the value of a
variable, and thus should not be freed.

Different from the book, where symbols are used for variables, keywords,
built-in functions, etc, in this implementation there is two different
structures, Name and Symbol.  The first is looked up while the input is
being read, while Symbols are looked up during execution.

Names for variable, functions and keywords are installed in a name
table.  There is a global name table, used for keywords, function
names and global variable names; and a name table for each function
definition, used for local variables.

When a variable is evaluated, it is looked up in a symbol table, which
associate names with values.  There is a global symbol table, for global
values; and one symbol table for each function frame, containing
automatic values.


§ SEE ALSO

The UNIX Programming Environment,
by Brian W. Kernighan and Rob Pike,
Prentice Hall, 1984.
ISBN: 0-13-937681-X.
phillbush/hoc