Unarian (pronounced yoo-NAIR-eein) is an esoteric programming language based on the concept that every Unarian function computes a partial unary function over the natural numbers (hence the name Unarian) and that these functions can only be constructed as combinations of existing functions.
The beauty of this language is in its simplicity. There are only two built-in functions: increment and decrement; only two ways to combine existing functions into new ones: composition and alternation; and effectively only one integer that can be accessed by running programs. Despite this simplicity, Unarian is Turing-complete and capable of representing arbitrary computable functions.
See also the Esolangs page for this language.
This repository contains:
- a short language specification,
- several interesting example programs,
- a simple VS Code extension,
- a minimalistic Rust interpreter,
- an involved Python interpreter,
- and a minimalistic Python interpreter.
Planned additions include:
- a minimalistic C interpreter,
- and a fully-featured Rust interpreter including a custom bytecode format.
Line comments start with `#` and are stripped from the source code before parsing. The remainder of the code is split into tokens: strings of arbitrary non-whitespace characters separated from each other by whitespace. Three tokens are considered special keywords: `{` (open brace), `}` (close brace), and `|` (alternation). Two additional tokens represent built-ins: `+` (increment) and `-` (decrement). Some implementations may also include `?` (input), `!` (output), and `@` (stack trace) as additional built-ins. All other tokens are considered valid function identifiers.
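These tokenization rules fit in a few lines of Python. The following is a minimal illustration and is not taken from the repository's interpreters. The names `tokenize` and `classify` are hypothetical. Treating `?`, `!`, and `@` as built-ins assumes an implementation that supports them.

```python
# Minimal sketch of Unarian tokenization (hypothetical helper names).
KEYWORDS = {"{", "}", "|"}
BUILTINS = {"+", "-"}
DEBUG_BUILTINS = {"?", "!", "@"}  # assumption: only some implementations support these


def tokenize(source: str) -> list[str]:
    """Strip '#' line comments, then split the remaining text on whitespace."""
    tokens: list[str] = []
    for line in source.splitlines():
        code = line.split("#", 1)[0]  # everything from the first '#' on is a comment
        tokens.extend(code.split())   # split() with no argument splits on any whitespace run
    return tokens


def classify(token: str) -> str:
    """Label a token as a keyword, a built-in, or a function identifier."""
    if token in KEYWORDS:
        return "keyword"
    if token in BUILTINS or token in DEBUG_BUILTINS:
        return "built-in"
    return "identifier"


if __name__ == "__main__":
    source = "main { if=0 + | 0 }  # the entry-point"
    tokens = tokenize(source)
    print(tokens)
    print([classify(t) for t in tokens])
```

Under these rules, names such as `if=0`, `^2`, and `0` are ordinary identifiers.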
A Unarian expression is a sequence of alternation tokens `|`, built-ins, identifiers, and bracketed groups. A bracketed group consists of an opening brace `{`, an expression, and a closing brace `}`. For example, `- | + func { - - | } |` is an expression and `{ - { + func | } + }` is a bracketed group. A Unarian library consists of a sequence of function declarations. A function declaration is an identifier (the function name) followed by a bracketed group (containing the function definition). Every Unarian source code file defines a library. For example, `f { - - | + }` declares a function named `f` defined by the expression `- - | +`. The following library defines three functions, `0`, `if=0`, and `main`:

    0 { - 0 | }
    if=0 { { - 0 | + } - }
    main { if=0 + | 0 }

Finally, a Unarian program consists of a library along with an expression, called the entry-point, to be evaluated in the context of that library. By default, the expression `main` is considered to be the entry-point, so any library that defines a `main` function is also a program.
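One way to represent this grammar is to parse each definition into a list of alternatives. Each alternative is a list of terms, and each term is either a string (a built-in or identifier) or a nested list (a bracketed group). The sketch below assumes that representation and is a hypothetical illustration. In it, a later duplicate definition silently overwrites an earlier one, a case the specification leaves undefined.

```python
# Sketch of a Unarian library parser. AST shape (an assumption of this sketch):
#   expression  = list of alternatives
#   alternative = list of terms (a composition)
#   term        = str (built-in or identifier) | expression (a bracketed group)


def tokenize(source: str) -> list[str]:
    """Strip '#' comments and split on whitespace."""
    return [tok for line in source.splitlines() for tok in line.split("#", 1)[0].split()]


def parse_expression(tokens: list[str], pos: int) -> tuple[list, int]:
    """Parse from tokens[pos] up to (not including) an unmatched '}' or the end.

    Returns the expression and the index where parsing stopped."""
    alternatives: list[list] = [[]]
    while pos < len(tokens) and tokens[pos] != "}":
        token = tokens[pos]
        if token == "|":
            alternatives.append([])  # start a new alternative
        elif token == "{":
            group, pos = parse_expression(tokens, pos + 1)
            if pos >= len(tokens):
                raise SyntaxError("unclosed '{'")
            alternatives[-1].append(group)  # pos now points at the group's '}'
        else:
            alternatives[-1].append(token)
        pos += 1
    return alternatives, pos


def parse_library(tokens: list[str]) -> dict[str, list]:
    """Parse a sequence of 'name { expression }' declarations."""
    library: dict[str, list] = {}
    pos = 0
    while pos < len(tokens):
        name = tokens[pos]
        if name in {"{", "}", "|"} or pos + 1 >= len(tokens) or tokens[pos + 1] != "{":
            raise SyntaxError(f"expected a declaration 'name {{ ... }}' at token {pos}")
        body, end = parse_expression(tokens, pos + 2)
        if end >= len(tokens):
            raise SyntaxError(f"unclosed definition of {name!r}")
        library[name] = body  # note: a duplicate name overwrites the earlier definition
        pos = end + 1  # skip the closing '}'
    return library


if __name__ == "__main__":
    example = """
    0 { - 0 | }
    if=0 { { - 0 | + } - }
    main { if=0 + | 0 }
    """
    for name, body in parse_library(tokenize(example)).items():
        print(name, body)
```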
There are two primary built-ins: increment `+` and decrement `-`. As their names suggest, increment adds one to its input and decrement subtracts one from its input. However, decrement fails when applied to input $0$, since $-1$ is not a natural number.
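An interpreter needs some way to represent failure. This minimal sketch uses Python's `None` for that purpose. That choice belongs to this example and is not prescribed by the specification.

```python
from typing import Optional


def increment(x: int) -> int:
    """The '+' built-in: always succeeds."""
    return x + 1


def decrement(x: int) -> Optional[int]:
    """The '-' built-in: returns x - 1, or None (failure) when x is 0."""
    return x - 1 if x > 0 else None


if __name__ == "__main__":
    print(increment(0), decrement(3), decrement(0))  # prints: 1 2 None
```

With this representation, failure checks must use `is None` rather than truthiness, because `0` is a legitimate result.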
Some implementations may add further built-ins such as input `?`, output `!`, and stack trace `@`. At the moment, these are non-standard parts of the language and are used largely for debugging.
Functions are identified by their name and defined (possibly recursively) by an expression consisting of built-ins, functions, compositions, and alternations. To evaluate a function on input $x$, evaluate its defining expression on input $x$. For example, if `mod2` is defined by the expression `- - mod2 |`, then evaluating `mod2` on $x$ is the same as evaluating `- - mod2 |` on $x$.
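The rule "evaluate a function by evaluating its definition" translates directly into a recursive evaluator. This sketch makes several assumptions: an expression is a list of alternatives, each alternative is a list of terms, and a term is a string or a nested list for a bracketed group. It uses `None` for failure, and the library is written by hand rather than parsed. It also anticipates the composition and alternation rules described in the following sections. The names `evaluate`, `apply_term`, and `LIBRARY` are hypothetical.

```python
from typing import Optional

# AST shape assumed here: an expression is a list of alternatives, each
# alternative is a list of terms, and a term is either a string (a built-in
# or a function name) or a nested expression (a bracketed group).


def evaluate(expr: list, x: int, library: dict[str, list]) -> Optional[int]:
    """Evaluate an expression on input x; return None on failure."""
    y: Optional[int]
    for alternative in expr:              # alternation: first success wins
        y = x
        for term in alternative:          # composition: left to right
            y = apply_term(term, y, library)
            if y is None:                 # a failing step fails the whole composition
                break
        if y is not None:
            return y
    return None                           # every alternative failed


def apply_term(term, x: int, library: dict[str, list]) -> Optional[int]:
    if isinstance(term, list):            # bracketed group
        return evaluate(term, x, library)
    if term == "+":
        return x + 1
    if term == "-":
        return x - 1 if x > 0 else None
    # A named function: evaluate its defining expression on the same input.
    return evaluate(library[term], x, library)


# mod2 { - - mod2 | }
LIBRARY = {"mod2": [["-", "-", "mod2"], []]}

if __name__ == "__main__":
    print([apply_term("mod2", x, LIBRARY) for x in range(6)])  # [0, 1, 0, 1, 0, 1]
```

Every recursive Unarian call becomes a Python call, so this naive sketch is only suitable for small inputs. An undefined name raises `KeyError`.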
Composition is one method of combining existing functions to create new ones. It is an associative binary operator over Unarian functions that is comparable to sequential execution (e.g. `a; b`) in imperative languages. Syntactically, the composition of functions `f` and `g` is written as `f g`.
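Evaluating a composition on input $x$ is done by evaluating the first function on $x$ and then, if that succeeds, evaluating the second function on its output. If either function fails, the composition fails. For example, if `^2` is a function that squares its input, then `^2 +` maps $x$ to $x^2 + 1$, `+ ^2` maps $x$ to $(x + 1)^2$, and `- - -` fails on input $x < 3$ (and maps every other $x$ to $x - 3$).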
Finally, an empty composition is treated as the identity function, which is the identity element of function composition. Syntactically, an empty composition can be written as an empty group `{ }` or as an empty expression.
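Composition can also be modeled with ordinary higher-order functions. This sketch assumes that Unarian functions are Python callables returning an `int` or `None` (failure). The name `square` is a hypothetical stand-in for the `^2` function mentioned above, and the other helper names are illustrative too.

```python
from functools import reduce
from typing import Callable, Optional

UnaryFn = Callable[[int], Optional[int]]  # None represents failure


def compose2(f: UnaryFn, g: UnaryFn) -> UnaryFn:
    """`f g`: run f, then run g on f's output; fail if either step fails."""
    def composed(x: int) -> Optional[int]:
        y = f(x)
        return None if y is None else g(y)
    return composed


def identity(x: int) -> Optional[int]:
    """The empty composition `{ }`."""
    return x


def compose(*fs: UnaryFn) -> UnaryFn:
    """Left-to-right composition of any number of functions (identity if none)."""
    return reduce(compose2, fs, identity)


def inc(x: int) -> Optional[int]:
    return x + 1


def dec(x: int) -> Optional[int]:
    return x - 1 if x > 0 else None


def square(x: int) -> Optional[int]:  # hypothetical stand-in for `^2`
    return x * x


if __name__ == "__main__":
    print(compose(square, inc)(3))     # `^2 +`   -> 10
    print(compose(inc, square)(3))     # `+ ^2`   -> 16
    print(compose(dec, dec, dec)(2))   # `- - -`  -> None (fails on x < 3)
    print(compose()(7))                # `{ }`    -> 7
```

Because `compose2` is associative, the grouping that `reduce` happens to use does not affect the result. Using `identity` as the starting value means an empty composition behaves as the identity.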
Alternation (formerly called branching) is the second method of combining existing functions. It is an associative binary operator over Unarian functions that is comparable to conditional control flow (e.g. `if c then a else b`) in imperative languages. Syntactically, the alternation of functions `f` and `g` is written as `f | g`. This operator has a lower precedence than composition, so `f g | h` is interpreted as the alternation of `f g` and `h` (written `{ f g } | h`), and `f | g h` is interpreted as the alternation of `f` and `g h` (written `f | { g h }`).
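Evaluating an alternation on input $x$ is done by evaluating the first function on $x$. If that succeeds, its output is the result. Otherwise, the second function is evaluated on the original input $x$, and the alternation fails only if both functions fail. For example, if `%2` is a function that fails on odd inputs and leaves all others unchanged, then `%2 + | -` maps every even $x$ to $x + 1$ and every odd $x$ to $x - 1$. Since an empty expression is an empty composition, an empty alternative acts as the identity function. For example, `- |` is semantically equivalent to both `- | { }` and `- | id`, where `id` is an identity function.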
Finally, since there is no way to represent them syntactically, we don't define the behavior of empty alternations (although it seems logical to define an empty alternation as a function that fails on all input, since this is the identity element of function alternation).
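Alternation fits the same callable-based model, again assuming that `None` means failure. In this sketch, `even_only` is a hypothetical stand-in for the `%2` function above. The function `fail` is the fail-everywhere function that would serve as the identity element for an empty alternation, if one were defined.

```python
from functools import reduce
from typing import Callable, Optional

UnaryFn = Callable[[int], Optional[int]]  # None represents failure


def alternate2(f: UnaryFn, g: UnaryFn) -> UnaryFn:
    """`f | g`: try f on x; if it fails, try g on the same original x."""
    def alternated(x: int) -> Optional[int]:
        y = f(x)
        return g(x) if y is None else y  # compare with None: 0 is a valid result
    return alternated


def fail(x: int) -> Optional[int]:
    """Fails on every input; the identity element of alternation."""
    return None


def alternate(*fs: UnaryFn) -> UnaryFn:
    """Try each function in order and return the first success."""
    return reduce(alternate2, fs, fail)


def compose2(f: UnaryFn, g: UnaryFn) -> UnaryFn:
    def composed(x: int) -> Optional[int]:
        y = f(x)
        return None if y is None else g(y)
    return composed


def identity(x: int) -> Optional[int]:
    return x


def inc(x: int) -> Optional[int]:
    return x + 1


def dec(x: int) -> Optional[int]:
    return x - 1 if x > 0 else None


def even_only(x: int) -> Optional[int]:  # hypothetical stand-in for `%2`
    return x if x % 2 == 0 else None


# `%2 + | -` parses as `{ %2 + } | -` because composition binds tighter.
swap_parity = alternate(compose2(even_only, inc), dec)
# `- |` is `- | { }`: decrement, or leave the input unchanged.
dec_or_keep = alternate(dec, identity)

if __name__ == "__main__":
    print([swap_parity(x) for x in range(6)])  # [1, 0, 3, 2, 5, 4]
    print([dec_or_keep(x) for x in range(3)])  # [0, 0, 1]
```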
Bracketed groups within an expression, which are surrounded by braces and can be nested, allow for the formation of expressions that don't follow normal precedence rules. While `a b | c` is interpreted as the alternation of `a b` and `c`, the expression `a { b | c }` is interpreted as the composition of `a` and `b | c`.
Evaluating an expression containing a bracketed group can be done by treating the group as a reference to a new function defined by the contents of the group. Specifically, we can evaluate `a { b | c }` by defining a new function `b|c { b | c }` and then evaluating `a b|c`. In general, for any expression containing a bracketed group `{ ... }`, define a new function `z { ... }` (where `z` is a fresh name) and replace all instances of `{ ... }` (aside from the definition of `z` itself) by `z`. For example, if `0` is a function that maps all $x$ to $0$, then `- 0 | + -` also maps all $x$ to $0$. By contrast, `{ - 0 | + } -`, which applies the alternation before composing with `-`, maps $0$ to $0$ and fails on all other inputs.
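The group-lifting transformation can be sketched over a list-based AST in which an expression is a list of alternatives and a group is a nested list. This version is illustrative and makes two assumptions of its own. First, it gives every occurrence of a group its own fresh function instead of sharing one function between identical groups, which is semantically equivalent but less compact. Second, it names lifted functions `group#0`, `group#1`, and so on. Because `#` always starts a comment, such names can never clash with an identifier written in source code.

```python
import itertools

# AST shape assumed here: expression = list of alternatives, alternative = list
# of terms, term = str (built-in or name) or a nested expression (a group).


def lift_groups(library: dict[str, list]) -> dict[str, list]:
    """Replace every bracketed group with a call to a fresh, named function."""
    counter = itertools.count()
    lifted: dict[str, list] = {}

    def lift(expr: list) -> list:
        new_expr = []
        for alternative in expr:
            new_alternative = []
            for term in alternative:
                if isinstance(term, list):
                    name = f"group#{next(counter)}"  # '#' can't appear in source names
                    lifted[name] = lift(term)        # groups may nest, so recurse
                    new_alternative.append(name)
                else:
                    new_alternative.append(term)
            new_expr.append(new_alternative)
        return new_expr

    for name, body in library.items():
        lifted[name] = lift(body)
    return lifted


# complex { a { b | c } d | { e | { f } } g }
COMPLEX = {"complex": [["a", [["b"], ["c"]], "d"], [[["e"], [[["f"]]]], "g"]]}

if __name__ == "__main__":
    for name, body in lift_groups(COMPLEX).items():
        print(name, "{", " | ".join(" ".join(alt) for alt in body), "}")
```

For `complex`, the lifted definitions have the same shape as `code_1`, `code_2`, `code_3`, and `less_complex` in the annotated example file later in this section.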
To interpret or compile a Unarian program, an entry-point must be chosen. Some implementations may allow the user to specify a custom expression as the entry-point, but this is not required. If no entry-point is specified, it should default to `main`. It is considered undefined behavior to have references to undefined functions or multiple definitions of the same function. However, it is recommended for implementations to treat both of these cases as compilation errors.
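Because the recommended behavior is to reject undefined references and duplicate definitions, a checker can work directly on the token stream. The following is a hypothetical sketch (`check_program` is an illustrative name). It tracks only brace depth, so it does not validate the full grammar. It treats `?`, `!`, and `@` as built-ins, which is an assumption. It also checks the entry-point's references along with the library's.

```python
KEYWORDS = {"{", "}", "|"}
BUILTINS = {"+", "-", "?", "!", "@"}  # assumption: debugging built-ins are enabled


def check_program(tokens: list[str], entry: tuple[str, ...] = ("main",)) -> list[str]:
    """Return a list of error messages (empty if no problems were found)."""
    errors: list[str] = []
    defined: set[str] = set()
    referenced: list[str] = []
    depth = 0
    for token in tokens:
        if token == "{":
            depth += 1
        elif token == "}":
            depth -= 1
            if depth < 0:
                errors.append("unmatched '}'")
                depth = 0
        elif depth == 0:
            # At the top level, every non-brace token is treated as a declared name.
            if token in defined:
                errors.append(f"duplicate definition of {token!r}")
            defined.add(token)
        elif token not in KEYWORDS and token not in BUILTINS:
            referenced.append(token)  # an identifier used inside a definition
    if depth != 0:
        errors.append("unclosed '{'")
    reported: set[str] = set()
    for name in referenced + list(entry):
        if name in KEYWORDS or name in BUILTINS or name in defined or name in reported:
            continue
        errors.append(f"undefined function {name!r}")
        reported.add(name)
    return errors


if __name__ == "__main__":
    print(check_program("f { g } f { + }".split()))
```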
A compiled or interpreted program is evaluated by giving it a non-negative integer input. This input is evaluated on the entry-point expression as explained above, and the resulting output, either a non-negative integer or a failure, is returned. Input and output representations are left undefined, but it is recommended for integers to be represented in decimal and for failure to be represented by `-`. Bounds on integer inputs and outputs, as well as the behavior when these bounds are exceeded, are also left undefined, but it is recommended that implementations support integers up to at least
    # This is a comment.

    # This is a basic function definition.
    function_name { function_definition }

    # Extra spacing doesn't matter.
    example_func {
    extra
    spacing
    doesn't
    matter
    }

    # Function names can contain any characters except whitespace and '#'.
    # Tokens '{', '|', and '}' are special keywords and cannot be function names.
    # Tokens '+', '-', '?', '!', and '@' are built-in and cannot be redefined.
    *10 { multiply_by_10 }
    /10 { divide_by_10 }
    ^2 { square }

    # Functions can call themselves recursively.
    infinite_loop { infinite_loop solve_p_vs_np }

    # There are two primary built-in functions: '+' and '-'.
    # Applying '+' to input x returns x + 1.
    # Applying '-' to input x returns x - 1 if x > 0 and fails if x = 0.
    add_1 { + }
    add_2 { + + }
    add_3 { + + + }
    subtract_2_or_fail { - - }

    # There are three built-in functions used for debugging: '?', '!', and '@'.
    # Applying '?' reads a value from standard input and returns it.
    # Applying '!' to input x prints x to standard output and returns x.
    # Applying '@' to input x prints the stack trace and returns x.
    print_then_add_1 { ! + }
    print_stack_trace { @ }

    # Functions can have branching execution paths. The special token '|', called
    # alternation, is used to separate alternate paths.
    do_A_or_B_or_C { A | B | C }

    # Empty alternates act as the identity function.
    subtract_2_or_do_nothing { - - | }

    # A bracketed group starts with '{' and ends with '}'. Such groups are
    # evaluated as if their contents had been defined in a separate function.
    complex { a { b | c } d | { e | { f } } g }
    code_1 { b | c }
    code_2 { f }
    code_3 { e | code_2 }
    less_complex { a code_1 d | code_3 g } # This is equivalent to 'complex'.

    # The main function is the default entry-point for a program. It's evaluated
    # when we run this program.
    main { get_nth_prime }
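The following self-contained sketch combines the preceding ideas. It tokenizes and parses a library, evaluates an entry-point expression on a non-negative integer input, and reports the result in decimal, or `-` on failure, as recommended in the program section. It is a hypothetical, deliberately naive tree-walking interpreter and not one of the interpreters in this repository. Because it recurses in Python for every Unarian call, deep recursion will hit Python's recursion limit, and it omits the debugging built-ins. The names `run` and `EXAMPLE` are illustrative.

```python
from typing import Optional

# Naive tree-walking interpreter sketch. AST: expression = list of alternatives,
# alternative = list of terms, term = str or nested expression (bracketed group).


def tokenize(source: str) -> list[str]:
    """Strip '#' comments and split on whitespace."""
    return [tok for line in source.splitlines() for tok in line.split("#", 1)[0].split()]


def parse_expr(tokens: list[str], pos: int) -> tuple[list, int]:
    """Parse until an unmatched '}' or the end; return (expression, stop index)."""
    alternatives: list[list] = [[]]
    while pos < len(tokens) and tokens[pos] != "}":
        token = tokens[pos]
        if token == "|":
            alternatives.append([])
        elif token == "{":
            group, pos = parse_expr(tokens, pos + 1)
            if pos >= len(tokens):
                raise SyntaxError("unclosed '{'")
            alternatives[-1].append(group)
        else:
            alternatives[-1].append(token)
        pos += 1  # step past this token (or past the group's closing '}')
    return alternatives, pos


def parse_library(tokens: list[str]) -> dict[str, list]:
    library: dict[str, list] = {}
    pos = 0
    while pos < len(tokens):
        if pos + 1 >= len(tokens) or tokens[pos + 1] != "{":
            raise SyntaxError(f"expected '{{' after {tokens[pos]!r}")
        body, end = parse_expr(tokens, pos + 2)
        if end >= len(tokens):
            raise SyntaxError(f"unclosed definition of {tokens[pos]!r}")
        library[tokens[pos]] = body
        pos = end + 1  # skip the closing '}'
    return library


def evaluate(expr: list, x: int, library: dict[str, list]) -> Optional[int]:
    y: Optional[int]
    for alternative in expr:                # alternation: first success wins
        y = x
        for term in alternative:            # composition: left to right
            if isinstance(term, list):
                y = evaluate(term, y, library)
            elif term == "+":
                y = y + 1
            elif term == "-":
                y = y - 1 if y > 0 else None
            else:
                y = evaluate(library[term], y, library)  # KeyError if undefined
            if y is None:
                break
        if y is not None:
            return y
    return None


def run(source: str, x: int, entry: str = "main") -> str:
    """Evaluate `entry` on input x; return the result in decimal, or '-' on failure."""
    library = parse_library(tokenize(source))
    entry_tokens = tokenize(entry)
    entry_expr, end = parse_expr(entry_tokens, 0)
    if end != len(entry_tokens):
        raise SyntaxError("unmatched '}' in entry-point")
    result = evaluate(entry_expr, x, library)
    return "-" if result is None else str(result)


EXAMPLE = """
0 { - 0 | }               # maps every input to 0
if=0 { { - 0 | + } - }    # 0 -> 0, fails otherwise
main { if=0 + | 0 }       # 0 -> 1, everything else -> 0
"""

if __name__ == "__main__":
    print([run(EXAMPLE, x) for x in range(4)])          # ['1', '0', '0', '0']
    print([run(EXAMPLE, x, "if=0") for x in range(3)])  # ['0', '-', '-']
```

Under these assumptions, the three-function library from the syntax section makes `main` compute logical negation. It maps `0` to `1` and every other input to `0`.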