Jordan Dehmel, jdehmel@outlook.com
github.com/jorbDehmel/oak
Oak
is a modern low-level functional programming language with
compile-time modifiable syntax. Its usecase is in low-level
language and compiler design (see the later section on
dialects). Oak
also aims to have an integrated build system,
necessitating only one call to the compiler per executable.
Oak
is most closely related to C
, but also has stronger
macro support, modern typing, generics, compile-time syntax
modification and integrated package management.
This document outlines the basics of Oak
, as well as the
canonical formatting of Oak
code. Deviation from this
formatting style is unadvisable, and considered bad form.
Oak
is, as of now, a translated language; Oak
code is
translated via the acorn
utility (see later) into C
.
acorn
can also compile Oak
into object code and link it to
create executables via clang
and clang++
, respectively.
Oak
has compile-time-modifiable syntax (see the section on
preprocessor rules), making it highly customizable and flexible
in a way that no other mainstream languages are. In Oak
, if
you think the language should have a certain shorthand, you can
create it. Oak
can similarly be a testing ground for new
language syntaxes. It supports the creation of "dialects", which
are Oak
variants which use preprocessor rules to support
independent syntactical structures.
Note: Oak
has a default syntax, called canonical Oak
.
Syntax modifications must be opted into on a file-by-file basis,
or specified by the compiler via the use of dialect files.
Name | Meaning |
---|---|
Oak |
The base Oak programming language. |
Acorn |
The Oak translator / compiler. |
Sapling |
The sub-language for creating Oak dialects/rules. |
Dialect | A syntactically-modified branch of Oak . |
TARM |
Prototype for more advanced rule system. |
This document assumes a moderate level of knowledge about
programming languages and compiler design. A thorough
understanding of C++
and C
, as well as a cursory knowledge
of Rust
, Python
or Carbon
are recommended (as these
languages use the same type specification syntax as Oak
).
Oak
is not compatible with Windows. Oak
assumes a UNIX
system.
To install, open a terminal in this folder and run
make install
. This will compile and install Oak
, as well as
the standard Oak
package. To uninstall, call acorn -A
. To
update, call acorn -a
.
Oak
versions follow the following format.
major release
.minor release
.patch
As of the time of this writing, Oak
is still on major release
0
. During this phase of development, Oak
is not stable.
This means that code written in one minor release or patch is
not guaranteed to work in another.
Once Oak
moves to major release 1
, code written in one patch
version will be guaranteed to work in another. Code written in
one minor release will most likely work in another. Code written
in one major release will have no guarantees about any other.
For examples on the concepts presented in this document, see the
./tests/
directory included in this git
repo. The files
therein are labeled CONCEPT_test.oak
, where CONCEPT
is the
concept covered within. These files are both demos and unit
tests for Oak
and acorn
. Consequently, you can compile all
demos via make test
or acorn -T
.
This section outlines things which make Oak
syntax unlike
other common syntaxes. As mentioned above, this document
requires a moderate understanding of C
/C++
syntax, and thus
we will mostly focus on differences of this.
In Oak
, types are read strictly left-to-right. This means that
an array of integers is []i32
, not i32 NAME[]
or [i32]
.
Oak
arrays always come immediately before the thing they
contain in Oak
type syntax.
In C++
, a template is verbosely defined as below.
template <class T>
void foobar(T what)
{
// ...
}
In Oak
, the template line template <class T>
is replaced by
simply <t>
(generic names must be lowercase).
let foobar<t>(what: T) -> void
{
// ...
}
Unlike C++
and Rust
, Oak
does not have implied template
instantiation calls. The template instantiator must be called
explicitly. For instance, the C++
code
// Create foobar<i32> and call it
foobar<i32>(123);
would become the Oak
code
// Ask the instantiator to create a function with this signature
foobar<i32>(_: i32);
// Call that function
foobar(123);
Oak
does not have classes (only structs). Additionally, Oak
does not have any form of member protection. This means that all
struct member variables are always public
.
In Oak
, class member functions (methods) can (and must) be
defined outside of the class itself. This means that any coder
can add a new method to a struct at any point. For instance, you
could import the string
library and add some new method to
turn a given string blue when printed.
Oak
methods are shorthand in a way similar to Python
.
let a: i32;
// This call
a.b();
// Is shorthand for
b(a);
Oak
does not have the const
keyword. Instead, everything is
const
by default, and mutability is specified by passing the
address of something. Thus, ^i32
in Oak
is equivalent to
int
in C
, and Oak
's i32
is equivalent to C
's
const int
.
In Oak
, a variable is declared as follows.
let NAME: TYPE;
For instance, a boolean named a
would be declared:
let a: bool;
You should read the :
operator as "be of type". Thus, the
above statement reads "let a be of type bool". Functions are
declared as follows.
let NAME(ARG_1: ARG_1_TYPE, ...) -> RETURN_TYPE;
For instance, a function that in C++
would be declared
bool isBiggerThanFive(int a);
would in Oak
be declared
let is_bigger_than_five(a: i32) -> bool;
. See later for atomic
type conversions from C
(IE how int
became i32
here). You
should read the ->
operator as "yield". Thus, this function
should be read "let is_bigger_than_five be a function taking
parameter a of type i32 and yielding a bool".
Additionally, the C++
code
bool isBiggerThanFive(int a)
{
return a > 5;
}
would convert to
let is_bigger_than_five(a: i32) -> bool
{
a > 5
}
Note that leaving off a semicolon is equivalent to C++
's
return
statement. Equivalently, you can use the return
keyword.
Pointers are ^
(IE a pointer to a bool is ^bool
). The "get
address" operator is @.
Syntax is fluid and user-modifiable in Oak
. For more
information about this aspect, see the Preproc Rules
section
below.
The :
, let
and ->
operators come from mathematics. For
instance, the mathematical statement
is equivalent to the code
let f(_: f64) -> f64;
Both of these statements can be read "let f be a function
mapping a single real number to another" (although f64
s are an
extremely limited representation of the real numbers).
As a final example for this section, examine the "Hello world" program below.
// Include the standard (std) Oak package, which contains print.
package!("std");
// The main function: The program will start by calling this.
let main() -> i32
{
// Print "Hello, World!" to the terminal, followed by a
// newline.
print("Hello, World!\n");
// Return a zero exit code, meaning no error.
0
}
// main.oak
// Import the `Oak` standard package
package!("std");
// Use the standard `Oak` rule set
use_rule!("std");
let main(argc: i32, argv: ^str) -> i32
{
print("This program was started with the command: ");
print(ptrarr!(argv, 0));
print("\n");
print("Hello, world!\n");
0
}
Commands (bash):
acorn main.oak -o hello.out
./hello.out
Output:
<Compiler output here>
This program was started with the command: .oak_build/a.out
Hello, world!
<Process finished with exit code 0>
let
is one of Oak
's few keywords. It represents the adding
of something to the symbol table. This could be a variable, a
function, a struct, or an enumeration. It can even be templated.
Oak
, unlike languages like Rust
and Python
, does not have
separate keywords for these actions. The possible usecases of
the let
keyword are outlined below.
// Register `fn` as a function mapping nothing to nothing
let fn() -> void
{
}
// Register `var` as an instance of i32
let var: i32;
// Register `structure` as a struct
let structure: struct
{
,
}
// Register `enumeration` as an enum
let enumeration: enum
{
,
}
// Register `templated_fn` as a templated function
let templated_fn<t>() -> void
{
}
// Register `templated_structure` as a templated struct
let templated_structure<t>: struct
{
,
}
// Register `templated_enumeration` as a templated enum
let templated_enumeration<t>: enum
{
,
}
Note: Variables represent instances of a given type.
Instances, by nature, cannot be templated. Thus, there is no
templated case for the variable declaration syntax of let
. All
other variants of let
are considered abstract (they only tell
the compiler how to add something), but variable declaration is
considered concrete (it uses abstract information to create a
non-abstract instance).
Pointers are variables which hold a memory address. This address
could refer to a single object in memory, or the start of a
contiguous block of like-typed objects. To get the address of a
variable in Oak
, use the @
symbol. To get the value that a
pointer references, use the ^
symbol. Pointer types are the
^
symbol, followed by the type they point to. You can have
pointers to pointers.
The alloc!
macro for allocating new memory on the heap returns
a pointer to the memory it allocated, should it be successful.
Similarly, the free!
macro frees the memory of the pointer it
is passed.
Mutability (ability to change data) is determined by referencing. If an object is passed into a function as a parameter, it will only be modifiable if the function signature designates it a pointer. When calling a function, a single reference can be automatically added by the compiler, but no more than that. For example,
let fn_0(what: i32) -> void
{
;
}
let fn_1(what: ^i32) -> void
{
;
}
let fn_2(what: ^^i32) -> void
{
;
}
let main() -> i32
{
let arg: i32;
// Legal call
fn_0(arg);
// Also legal
fn_1(arg);
// NOT legal- will cause compiler error
fn_2(arg);
0
}
The address-of operator @ should be avoided unless strictly necessary.
A pointer to an object can be used just like the object itself with respect to its members. For instance, all the following are legal calls.
let item: struct
{
a: i32,
}
let main() -> i32
{
let obj_1: item;
let obj_2: ^item;
let obj_3: ^^item;
let obj_4: ^^^item;
// All the following are legal
obj_1.a;
obj_2.a;
obj_3.a;
obj_4.a;
0
}
Oak
has a unique syntax for arrays that may be unfamiliar to
users of C
, C++
and Rust
. In Oak
, arrays prefix the type
they refer to. For instance, an array of integers is []i32
instead of i32[]
. Oak
has two types of arrays: Sized and
unsized. Unsized arrays are syntactically equivalent to pointers
internally, and can point to an array of any size. These are the
only kind of array which can be passed as parameters, and the
kind that are assigned by alloc!
. They are denoted with the
[]
operator. Sized arrays are not pointers- They are the
actual arrays. These are denoted with the [n]
operator, where
n
is a compile-time constant integer. This means that an
integer variable cannot be used as the size of an array.
Oak
types are always read left-to-right. Arrays are not
an exception (unlike in other languages). This makes it slightly
easier to decode exactly what an Oak
array is of.
C
:
i32 a[12];
i32 *b;
i32 *c[10];
i32 d[10][20];
Oak
:
let a: [12]i32;
let b: []i32;
let c: [10]^i32;
let d: [20][10]i32;
In Oak
, a pointer to a sized array is an illegal type.
However, a pointer to an unsized array is fine.
// Illegal!
let a: ^[5]i32;
// Fine
let b: ^[]i32;
You can also have arrays of function pointers as shown below.
// Array of function pointers
let l: []()->void;
Here are some more miscellaneous examples.
// Array of i32
let i: []i32;
// Array of i32 pointers
let j: []^i32;
// Pointer to array of i32 pointers
let k: ^[]^i32;
// Sized array of i32
let l: [5]i32;
// Sized array of i32 pointers
let m: [10]^i32;
// Pointer to a pointer to an array of pointers to arrays of
// pointers to pointers to arrays of arrays of pointers to i32s
let absurd: ^^[]^[]^^[][]^i32;
// Array of 10 of the above
let absurd_arr: [10]type!(absurd);
Oak
cannot have uninitialized variables. Thus, arrays (sized
and unsized) have default values.
Unsized arrays, since they are just pointers to contiguous
arrays in memory somewhere, are initialized to the null pointer
0
. This is how all pointers in Oak
are initialized.
Sized arrays are not pointers- They are the actual contiguous memory positions which unsized arrays point to. Their memory is initialized to all zeros.
This section lists guidelines / rules for writing Oak
code.
Deviation from the following is considered non-canonical
Oak
formatting, and is unadvised. Deviation, in fact, could
easily be considered an error by the compiler.
Code scopes go like this:
// Good
if (true)
{
// ..
}
let fn() -> void
{
// ..
}
NEVER like this:
// Bad
if (true) {
// ..
}
let fn() -> void {
// ..
}
Ensure you always have some sort of return statement (except
-> void
functions), even when it is unreachable. Principles
take precedent over literal compiler interpretation.
Oak
exclusively uses underscored variable names. Camelcase
variable names should never be used, and capitalized variables
are reserved for special cases.
// Good
let long_variable_name_with_multiple_words: i32;
// Bad
let longVariableNameWithMultipleWords: i32;
let longvariablenamewithmultiplewords: i32;
// Only allowed for methods which use dynamic allocation
// or for operator method aliases
let LongVariableNameWithMultipleWords: i32;
// Why? No?!?
let Long_variable_name_with_multiple_words: i32;
The same naming tips go for structs and enums. Structs and enums should not have any special name conventions. The only symbols which have different naming conventions are the operator aliases (see later), which use upper camelcase.
These naming conventions are also extended to package names
(unless for some reason incompatible with the git host website
they are on) and file names. Unlike standard practice in C++
and most other languages, generics types (see later) should
still be lower-underscore-case.
Single-line comments should start with "//
" (slash, slash,
space), and multi-line comments should start with /*
(slash-star-newline) and end with */
(newline-star-slash)
(just like C++
).
// Correct single line comment
//Do not format them like this (no leading space)
// single-line comments should NEVER start with a lower-case
// Except in this case; When
// it is a continuation of
// a previous line for
// formatting reasons, a
// lowercase start is fine.
/*
Correct multi-line comment.
The same wrap-over capitalization
rules apply.
*/
/*
incorrect multi-line comment
(no leading capitalization)
*/
/*Don't do this (why would you do this?)*/
/*Also do not
do this*/
Also, all files should always end with a newline.
Comment blocks before global definitions (those not within a
code scope) will be included in automatic manual generation via
acorn -m
. These manuals use standard markdown, and by
extension all comments should as well.
Any sequence starting with #
is ignored completely. This is to
allow for shebangs. This symbol should never be used for
commenting.
Tabbing should follow standard C++
rules. Tabs and
quadruple-spaces are both acceptable, but double-spaces are not.
Not using tabbing is not acceptable.
No line of Oak
should be longer than 96 standard-width chars
wide. Since newlines are erased by the lexer, one can be
inserted at any point without disturbing syntax.
If possible you should use a spellchecker when writing code. If a variable name does not pass a spell checker, it is not considered a valid name and should be changed.
Global variables are illegal. Anything achieved by globals can be done better by passing arguments.
All Oak
should be as minimalist as possible, and should be
split into as many files as it makes sense to do.
Variable, package, and file names may be abbreviated within the
sake of reason. An abbreviation is disallowed if it conflicts
with another name or if the original name cannot be reasonably
extracted from it (for instance, the abbreviation regex
is
fine, but re
is not, because re
could realistically refer to
anything).
Interfacial files should take the form NAME_inter.oak
and
NAME_inter.other_language_suffix_here
(see later for more
information on formatting Oak
interface files). For instance,
sdl_inter.oak
and sdl_inter.cpp
.
If multiple lines are required within a function call, the following lines should be tabbed exactly one further than the beginning line. Whenever possible, you should include exactly one argument per line in such a case.
// Proper formatting:
let long_function_name(long_argument_name_one: i32,
long_argument_name_two: i32,
long_argument_name_three: i32,
long_argument_name_four: i32) -> i32
{
0
}
// Also fine:
let long_function_name(
long_argument_name_one: i32,
long_argument_name_two: i32,
long_argument_name_three: i32,
long_argument_name_four: i32
) -> i32
{
0
}
Oak
code can technically be written using unicode encoding,
but ASCII is recommended for any programmer whose spoken
language can be expressed within it.
Note: Oak
is hypothetically especially translatable to
other human languages due to the preprocessor rule system.
It is trivial to implement a dialect file which replaces all
Oak
keywords with any spoken language's equivalent versions,
and still fairly easy to translate the std
package. For
example, an Oak
program written in Hindi would experience a
translation process as follows:
oak_hindi_dialect.oak -> oak_canonical_dialect.oak
-> c_version.cpp -> executable.exe
Unicode works perfectly fine in Oak
, although the later was
not designed with the former in mind. International coders
should have no troubles.
If Oak
reports unicode characters as illegal, please report it
as a bug.
The main function must take one of the following forms (you can
change the argument names). argc
is the number of command line
arguments, at minimum 1. argv
is a pointer to an array of
strings, each of which is a command line argument. The first
item of this, argv[0]
, is the name of the executable that was
run.
let main() -> i32;
let main(argc: i32, argv: []str) -> i32;
let main(argc: i32, argv: [][]i8) -> i32;
All macros will use this third form
((argc: i32, argv: [][]i8) -> i32
). Note that these two
variables can be named anything. A common form is
(c: i32, v: [][]i8) -> i32
.
Oak
does not have classes, nor does it have internally defined
methods. Methods are converted into equivalent function calls
during translation as follows.
OBJ.METHOD_NAME(ARG, ARG, ...);
Becomes
METHOD_NAME(@OBJ, ARG, ARG, ...);
Thus, you can define a method as follows.
let OBJ_TYPE.METHOD_NAME(...) -> RETURN_TYPE;
For instance, if you wanted to define a method that converts an integer to a double, its signature would be:
let i32.to_double(self: ^i32) -> f64;
This frees the programmer from the bounds of the initial class definition.
In Oak
, you can define new data structures as structs, and
define any methods upon it later. For instance, a linked list
could be broadly defined as follows.
// Alloc is defined in std's std_mem.oak
package!("std");
let linked_list<t>: struct
{
data: t,
next: ^linked_list<t>,
}
let append(self: ^linked_list<t>, what: t) -> void
{
// Alloc is equivalent to `C++`'s new keyword
alloc!(next);
next.data = what;
}
Oak
does not have private members, nor does it have
inheritance.
Oak
does not have explicit header files like C
/ C++
, but
there is no reason why you could not use a .oak
file like a
.hpp / .h
. For example, a .oak
can establish function
signatures without explicitly defining them, allowing another
.oak
to define them. This allows easy division of labor, as in
C / C++
. This is obviously vital for any project of scale.
Additionally, the translator will detect and prevent circular
dependencies, eliminating any analogy to pragma once
.
The alloc!
and free!
functions are akin to C++
's new
and
delete
keywords, respectively. Alloc requests a memory
position on the heap with the size of t
, and free
correspondingly releases that memory. Oak
is not memory safe,
in that it does not automatically clean up alloc!
-ed memory
like Java
or Rust
. In this way it is more similar to C++
or C
.
Example of alloc!
and free!
:
let New<t>(self: ^node<t>) -> void
{
// Allocate a single instance on the heap
alloc!(self.data);
free!(self.data);
// Allocate an array of size 5 on the heap
alloc!(self.data, 5);
// Free a dynamically allocated array
free!(self.data);
}
let Del<t>(self: ^node<t>) -> void
{
free!(self.data);
}
let main() -> i32
{
let i: ^bool;
alloc!(i);
}
If a number is provided as a second argument to alloc!
, it
returns an unsized array of the newly allocated values.
Otherwise, it returns a pointer to the single newly
allocated space.
You can declare only the function signatures in Oak
and define
them in C
, as is done in the Oak
std
(standard) package.
This is called interfacing. These pairs of dual-language
files are interfacial files, and any package primarily
porting one language's features to Oak
is an
interfacial package.
Note: You will have to follow the Oak
mangling process, as
detailed below.
The Oak
function signatures
let foo_bar(self: ^data_structure, what: data, hello: ^^data) -> data_structure;
let hello_world() -> void;
let fn_ptr_thing(fn: ^() -> void) -> void;
Will, upon translation to C
, become
// Note how the new "mangled" function name contains the entire type
// Argument types are separated by JOIN, and the argument section ends
// with MAPS, followed by the return type.
struct data_structure foo_bar_FN_PTR_data_structure_JOIN_data_JOIN_PTR_PTR_data_MAPS_data_structure(struct data_structure *self, struct data what, struct data **hello);
void hello_world_FN_MAPS_void(void);
// This is how function pointers get mangled
void fn_ptr_thing_FN_PTR_FN_MAPS_void_MAPS_void(void (*fn)(void));
Thus, in an interfacial file, you should follow the
above pattern. For higher-level languages like C++
and Rust
,
you will have to manually disable the mangler upon compile time
in order for interfacing to be successful. For C++
, this
comes in the form of the extern "C"
statement.
Where in C++
you would write
class example
{
public:
void operator=(..);
bool operator==(..);
// .. etc
};
In Oak
you would write
// The unit struct; No members
let example: struct
{
,
}
let Copy(self: ^example, ..) -> void;
let Eq(self: ^example, ..) -> bool;
There are many so-called "operator aliases" which are listed
below. If self
is not a pointer, it is a const function.
Oak |
C++ |
Description | Signature for T |
---|---|---|---|
Less | < | Less than | let Less(self: T, other: T) -> bool; |
Great | > | Greater than | let Great(self: T, other: T) -> bool; |
Leq | <= | Less that or equal | let Leq(self: T, other: T) -> bool; |
Greq | >= | Greater than or equal | let Greq(self: T, other: T) -> bool; |
Eq | == | Equal to | let Eq(self: T, other: T) -> bool; |
Neq | != | Not equal to | let Neq(self: T, other: T) -> bool; |
And | & | Bitwise and | let And(self: T, other: T) -> T; |
Andd | && | Boolean and | let Andd(self: T, other: T) -> bool; |
Or | | | Bitwise or | let Or(self: T, other: T) -> T; |
Orr | || | Boolean or | let Orr(self: T, other: T) -> bool; |
Not | ! | Boolean negation | let Not(self: T) -> T; |
Copy | = | Copy constructor | let Copy(self: ^T, other: T) -> void; |
Del | ~ | Deletion | let Del(self: ^T) -> void; |
Add | + | Addition | let Add(self: T, other: T) -> T; |
Sub | - | Subtraction | let Sub(self: T, other: T) -> T; |
Mult | * | Multiplication | let Mult(self: T, other: T) -> T; |
Div | / | Division | let Div(self: T, other: T) -> T; |
Mod | % | Modulo / remainder | let Mod(self: T, other: T) -> T; |
AddEq | += | Increment by a number | let AddEq(self: ^T, other: T) -> T; |
SubEq | -= | Decrement by a number | let SubEq(self: ^T, other: T) -> T; |
MultEq | *= | Multiply and assign | let MultEq(self: ^T, other: T) -> T; |
DivEq | /= | Divide and assign | let DivEq(self: ^T, other: T) -> T; |
ModEq | %= | Modulo and assign | let ModEq(self: ^T, other: T) -> T; |
AndEq | &= | Bitwise AND and assign | let AndEq(self: ^T, other: T) -> T; |
OrEq | |= | Bitwise OR and assign | let OrEq(self: ^T, other: T) -> T; |
Lbs | << | Left bitshift | let Lbs(self: T, other: T) -> T; |
Rbs | >> | Right bitshift | let Rbs(self: T, other: T) -> T; |
New | TYPE_NAME | Instantiation | let New(self: ^T) -> ^T; |
Del | N/A | Deletion | let Del(self: ^T) -> void; |
There is no Oak
equivalent to C++
's increment and decrement
operators.
It is notable that there is not a set return type for many of
these. It is common to see copy
return T
, ^T
, or void
.
With a few exceptions, operator alias replacement occurs within parenthesis (either as standalone blocks for evaluation or as part of a function call). However, in cases like assignment parenthesis are inferred.
No std
Oak
files use C++
-style streams, but there is
nothing in the language stopping them from working. However, the
obfuscation they add to programmers only familiar with modern
languages most likely outweighs any benefit they might provide.
C++ |
Oak |
---|---|
unsigned char | u8 |
char | i8 |
unsigned short | u16 |
short | i16 |
unsigned int | u32 |
int | i32 |
unsigned long | u64 |
long | i64 |
unsigned long long | u128 |
long long | i128 |
float | f32 |
double | f64 |
char * | str |
bool | bool |
T& | ^t |
T[] | []t |
T[123] | [123]t |
T* | ^t |
const T | t |
template<class T> | <t> |
The last few entries show two things; That Oak
does not have
references (they are replaced with pointers), and that Oak
has
smart templating. Additionally note that there are no multi-word
types (IE unsigned long long int) in Oak
, and that pointers
and arrays are moved to before the type they point to.
For instance,
template <class T, class F>
unsigned long long int **doThing(T &a, T b[]);
Becomes
let do_thing<t, f>(a: ^t, b: []f) -> ^^u128;
Oak
has built-in support for hexadecimal, decimal, and binary
integer literals. Hex numbers are prefixed by 0x
, binary by
0b
, and decimal by nothing. Additionally, the Oak
lexer will
merge successive integer literals together, so spacing can be
inserted anywhere within an integer literal. This is extremely
necessary for readability.
The following code segment demonstrates these ideas.
package!("std");
let main() -> i32
{
let j: u32;
// Binary
j = 0b10101010101010101010101010101010;
// Hex
j = 0x01234567;
// Separation: All of the following are valid.
j = 123 456;
j = 0b 0000 1111 0101 1010 0000 1111 0000 1111;
j = 0b11111111111111111111111111111111;
j = 0x 01 23 45 67;
printf!("j: %\n\n", j);
0
}
Similarly, strings are automatically concatenated over spaces or lines.
let long_string! = "hello! This is a very long string which it"
"is necessary to split over several lines.";
let split_string! = "we can also" "do this";
Note: Integer literals cannot be split across multiple lines.
You can declare a generic function (a function which can operate
on a generic type, rather than a specific, defined type) by
using the <..>
notation as below.
let generic_fn_demo<t>(arg1: t, arg2: bool, arg3: *t) -> t;
This allows a generic type t
(you can use whatever name you
want) to enter the function's scope temporarily. On a compiler
level, generic functions do not exist until they are called.
Upon compiler detection of a call, it is ensured that an
appropriately-typed function exists (otherwise, such a function
is instantiated).
Oak
does not have automatic instantiation of generic functions
via argument type analysis. You must manually call the template
instantiation unit as follows.
let gen_fn<t>(what: t) -> void
{
print(t);
}
let main() -> i32
{
// Call the template instantiator
// Request the instantiation of the generic block
// identified by the signature `gen_fn<i32>(what: i32)`
// which is our gen_fn for the i32 type
gen_fn<i32>(what: i32);
gen_fn(123);
0
}
Note: It is common to replace the names of any and all
argument in the instantiator call with the null-name token _
.
This is equivalent to writing anything else there, but signals
to the user that it is an instantiation call.
Similarly, you can define generic structs as follows.
let node<t>: struct
{
data: t,
next: ^node<t>,
}
let main() -> i32
{
// Generic instantiation call
node<i32>;
let head: node<i32>;
}
Generic enumerations are defined exactly like structs (see later for more information on enumerations). See later in this document for more about instantiating generic structs.
Acorn is the Oak
translator, compiler, and linker. Oak
is
first translated into C
, which is then compiled and linked.
Acorn command line arguments:
Option | Verbose | Purpose |
---|---|---|
-a | Update acorn | |
-A | Uninstall acorn | |
-c | --compile | Produce object files |
-d | --debug | Toggle debug mode |
-D | --dialect | Uses a dialect file |
-e | --clean | Toggle erasure (default off) |
-E | --execute | Run executable when done |
-g | --exe_debug | Use LLVM debug flag |
-h | --help | Show this |
-i | --install | Install a package |
-l | --link | Produce executables |
-m | --manual | Produce a .md doc |
-M | Used for macros | |
-n | --no_save | Produce nothing |
-o | --output | Set the output file |
-O | --optimize | Use LLVM optimization O3 |
-q | --quit | Quit immediately |
-Q | --query | Query an installed package |
-r | --reinstall | Reinstall a package |
-R | --remove | Uninstalls a package |
-s | --size | Show Oak disk usage |
-S | --install | Install a package |
-t | --translate | Produce C++ files |
-T | --test | Compile and run tests/*.oak |
-u | --dump | Save dump files |
-U | Save rule log files | |
-v | --version | Show version |
-w | --new | Create a new package |
-x | --syntax | Ignore syntax errors |
Oak
has some limited support for compiler optimization and
runtime debugging due to its use of LLVM-IR via Clang. The
acorn -O
and acorn -g
commands tell Clang to use full
compile-time optimization and allow runtime debugging
respectively.
Due to Oak
's mangler and rule system, runtime debugging
information may not be immediately useful. For instance,
generic struct instances have compilation-time-mangled names:
vec<i32>
will become vec_GEN_i32_ENDGEN
, which is much less
intuitive to a human observer. Additionally, compile-time rules
may cause strange effects in the output code which may not be
present in the source. Enumerations, due to their complex
internal structure, also may be hard to parse to a human eye.
However, with practice and use of the acorn -e
flag (to view
the translated c++
source code) this information should become
easier to parse.
Additionally, it is worth noting that both of these options will
increase compilation time, and the -g
option will
significantly increase the size of the output executable.
The following are the standard macros associated with Acorn.
include!(WHAT, WHAT, ..)
causes the translator to also use
the file(s) within. It can take .oak
files. It will first look
for local files. However, if no local file exists, it will check
the /usr/include/oak
folder. For instance, if include!
is
called on std/opt.oak
, acorn
will first look for a local
./std/opt.oak
file. If none exists, it will check
/usr/include/oak/std/opt.oak
(which is part of the standard
Oak
package).
package!(WHAT, WHAT, ..)
is a more advanced version of the
above. See the packages section below for more details.
link!(WHAT, WHAT, ..)
causes the compiler to also use the
file(s) within. It takes .o
files, or anything that can be
used by a standard C
linker. You can, for example, write a
function header with no body in a .oak
file, and add a link!
macro containing the function definition, perhaps compiled in
another language. This makes integration with C / C++
files
extremely easy.
You can also define your own macros as follows. If you wanted to
create a macro named hi!
which expands to
print("hello world")
, you would do the following.
Macro arguments are always treated as strings. For instance,
link!(foobar)
is equivalent to link!("foobar")
. You are
encouraged to put quotations around any filenames.
let hi!(argc: i32, argv: ^^i8) -> i32
{
include!("std/std_io.oak");
print("print(\"hello world\")");
0
}
A macro is compiled separately, so if you want to use a included file within, you will have to call its inclusion macro again. You will note that a macro has the same type as a main function; This is why.
If you wanted to create a macro named print_five_times!()
which prints the name of a symbol 5 types, you would do the
following.
let print_five_times!(argc: i32, argv: ^str)
{
if (argc != 2)
{
throw(error("Invalid number of arguments."));
1
}
print(argv[1]);
print(argv[1]);
print(argv[1]);
print(argv[1]);
print(argv[1]);
0
}
If you were to call print_five_times!(a)
, it would expand into
aaaaa
. If you were to call print_five_times!(a[b].c)
, you
would receive a[b].ca[b].ca[b].ca[b].ca[b].c
.
The package!(WHAT)
macro imports a package. If it is not yet
installed on the compiling device, it can be cloned via Git by
using acorn --install PACKAGE_URL
. You can update or reinstall
a package via acorn --reinstall PACKAGE_URL
. A few packages
have known URLs, meaning you can just say
acorn --install NAME
, rather than the full URL. Installed
packages keep their files in /usr/include/oak/NAME
. Note: You
can use --install
or -S
.
Every package must have a file named oak_package_info.txt
.
This file takes the following form.
NAME = "name_goes_here"
VERSION = "0.0.1"
LICENSE = "license"
DATE = "July 14th, 2023"
SOURCE = "example.com/example_package"
AUTHOR = "Jordan Dehmel, Joe Shmoe"
EMAIL = "jdehmel@outlook.com, example@example.com"
ABOUT = "A demo of Oak package creation"
INCLUDE = "main_file_to_link_to.oak,another_file.oak"
SYS_DEPS = "package1 package2"
NAME
is the name of the package. VERSION
is the version
number of the package (For instance, major release 1, minor
release 15, patch 12 would be 1.15.12
). DATE
is the date
this version was last updated. SOURCE
is the URL where the
package's Git repo can be found. AUTHOR
is the string which
contains all the package author names, separated by commas.
EMAIL
is the same, but for emails. ABOUT
is the package
description.
INCLUDE
is the most important field- It contains the filepaths
(relative to the main folder of the repo, IE ./
) of all files
which will be directly include!()
-ed by Acorn at compile-time.
For instance, the STD
(standard) library has a central linking
file named std.oak
. This file contains include!
macros for
all other files in the package. Thus, only std.oak
needs to
linked at compile time- the rest will be taken care of
automatically. Thus, std/oak_package_info.txt
has a line
reading INCLUDE = "std.oak"
.
SYS_DEPS
is a string containing space-separated packages
required to be installed by your operating system. For instance,
if a package used SDL2 and Git, it would include the line
SYS_DEPS = "SDL2 git"
.
Packages are only managed by Git
. The canonical host
website is currently github
.
If you use VSCode, you can add this to your
.vscode/settings.json
file.
"files.associations": {
"*.oak": "rust",
}
Rust
is similar enough to Oak
that you won't notice any
major issues, so long as you don't enable any Rust
extensions.
There are several so-called preprocessor definitions which can be called upon. These will be replaced with their values before compilation. Each of these begin and end with two underscores.
Name | Type | Description |
---|---|---|
line! | i128 | The current line |
comp_time! | i128 | The UNIX time of compilation |
prev_file! | str | The path of the previous Oak file |
file! | str | The path of the current Oak file |
sys! | str | The operating system compiled for |
Additional preprocessor definitions can be defined as follows.
let name! = definition;
Following the above definition, all occurrences of name!
will
be replaced by definition
.
Use with caution. Rules alter the fundamental syntax of the language.
The following subsection describes the use of the Sapling
rule
engine. Additional rule engines are available via minor changes
to the acorn
source code.
Rules are an experimental feature that allow you to add more
functionality to Oak
. They take in a pattern of symbols, and
replace it with a different one when it is detected in the code.
A pattern definition is mostly like a fragment of Oak
code,
save for a few special features.
The default symbol type in a rule is a literal. These match only
the symbol that they are. For instance, hi
would only match
symbols that say hi
. It would not match a symbol that said
hi-llo
, even though the pattern is a prefix of the latter.
This is because rules operate on symbol-wise pattern matching,
not character-wise pattern matching.
The second type of symbol in a rule is a variable. These take
the form of a dollar sign followed by a single alphanumeric
character (in regex, \$[a-zA-Z0-9]
). Variables can only use
single-character names, so there are a maximum of 62
variables
per rule. Whatever single symbol is matched into a variable will
be accessible in the output pattern.
The final type of symbol in a rule is a wildcard, represented by two dollar signs. A wildcard matches anything, but is not stored.
The dollar sign was chosen over the traditional "glob" *
operator because it is not used within usual Oak
code. The
glob also represents multiplication, so it is not used. Using
the dollar sign means that there are no backslashes needed in an
Oak
rule.
Output patterns do not use wildcards. Upon replacement, any occurrence of a defined variable in the output pattern will be replaced with that variable's match in the input pattern.
For instance, the input rule meow $$ $m meow
would match
meow woof bark meow
, or indeed any other two symbols
surrounded by meow
's. The difference comes only in the output
pattern; $s
would map to bark
, but woof
would never be
accessible. As far as the output pattern is concerned, that $$
could have matched anything.
Input and output patterns are lexed, so spaces may or may not be required within.
In a way, rules allow you to define whatever syntactic fluff you
want. The rule-definition language outlined here is tentatively
called Sapling
. This may be stylized $apling
.
Rules are defined using the new_rule!
macro. This takes three
arguments; The name of the rule, the input pattern, and the
output pattern. For instance, a rule named foo_to_bar
with an
input pattern foo
and an output pattern bar
would be
declared new_rule!("foo_to_bar", "foo", "bar");
. Note that all
arguments to rule macros must be strings. This is advisable
(but not strictly required) of all macros.
Once a rule is defined, it will be inactive. To make a rule
active, you must call the use_rule!
macro. This takes one or
more rule names, and activates those rules from that point on.
Note that these rules are only active after this macro is
called.
The set of active rules does not roll over when a new file is
called, but the set of all rules does. This means that a
included file will have access to its "parent" files' rules,
but they will always be off by default. The use_rule!
macro
will need to be called within every file which wants to use a
rule.
A rule can be deactivated (but never deleted) using the
rem_rule!
macro. This takes zero or more arguments. If zero
arguments are given, it disables all rules. If one or more, it
disables the rules with the given names.
Multiple rules can be bundled together under one name by using
the bundle_rule!
macro. This takes two or more arguments, with
the first argument being the name of the bundle. The remaining
arguments are the names of the rules which fall within this
bundle. This allows you to use a single use_rule!
call to
activate a hole suite of rules.
Bundles cannot be disabled all at once; You will need to call
rem_rule!
for each of their component rules. A bundle with a
given name will always overwrite a rule with the same name,
except in a bundle_rule!
call. A rule can never overwrite a
macro call, but can overwrite anything else. Rules cannot find
special / comment symbols (like the secret //__LINE__=
symbol).
// File "source.oak"
package!("std");
new_rule!("everything_to_meow", "$$", "meow");
new_rule!("remove_meow", "meow", "");
bundle_rule!(
"remove_everything",
"everything_to_meow",
"remove_meow"
);
use_rule!("remove_everything");
these symbols will be destroyed by the meow menace
Any file which calls include!("source.oak");
would have access
to the remove_everything
rule, but it would always be off by
default.
Here is the source code of the std_method
bundle, which does
automatic method reformatting.
new_rule!("argless_mut_method", "$a . . $b ( )", "$b ( @ $a )");
new_rule!("argless_method", "$a . $b ( )", "$b ( $a )");
new_rule!("mut_method", "$a . . $b (", "$b ( @ $a ,");
new_rule!("const_method", "$a . $b (", "$b ( $a ,");
bundle_rule!("std_method",
"argless_mut_method",
"argless_method",
"mut_method",
"const_method");
As of version 0.0.2
, Sapling
also has several more features.
On such feature is $+
, which is a wildcard that matches any
one or more symbols. It may not be the first or last symbol in a
Sapling
sequence, as it is terminated only by the first
instance of the following symbol (if it were the first item, it
would be highly inefficient). Another such, with the same rules,
is $*
. This symbol matches any zero or more symbols that are
not the following symbol. With these two symbols, no occurrence
of the following symbol may match anywhere within. Additionally,
a variable may not occur directly after either.
hi $* ( )
would match hi hello hello hello hello ( )
, but
not hi hello hello ( hello ) ( )
.
This update also introduces a stored glob operator $*X
with
X
being another single-character variable name. For instance,
if $*a {
-> if ( $a ) {
would auto-parenthesize if
statements.
// STD Oak dialect rules
clear
"if $*a {" "if ( $a ) {"
"while $*a {" "while ( $a ) {"
// NOT final; We want to use STD_OD as a jumping-off point
It also introduces the merge operator $<
for output patterns.
This fuses the preceding and proceeding symbols, without any
separation. For instance, ( $t )( $*f )
-> to $< $t ( $f )
would convert standard cast notation into the Oak
equivalent.
During the matching of a rule via Sapling
, the engine will
keep track of the history of the attempt thus far. This history
is called the rule memory. You can reset the memory with the
$~
card, or pipe the entire contents into a variable with the
$>X
card (where X
is the variable).
Consider the following rule, taken from an old version of the
std
Oak
package.
new_rule!(
"for",
"for ( $*a ; $*b ; $*c ) { $*d }",
"{ $a ; while ( $b ) { $d $c ; } }"
);
A modified version of this still underlies the Oak
for
loop
system. However, what happens if we embed an if
statement
within a for loop in the above example? The rule would stop
tracking $d
as soon as it encountered the ending curly brace
of the if
statement, possibly breaking all following code. We
need a card which can handle pairs of opening and closing
symbols. $<$OPEN$CLOSE$>
is just such a card. If we replace
OPEN
with an opening curly brace and CLOSE
with a closing
one, this card will recognize any one or more nested set of
curly braces and move beyond them (more specifically, it will
maintain an internal counter, incrementing it upon OPEN
,
decrementing it upon CLOSE
and only ceasing once it reaches
zero). This can be use in conjunction with rule memory to store
the entire set of nesting braces in a variable.
Below is the modern Oak
equivalent of this rule.
new_rule!(
"for",
"for ( $*a ; $*b ; $*c ) $~ $<${$}$> $>g",
"$a ; while ( $b ) { $g $c ; }"
};
This version can recognize infinitely nested sets of curly brackets with no issues.
Note: The Sapling
interpreter is somewhere between a
deterministic finite automata and a push-down automata.
A suite is a set of multiple literals, any of which can be matched in a given position. A suite is denoted as follows.
$[$each$of$these$would$match$]
Similarly, a negated suite matches any single symbol except those within.
$/[$none$of$these$would$match$]
For both of these, literals are delineated by the dollar sign.
A negative lookbehind, denoted by
$/<$literal
causes a pattern fail if the previous character was the specified literal. It does not advance the current position, so
$/<hi $/<hey hello
Would match any occurrence of hello not prefaced by hi or hey.
Similarly, there is the negative lookahead, denoted by
$/>$literal
This works identically to the negative lookbehind, except with the following symbol instead of the preceding one.
Note: There is no positive lookahead or lookbehind in Sapling.
Dialect files introduce rules that are handled before all others
and which are always enabled by the translator. This allows for
the creation of Oak
dialects, or syntactically different
languages which use rules to collapse into regular Oak
at
translation-time.
Dialect files (.od
files, for Oak Dialect) consist of a
few types of line. There can be comments, marked by //
(slash-slash-space). A line can read clear
, which erases all
existing dialect rules. It could also read final
, which
disables any further dialect changes. Finally, a line could
read "INPUT PATTERN HERE" "OUTPUT PATTERN HERE"
, with an input
and an output pattern enclosed by quotes and separated by a
space.
Dialects are loaded via the acorn -D filename.od
argument.
Theoretically, Oak
's ability to construct and use dialects
means that it can be used to process other languages into
itself. A properly-constructed dialect file could instruct
acorn
to turn a Rust
file into Oak
and compile it. This is
obviously a long way off, however, and will require a more
advanced language than Sapling
.
The Oak
translator goes through the following passes when
processing a file from Oak
to C
. Each of these represents
(at minimum) one full pass over the entire file. In stages after
stage 3, they represent symbol-wise iteration, but before then
they represent character-wise iteration.
1 - Syntax checking
2 - Lexicographical symbol parsing (lexing)
3 - Compiler macros (file inclusions, linker commands, etc)
4 - Rule parsing and substitution (handles rule macros)
5 - Macro definition scanning
6 - Regular macro call handling
7 - Preprocessor definition insertion
8 - Parenthesis and operator substitution
9 - Sequencing
10 - (External) Object file creation via clang
11 - (External) Executable linking via clang
Special symbols are inserted during lexing to retain metadata.
They are essentially ways for the lexer to talk to the compiler.
They are ignored by nearly every other stage of translation.
This is, for instance, how acorn
knows which line an error
occurs on even though the Oak
lexer erases newlines extremely
early in processing. Special symbols are prefixed with two
slashes, like a comment. The line
special symbol looks like
this: //__LINE__=X
, with X
being replaced by the current
line.
As a front-end user of acorn
, you shouldn't have to worry
about special symbols. They will appear in some parts of
acorn
-generated dump files, but otherwise they should not
appear anywhere (even Sapling
ignores them).
Oak
translation, in theory, runs in time complexity
proportional to the sum number of characters in the included
files; O(n)
. This analysis assumes that this number of
characters outnumbers the number of symbols, and that any
further iterative process takes fewer steps that this. The space
complexity is the same as the time complexity. These
prepositions may prove to be false, however, warranting a more
thorough analysis.
The std
rule (included in the "std" package) allows Oak
to
behave more like other languages. For instance, it allows the
use of OBJ.FN()
-style method calls and OBJ[IND]
-style array
accesses. More detail can be found by examining std/std.oak
or
std/std.od
.
The "std/math.oak"
file provides some math functions, and is
analogous to C++
's cmath
. Below is a table describing the
included functions.
Name | Description | Notes |
---|---|---|
abs |
Returns the absolute value of a number | |
pow |
Returns a to the power of b |
Runs in O(b) |
math_pi |
Returns pi (3.14159..) |
|
math_e |
Returns e (2.71828..) |
|
fact |
Returns the factorial of a number | |
sin |
Returns sin of a value (radians) |
Uses a 9th order Taylor polynomial |
cos |
Returns cos of a value (radians) |
Uses sin |
tan |
Returns tan of a value (radians) |
Uses sin and cos |
floor |
Rounds a number down | |
ceiling |
Rounds a number up | |
round |
Rounds a number | |
sqrt |
Returns the square root of a number | Uses Newtonian iteration |
f_mod |
Returns a modulo b (floats) |
|
min |
Returns the smaller of a and b |
|
max |
Returns the larger of a and b |
|
even |
Returns true if a number is even |
Oak
provides some basic multithreading support as of version
0.0.4. This comes via the "std/threads.oak"
file. A thread
begins a new thread when you call start(@t, @f)
, where t
is
the thread
and f
is a function mapping void to void
(let f() -> void
). After a thread
has begun, you must wait
for it to complete. To halt the execution of the instantiating
thread until the instantiated thread has finished, call
join(@t)
for some t: thread
. See later for more help on
function pointers.
Simple DirectMedia Layer 2, or SDL2 is a framework for
accessing low-level system IO. The Oak
git repo includes a
basic Oak
package called sdl
which interfaces with the
C SDL2
library. This allows some graphics to be created by
Oak
.
Unlike the C
family of languages, Oak
has a built-in build
system. No makefiles
or command-line flags are necessary (or,
for the latter, allowed).
For instance, to compile a file using SDL2
in C++
, you would
need to say something like
clang++ foobar.cpp -I/usr/include/SDL2 -D_REENTRANT -L/usr/lib -lSDL2
With proper library construction, the corresponding Oak
command can be simply acorn foobar.oak
.
This auto-build system necessitates the concept of the translation unit.
The term translation unit has been thrown around without
explanation several times in this document. A translation unit
in the context of Oak
is the network of .oak
, .o
, .od
,
.cpp
, .hpp
, .h
, and .c
files used by acorn
during a
given command.
For instance, the command acorn tests/macro_test.oak
would
begin with a translation unit containing only
tests/macro_test.oak
. If that test were to open with the
line package!("std");
, the translation unit would grow to
include the file listed under INCLUDE=
in the std
package.
This file (std/std.oak
), would likely contain link!
,
include!
, and possibly even package!
macros, all of which
would expand the translation unit.
When addendum VI mentioned that only one thread function may be defined per translation unit, that means that a file is barred from using multithreading if any of its "parent" files use it.
This section is confusing and is only necessary to read if you are developing an interfacial library.
It is sometimes required for the C
version of a struct to
have more members than its Oak
counterpart. For instance, the
interface for the sdl
package defines the sdl_window
struct.
On the C
side of the interface, this struct contains a
SDL_Window *
member and a SDL_Renderer *
member. These two
items are not included in Oak
because they would require too
many useless symbols to be loaded into memory. If they were
included, all structs and definitions which are somehow related
to them would need to be interfaced and loaded as well. Oak
's
solution to this is member padding. It is perfectly legal to
have mismatched struct members across an interface, as long as
one is not smaller than the other and padding is never touched.
This explains why the sdl_window
struct has the confusing line
__a, __b, __c, __d: i128,
. This adds four i128
members,
effectively requiring 64 bytes of space at the end of the
struct. This space holds the extra members on the C
side.
If a program were to modify the padding of a padded struct, the other side of the interface would have its data destroyed and made nonsensical.
// sdl/sdl_interface.c
struct sdl_window
{
u64 width, height;
SDL_Window *wind;
SDL_Renderer *rend;
};
// sdl/sdl_interface.oak
let sdl_window: struct
{
width, height: i32,
__a, __b, __c, __d: i128,
}
The atomic str
struct is exclusively for statically-stored
string literals. Anything that is included in your source code
surrounded by quotations is a str
. These are of fixed size,
and are stored directly in the compiled source code. On the
other hand, the string
struct (available through the
std/string.oak
file) is a dynamically-resizing container for
storing strings. These structs keep their memory on the heap, so
they can be of any size. These are much more useful, as they can
be the return type of functions and Oak
can more easily
interface with them. Internally, strings
are
dynamically-resizing arrays of i8
atomics (in C++
terms, a
vector
of char
's). string
structs were introduced in Oak
0.0.6.
The std/files.oak
file defines the i_file
and o_file
structs, which handle input and output to and from files
respectively. i_file
s read from files, and o_file
s write to
files. std/files.oak
is an interface to the C
FILE *
system.
i_file
s and o_file
s do not open a file when they are
created! You must call the open
method before they can be used
to read or write.
Here is a useful demonstration of both structs.
package!("std");
include!("std/files.oak");
include!("std/string.oak");
use_rule!("std");
let main() -> i32
{
// Write test file
let out: o_file;
let to_write: string;
out.open("hi.txt");
New(@to_write, "abcdefghijklmnopqrstuvwxyz\n");
out.write(to_write);
out.close();
to_write.Del();
out.Del();
// Read test file
let str_a: string;
let str_b: string;
let inp: i_file;
inp.open("hi.txt");
str_a = inp.read(to_u128(5));
str_b = inp.read(to_u128(5));
print("a: ");
print(str_a);
print("\nb: ");
print(str_b);
print("\n");
inp.close();
inp.Del();
str_a.Del();
str_b.Del();
0
}
A variable can store a pointer to a function. For instance, a
variable f
can hold a pointer to a function taking an i32
and yielding an i32
via the code let f: ^(i32) -> i32;
. The
argument types go inside the parenthesis, and the return type
after the ->
yield operator. This variable can later be made
to point to any function mapping the argument list to the yield
type. It can be called in the same way as any other functions.
For example, the program
package!("std");
let a(what: i32) -> i32
{
what
}
let b(what: i32) -> i32
{
(2 * what)
}
let main() -> i32
{
let f: ^(i32) -> i32;
f = @a;
print(f(1));
print("\n");
f = @b;
print(f(1));
print("\n");
0
}
would print
1
2
Enums in Oak
are essentially ways for one type to encompass
multiple types. Enums are very similar to structs, except that
they can hold only one of their values at once. However, unlike
a C++
union, the type of an Oak
enum is also stored. In this
way, it is like a combination of C++
's enum and union. An
Oak
enum declaration looks like this:
let example: enum
{
option_a: i32,
option_b: unit,
option_c: preexisting_struct_name,
}
Note that an Oak
enum looks exactly like an Oak
struct.
Enums are handled through use of the match
statement as below.
let main() -> i32
{
// Assume the previous code chunk is included
let obj: example;
match (obj)
{
case option_a(data)
{
// `data` becomes a pointer to the i32 which
// option_a entails
}
case option_b()
{
// In this case, no variable is captured from
// option_b. This is almost always the case for unit
// structs (obviously) but can also be done with
// non-unit structs.
}
default
{
// There can be no capture in a default option,
// because the capture's type could not be known at
// compile-time.
}
}
0
}
The match
statement is one of two ways to access the data
inside enums in Oak
, with the other being the
compiler-generated "wrap" methods. The compiler generates
let wrap_OPTION_NAME(self: ^ENUM_NAME, data: OPTION_TYPE) -> void
for each option OPTION_NAME
in an enum ENUM_NAME
. This wraps
the data variable into the enum. This all happens via a compiler
C
interface.
Note that, if the type of an enum option is the unit struct, the compiler will instead generate a single-argument "wrap" method, taking only the self.
Generally speaking, a wrap method puts data into an enum, and a match statement accesses it.
The printf!
(print formatted) macro is a very useful and
compact way to print things. It is available via
include!("std/printf.oak");
. This is similar, but not exactly
like, C
's printf function. The first argument to the printf!
is a format string. Any percentage signs %
within this format
string will be replaced by the macro arguments which follow. For
instance, printf!("hello %\n", 5);
would print hello 5
. This
works with literals or variables- anything for which the
single-argument print
function is defined. The first argument
after the format string corresponds to the first %
, the second
to the second, and so on. For instance,
printf!("% % % % %", 1, 2, 3, 4, 5);
would print 1 2 3 4 5
.
package!("std");
include!("std/printf.oak");
use_rule!("std");
let main() -> i32
{
printf!("The number five: % \nThe number six: % \n", 5, 6);
printf!(
"The following should be a literal percentage sign: \\%\n"
);
printf!("This test the failure system of printf % \n");
let i: i32 = 5;
let j: f64 = 1.234;
printf!(
"Variable i is %, j is %, and some text is '%'.\n",
i,
j,
"some text here"
);
0
}
Oak
, like any sane language, has no NULL
. Any item which may
sometimes be NULL
must be explicitly wrapped in an opt
enum.
This enum has only two options: some
and none
. In the some
state, it holds exactly one instance of the generic t
.
Similarly, Oak
does not have a throw
/ catch
error system.
Instead, the control flow must be explicitly controlled at
compile-time. Any function which has the possibility to error
must return its values wrapped in a result<g, b>
enum. This
has two states, ok
and err
. The ok
state holds one
instance of the g
(for good) generic, while the err
state
holds one instance of b
(for bad). Error recovery must be
explicitly handled via match
statements.
The assert!
macro is an extension of the panic!
macro, and
is available in std/panic_inter.oak
. assert!
takes a single
argument. If this argument evaluates to false
, it will call
panic!
. Otherwise, it will do nothing. The literal output of
this macro is if (CONDITION) {} else { panic!("CONDITION"); }
.
This makes it a recursive macro, or a macro which calls other
macros.
Oak
has very limited functionality for boolean values by
default. For instance, Oak
has no negation (!
) operator,
because the operation-substitution portion of the compiler
only handles binary operations (ones which have a left and
right side). The negation operator is unary, and thus is not
handled there. The bool
rule (available via std/bool.oak
)
provides this functionality, as well all allowing the
python
-style not
, or
, and and
keywords.
Note: While using the bool
rule, both !
and and
are legal
ways to call the negation operator.
To cut down on undefined behavior (resulting from uninitialized
variables), every datatype in Oak
has one and only one
"default" constructor. The default constructors of all struct
members are called upon instantiation. This default value is
said to be the "canonical" or "unit" value of that datatype.
As such, Oak
cannot have uninitialized variables.
As should be obvious to anyone who has worked with a templated
data structure library, it is necessary for structures to
generically have not only members, but methods. However, Oak
as discussed thus far has no way to instantiate a group of
generic methods when a generic data structure is instantiated.
This is where the pre
and post
keywords come in. A pre
keyword means that the following code block will need to be run
before a template is instantiated. post
is the same, but it is
run after the instantiation. Failure in either of these code
blocks means that the template is invalid, but is not treated as
a compiler error.
let list<t>: struct
{
data: t,
next: ^list<t>,
}
let New<t>(self: ^list<t>) -> void;
let Del<t>(self: ^list<t>) -> void;
let main() -> i32
{
New<i32>;
Del<i32>;
let node: list<i32>;
0
}
Oak
as discussed so far necessitates lines 7 and 8. These tell
the compiler to instantiate the generic functions New
and
Del
for use with t = i32
. However, this is obviously
cumbersome to an end user. Thus, we introduce the post
block.
let list<t>: struct
{
data: t,
next: ^list<t>,
}
post
{
New<t>(self: ^list<t>);
Del<t>(_: ^list<t>);
}
let New<t>(self: ^list<t>) -> void;
let Del<t>(self: ^list<t>) -> void;
let main() -> i32
{
let node: list<i32>;
0
}
The above code tells the compiler to instantiate New
and Del
with the t
generic anytime the node<t>
template is
instantiated. Thus, these are automatically instantiated by the
compiler when the struct is. Also notice that the argument name
in the post
block doesn't matter, so long as the rest of the
instantiator call matches the candidate.
This system is in response to the fact that, in Oak
, a struct
does not inherently "come" with any methods, and thus the exact
methods to instantiate with a generic struct are unknown. This
allows the programmer to tell the compiler exactly what is
needed for use with the struct.
It is often useful to separate the declaration of a function
from its implementation (see the earlier section on division of
labor). However, it is necessary for any function to have
exactly one implementation to avoid undefined behavior. When
creating structs, a definition for New
and Del
are always
required. However, it is not always practical to write these
functions explicitly each time, especially if they are to be
unit (empty) functions. For these reasons, we introduce the
concepts of
explicit, implicit, and autogen function definitions.
// A casual definition
let do_thing() -> void;
// An explicit definition
let do_thing() -> void
{
print("hello\n");
}
All casual definitions are erased when an explicit definition is
called. This allows external definitions and interfacial files.
For instance, you could link an object file to Oak
which
provides the definition for a casual definition. You cannot,
however, do this with an explicit definition. This will cause
multiple definitions of the symbol.
When a struct is defined, a unit explicit definition for New
and Del
are added to the symbol table. These do provide an
explicit definition, and thus cannot be overridden. There is,
however, a way to avoid this. Casual definitions erase not only
implicit definitions, but also auto-generated unit definitions.
Thus, if you provide a casual definition for New
or Del
, you
can define them in external object files and link them together.
Autogen functions are marked with the //autogen
special
symbol.
File names, especially in libraries, and especially in the
Oak
std
library, should be clear and concise. Names should
be singular, even if the file describes a class or data
structure (IE std/strings.oak
should become std/string.oak
).
Files linking Oak
to another language should be as physically
close as possible to their partner file as possible. These
interfacial partner files should be named as close to the Oak
naming scheme as is possible for their language, and their
compiled object files must adhere to the Oak
naming
scheme. File names may be shortened to a reasonable degree, so
long as they do not conflict with any other package and their
full name can be reasonably extracted from the abbreviation (for
instance, input_output.oak
to io.oak
is a valid
abbreviation). The same can be said about struct and enum names.
If there is an addendum required to the name, it should go after
the main name (for instance, the std
file
std/conv_extra.oak
).
Files should provide nothing more than they have to. If you have
the option to split something (especially interfaces) off into
its own file, you generally should do so. For example,
std/unit.oak
is one line of code long (let unit: struct;
)
because it does not need to have anything more. Oak
code
and, by extension, .oak
files should be minimalist.
c_print!
prints a message at compile time, concatenating the
passed strings. c_panic!
does the same thing, but throws the
result as a compile error. c_sys!
runs a command at compile
time. In any of these cases, the message will be prefaced by its
originating file and line number. Since these macros are
compile-time only, any variables passed will not be evaluated;
Only their names will be printed.
The c
refers to compilation, not to the C
programming
language.
Because Oak
uses pointers to denote mutability, functions are
allowed to automatically reference or dereference arguments.
However, in maintaining interoperability with the C
family,
Oak
also supports pointer arrays. This combination of facts
makes Oak
pointers uniquely hard to work with. To address
this, the ptrarr!
and ptrcpy!
macros are used. ptrcpy!
takes two pointers, and copies the second into the first. If
these are a
and b
, it performs a = b
. ptrarr!
takes the
head of a pointer array and an integer. If we are to denote
these a
and b
, it returns a[(unsigned long long)b]
(that
is to say, *(a + (unsigned long long)b)
).
Note: Pointer copying is only allowed in places where alloc!
and free!
are valid, since it can cause duplicate addresses.
ptrcpy!
can also accept 0
as its second argument, setting
the pointer to the null pointer.
Oak
does not have private members, but sometimes it is still
useful to disallow the use of a type after its declaration. In
this case, you can use the erase!
macro. This takes the name
of a struct, and disallows it from being used again. For
instance, the std/interface.oak
file provides erased structs
of various sizes for use in struct interfaces. These items can
never be accessed, but still take up space within a struct.
A test suite is a set of unit tests that acorn
will compile
and optionally execute in order to test the validity of a build.
A test suite is created by creating a folder named tests
, and
putting any number of *.oak
files within it. After this
structure has been imposed, acorn -T
will compile the entire
test suite and report on any failures it discovers. Similarly,
acorn -TE
will compile and execute each item in the test
suite. acorn -TT
will test as usual, but stop going after the
first failure. The same goes for acorn -TTE
. A compilation is
considered a success if and only if the acorn
compile process
returns no errors. An execution is considered a success if and
only if its main function returns a 0
. Either failure case
occurring will be considered a failure and will count towards an
internal failure counter. The results of running the test suite
will be reported after the test's conclusion.
The Oak
standard test suite is included in the installation
git
repo. Each item within this test suite must compile in
order for an Oak
update to be considered valid. This can be
tested via make test
or acorn -T
. These items do not need to
execute with a zero exit status, as some of them are
demonstrations.
Test suite executables are not individually saved. Each test
is compiled into a.out
, which is overwritten by the next test.
The raw_c!
macro completely bypasses the Oak
compiler and
directly inserts its arguments as C
code. This can be used for
more advanced language development features which may require
efficiency or capabilities beyond that of Oak
.
$apling
is limited in the modifications it can make to Oak
,
but it is nevertheless a vital resource. In addition to the
existing rule system, a more powerful rule engine is in
development. This new rule system is called TARM
, for Token
Augmentation Register Machine. It is a type of Turing Machine
optimized for low-level token stream augmentations which will be
controlled by an assembly language. A rigorous exploration of
TARM
will not be attempted here, especially since it is not
yet implemented into Oak
.
acorn
version 0.2.11
introduced the option for multiple
rule engines, allowing the future integration of TARM
.
The following are the canonical Oak
keywords- That is to say,
these words cannot be the names of variables, structs, enums,
or functions in canonical Oak
. Oak
aims to have as few
keywords as possible, keeping with its theme of baseline
minimalism.
- let
- if
- else
- while
- match
- case
- default
- pre
- post
Note: There is nothing stopping a rule or dialect from
adding more keywords. For instance, the bool
rules add the
and
, not
, and or
keywords.
The following are atomic (built-in, indivisible) types.
- struct
- enum
- integers - u8, i8, u16, i16, u32, i32, u64, i64, u128, i128
- floats - f32, f64, f128
- str - equal to
^i8
- bool
- void
The unit
(memberless 1 byte) struct is provided by
std/unit.oak
.
The following are atomic (built-in, indivisible) macros.
- include!
- link!
- package!
- new_rule!
- use_rule!
- bundle_rule!
- erase!
- alloc!
- free!
- c_print!
- c_panic!
- c_sys!
- type!
- size!
- ptrcpy!
- ptrarr!
- raw_c!
This list is not inclusive, as additional compiler macros are constantly being added.
Some miscellaneous notes which are not long enough to warrant their own section in this document:
Oak
does not have namespaces under the canonical dialect, although thenamespace
orstd
rules provide a simulacrum of them.Oak
was previously (0.1.*) translated toC++
. As of 0.2.0, it is instead translated toC
.- As of version
0.0.10
, theNew
andDel
operator aliases are single-argument only. Just useCopy
for anything else. - You can call a multi-argument
=
operator (Copy
) like this:a = (b, c, d);
(as ofOak
0.0.10
)
The following packages are included with the Oak
source code
on github
. These can be installed via the command
make packages
.
This package includes a regular expression interface and some resources for bitwise, hexadecimal, and ASCII printing of data structures. These prints allow you to see the contents of any arbitrary type.
This package interfaces with the C
SDL2
library, which
allows access to graphics. This library is useful for games.
This package includes files vital for the function of nearly all
Oak
programs. It should be included in virtually all programs.
This package is the standard templated library for Oak
. It
includes generic implementations of the following data
structures.
- Binary Search Tree
- List (doubly linked list)
- Map (linear probing symbol table)
- Queue
- Set
- Stack
- Vector (vec)
This section is not required for a programmer-level
understanding of the Oak
language, but may be useful for
troubleshooting and deeper understanding. The sections detailed
below view each section of the Oak
compiler as a discrete
object, even when that is not technically the case (for
instance, "the sequencer" and the "reconstructor" are really
tightly intertwined, and few, if any, of the underlying units
are represented by C++
objects in code).
The following sections represent a tangled nest of calls, and thus it should not be assumed that they are presented in the strict order in which they occur.
acorn
uses C
as an intermediate representation. This is done
for the sake of portability, since C
is very close to a
universal machine language and C
compilers have been heavily
optimized. In early development, acorn
instead used C++
, but
this proved too costly and largely unnecessary. It is plausible
that acorn
one day switches to LLVM IR
instead of C
, but
this is not immediately probable and should not be expected.
The acorn
resource is a wrapper for the underlying Oak
compiler system. It also provides numerous other resources, such
as package management, which will not be detailed in this
section. As far as the Oak
compilation process is concerned,
acorn
takes in a file name, and processes all the code within
that file, recursively following inclusions therein.
It also handles higher-level calls to external systems. For
instance, acorn
calls the C
compiler after the translation
process, and the C++
linker after compilation. While the
following compiler sections are usually devoted to being free of
fluff, acorn
is allowed to have user-friendly features.
Upon taking in a file name, the compiler frontend calls the lexer upon its contents. The lexer is detailed next.
Note: Within Oak
documentation, the terms token
and
symbol
are often taken to have the same meaning.
Additionally, "lexical tokenization" is referred to as
"lexicographical symbol parsing". This is an Oak
-specific
term.
The lexer takes in a string, and returns a token stream (a stream of discrete strings, each of which is the smallest it could meaningfully be). This process is called lexical tokenization, and these tokens will be the only thing used by the remainder of the compilation process.
Tokenization is context-dependant. The same symbol set that in
one place would signify a new token could, in another, continue
on in the same token. For instance, if the lexer came across the
string "a+8"
, it would yield the token stream "a", "+", "8"
.
If it instead came across the string "a8+8"
, it would yield
"a8", "+", "8"
. In this case, the 8
caused a new token to
be started when following +
, but not when following a
.
During lexing, several important processes happen. The first of
these is that all comments, single-line or multi-line, are
erased. All #
-begun lines (the only legal use of which being
the shebang #!/path/to/program
) are also removed. The second
important process is the insertion of special symbols. These
are symbols which are ignored by everything following this
process. They carry additional information that might otherwise
be lost by tokenization, like the line number. These begin with
//
, but they are not tokenizations of comments in the
source code. They are only inserted by the lexer, not the source
code.
The final pass of the lexer is called the conglomeration pass.
At this time, the tokenized input will be iterated over, and
qualifying series of tokens will be merged. For instance, -
is
a valid token on its own, but when it precedes a number, it
should not be considered its own token. Similarly, two string
literals separated by whitespace will be merged together.
The operation substitution phase of compilation replaces operator calls (addition, subtraction, etc) with their formal function call definitions. This phase is a sort of hardcoded preprocessor rule pass. After this phase, no operators remain- only function calls.
A rule is a combination of an input pattern and an output pattern. The input pattern represents, via a series of cards (space-separated tokens which match some set of other tokens), the set of token streams which the rule targets. The output pattern is the content which a match will be replaced by in the token stream. There are several special cards for output patterns as well. More about the specifics of rules can be found earlier in this document.
The rule processor, which is by default $apling
, is a
specialized turing machine (or, for more limited rule
processors, deterministic finite automata) for the augmentation
of patterns of tokens (or symbols, in Oak
terminology).
The original $apling
is a push-down automata, meaning it has
limited functionality. However, work is in progress to replace
it with a Token-Augmentation Register Machine (TARM
), which
will offer Turing-completeness and thus more complete
functionality.
The following lists several models of automata from least powerful to most, with an example for each. Each successive class can perform all computations of those above it and more.
- Combinatorial Logic (Boolean logic)
- Finite-State Automata (RegEx)
- Push-Down Automata (Sapling rule engine)
- Turing Machine (TARM rule engine)
A rule engine is a function which takes in a certain portion of
the input token stream and augments it in some way. This is an
abstraction which allows $apling
and TARM
to function
interchangeably in the compilation process. New rule engines
can only be added by directly modifying the C++
acorn
source
code and recompiling. Thus, they are not to be considered a
front-facing resource for developers- Even those of dialects.
Upon encounter of the new_rule!
macro, the acorn
compiler
frontend adds the rule contained within into its internal bank
of rules. Upon use_rule!
, it copies that rule into the bank
of rules which are used in the current file. use_rule!
does not carry over into other files.
The standard call to new_rule!
takes three arguments: A rule
name, an input pattern, and an output pattern. However, if a
fourth argument is provided, it will be treated as the rule
engine. If no such engine is provided, $apling
will be
presumed.
During the iterative process executed by the acorn
compiler
frontend on a given file, there is a stage wherein rules are
handled. The specific execution location of this stage, as well
as the number of times it is executed, has changed numerous
times in the lifetime of Oak
.
During this phase of compilation, the token stream is iterated over, and each of the active rules are applied. If a rule matches, the matched token stream section is replaced accordingly. The specifics of this vary according to the rule processor's pattern language, and cannot be detailed here. More information about these specifics can be found earlier in this document.
Note: This process can occur multiple times. A rule can, for instance, cause the inclusion of another file which must be loaded before compilation can continue. This might, in turn, require the application of more rules. This section ceases after no further rules can be applied.
Macros are compile-time functions. They output raw code, which is then used by the final program. They have the same notation as compiler macros, which are analogous to preprocessor directives in other languages. Technically, however, compiler macros operate on the compiler itself, rather than the source code's token stream.
Preprocessor definitions are compile-time constants, as opposed
to functions. They are analogous to C
's #define
directive,
except that they cannot be redefined after creation.
Note: Macros and definitions cannot be redefined. Attempting to do so will cause an error.
Macro definitions are collected by the acorn
compiler frontend
during one phase of the iterative compilation process, and
replaced with their corresponding values before sequencing.
Macro definitions are written to their own temporary .oak
file
during the build process. After this, they are compiled into
executables by acorn
. Input to a macro is provided as command
line arguments into these executables, and their cout
output
is reinserted into the code at the desired spot.
Sequencing is the penultimate step of compilation, followed by reconstruction. This is the most time-consuming step, where the specific semantic meanings of statements are determined and they are translated into their equivalent low-level-language code. This is a highly recursive process, and is always evolving.
Oak
sequencing is perhaps best detailed by the source code
itself. Thus, this author invites you to analyze sequence.cpp
.
The final phase of compilation is reconstruction, in which the
low-level statements generated by sequencing are assembled into
an equivalent C
program, which is then compiled. This
essentially works as a specialized traversal of the syntax tree
where each node is replaced by its C
statement. This C
code
is then written to an output file and compiled via an external
C
compiler. If so desired, a C++
linker can also be called.
Note: In Oak
, the terms templated and generic are used
interchangeably. The former comes from C++
, and the later
from Rust
(in the author's experience).
The template instantiation unit is the section of the Oak
compiler which handles instantiation calls for generic /
templated code blocks. There are three parts to the
instantiation process, detailed below.
When a function, enum or struct is detected to be generic (via
the <t>
-style generic declaration), it is inserted as a
template into the template instantiation unit's internal bank of
templates. The specific generic arguments are also stored, as
well as the type (for instance, a function or enum).
Later on, the sequencer will detect a call to the instantiator,
denoted <some_type_which_exists>
. Upon hearing this, the
instantiator will search through its templates until it finds
one which suites the call (some template such that, when the
generic is substituted into the type, the resulting type matches
that used in the instantiator call). Alternatively, if this
specific template / generic combination has already been
instantiated, it will simply return. Otherwise, once the correct
template has been identified, it will move on to part 3.
A pre
block represents the things which must be true in
order for a certain template to be viable. For instance, the
choice of whether to use a tree or hash-table based lookup
table will depend on whether the data is hashable or only
comparable. A pre
block can check for hashability, and
reject a given template if it cannot be ensured.
A post
block represents things that should logically be
instantiated along with a given template. For instance, if we
are instantiating a struct, we would logically also want to
instantiate its constructor and destructor.
pre
blocks are executed before the main template during an
instantiation call, and post
blocks are executed after.
Upon the selection of the proper code block, the instantiator
will begin by iterating through the tokens of the code block,
replacing any occurrence of the i
-th generic with the i
-th
generic specified during the instantiator call. For instance,
if the template was declared via <t, h>
and the instantiator
call was <i32, f64>
, it would replace all instances of t
with i32
and all instances of h
with f64
. After this
replacement, the resulting code section would be fed to the
sequencer like any other code. This causes the symbols within
to be entered into the global namespace as if they were never
generics at all.
Note: The generic system exists exclusively within Oak
; There
is no way to interface C++
generics with Oak
or vice-versa.
The package!
macro belongs to the compiler macros, in that it
directly controls the actions of the compiler before any other
code is evaluated (roughly equivalent to the preprocessor in
C
). Upong a call to package!
, the compiler looks for a
directory in the Oak
include directory which matches the
passed name. On UNIX systems, this directory is
/usr/include/oak
. If you wanted to import the std
package,
acorn
would check for the existance of /usr/include/oak/std
.
If this folder does not exist, acorn
throws an error.
Assuming that the aforementioned folder does exist, acorn
attempts to load the oak_package_info.txt
file within it. This
is a specially formatted file which tells the compiler some
basic information about the package: Most importantly, it
contains an INCLUDE
tag, which lists the file(s) to include
when the package is loaded by the compiler. This value is
usually a central "linkage" file which links all the others
together. Upon finding this entry point, acorn
will begin
loading all files included by the package. After finishing, it
will return to after the original include!
statement.
The mangler is how Oak
differentiates between functions with
identical names but different types. It takes in a type and a
function name, and returns a string which fully disambiguates
this function from any others of the same name. It does this by
appending a certain special string for each item in the type.
Oak
types can be represented in a linear fashion. Consider the
following code.
let a: i32;
Internally, Oak
's symbol table will store a definition for a
with type ["i32"]
(where []
represents a C++
vector).
let b: ^^i32;
Oak
would store this type as [PTR, PTR, "i32"]
. Similarly,
let c(arg: bool) -> void;
would be [FN, "bool", MAPS, "void"]
. A more complex version
like
let d(arg_a: ^i32, arg_b: ^^bool) -> bool;
would become
[FN, PTR, "i32", JOIN, PTR, PTR, "bool", MAPS, "bool"]
. This
is a way to represent unambiguous types in a linear vector
format. The mangler leverages this to turn a type vector into a
string.
If a symbol a
has the type [FN, "i32", MAPS, "void"]
, the
mangler would produce the output name a_FN_i32_MAPS_void
.
This symbol would keep the same type, but would be unambiguous
in name. This is one of the reasons why capital letters are not
allowed to coexist with underscores in Oak
- it could interfere
with the mangler.
The following are several more complex cases mapping input Oak
directly to its output C
.
let a(arg: i32, arg_b: ^^bool) -> bool;
let foo: struct
{
,
}
let b(arg: foo) -> ^void;
let c(arg: ^(_: bool) -> void) -> void;
struct foo
{
};
bool a_FN_i32_JOIN_PTR_PTR_bool_MAPS_bool(i32 arg, bool **arg_b);
void *b_FN_foo_MAPS_PTR_void(struct foo arg);
void c_FN_PTR_FN_bool_MAPS_void_MAPS_void(void (*arg)(bool));
This can become a pain in interfacial files, where the literal
name of the symbol defined externally must match the Oak
mangled version. However, many language manglers do not even
provide a consistent standard across compilers, so Oak
is at
least better than that.
Consider the following Oak
code for a singly-linked list node.
let node<t>: struct
{
data: t,
next: ^node<t>,
}
post
{
New<t>(_: ^node<t>);
}
let New<t>(self: ^node<t>) -> void
{
// ...
}
let main()
{
node<i32>;
}
What does the instance of New
created for node<i32>
mangle
to? In fact, what does node<i32>
mangle to? Oak
's target
language is C
, which does not have generics, so it cannot
contain the <...>
syntax.
As you may have guessed, this solution is solved similarly to
the one mentioned above. node<i32>
simply mangles to
node_GEN_i32_ENDGEN
, and this is used as its atomic type name
internally for mangling purposes. This means that the Oak
let New<i32>(self: ^node<i32>) -> void
will mangle to the C
void New_FN_PTR_node_GEN_i32_ENDGEN_MAPS_void(struct node_GEN_i32_ENDGEN *self);
This means it is possible for the compiler (and technically the
programmer) to tell the type of a given symbol in the target C
code given just its mangled name.
There are several ways in which a function can be matched to a function call.
- Precision match (all types match exactly)
- Implicit casting for built-in types
- Auto referencing and dereferencing
Demonstration of the first case:
let fn(a: foo, b: bar) -> void;
let main() -> i32
{
let first: foo;
let second: bar;
fn(first, second);
0
}
Demonstration of the second case:
let fn(a: i128) -> void;
let main() -> i32
{
let arg: i32;
fn(arg);
0
}
Demonstration of the third case:
let fn(a: ^i32) -> void;
let main() -> i32
{
let arg: i32;
fn(arg);
let arg2: ^^i32;
fn(arg2);
0
}
If a function signature can be matched exactly, the compiler will not look for signatures using casting. If a signature can be found exactly or by using casting, the compiler will not look for signatures requiring referencing or dereferencing.
Note: Implicit casting can cast anything from the set
[i8, u8, i16, u16, i32, u32, i64, u64, i128, u128]
to anything
else in that set, or anything from the set [f64, f128, f256]
to anything else in that set. No other types are supported for
implicit casting, including any pointers to these types.
The remainder of this document is dedicated to examples designed
to get a programmer comfortable with Oak
. These examples are
largely arranged from easiest to hardest, with the final few
demonstrating the more esoteric regions of Oak
development
(like interfaces and dialects).
Difficulty: Easy
This example covers:
- Terminal input and output
- Random number generation
- Preprocessor definitions
- The
printf!
macro
This example requires:
- An understanding of
Oak
syntax - A Linux terminal
- An
acorn
compiler installation
Let's create a game where you have 5 chances to guess a randomly
selected integer. Upon an incorrect guess, the program will
inform the user if they guessed too high or too low. Let's start
by importing the needed files, which should all be contained
in the "std" package and the "std/rand_inter.oak" file. We will
also activate the "std" rule, so that we can use some common
Oak
syntactical shortcuts like for
loops.
package!("std"); // For printing and standard functions
use_rule!("std"); // For common syntactical shortcuts
include!("std/rand_inter.oak"); // For random numbers
// A skeleton main file, which we will write our program in
let main() -> i32
{
0
}
First, we will need to generate a random number. We will do this
via the seed_rand()
and rand(low, high)
functions, both
available via the "std/rand_inter.oak" file. This file
interfaces with the operating system's random number generator.
The seed_rand()
function, when called with no arguments, seeds
the random number generator with the current time. rand
yields the next number generated by the RNG given its existing
seed. Thus, you must call some form of RNG seeding function
before rand
.
We will save our random number in a variable named chosen
.
// Needed packages
package!("std");
use_rule!("std");
include!("std/rand_inter.oak");
let main() -> i32
{
// Without the std rule:
// let chosen: i32;
// chose = rand(0, 10);
// With the std rule:
// let chosen = rand(0, 10);
// The std rule allows "chosen" to take the type of rand
let chosen = rand(0, 10);
}
But what if we want to use numbers in a different range? We should pull these numbers out into their own symbols. Let's use preprocessor definitions, so that our symbols do not take up runtime memory.
Upon compilation, whenever a preprocessor definition is encountered, it is immediately replaced by its definition. They are different from variables in that they cannot be redefined.
A preprocessor definition is defined as follows.
// Not within any code scope
let var_name! = var_definition;
We can use this to augment our program as follows.
// Needed packages
package!("std");
use_rule!("std");
include!("std/rand_inter.oak");
// Preprocessor definitions
let low! = 0;
let high! = 10;
let main() -> i32
{
let chosen = rand(low!, high!);
0
}
We should also add some output, so that the user knows what
range their guess should inhabit. We can do this using the
printf!
macro, available via the std/printf.oak
file. This
macro allows variables to be outputted in the middle of a
string according to a format string. Inside a format string,
any "%" will be replaced by the next variable passed. For
example,
package("std");
use_rule!("std");
include!("std/printf.oak")
let main() -> i32
{
let a: i32 = 123;
let b: bool = true;
printf!(
"The value of a is %. The value of b is %.\n",
a,
b
);
0
}
We will use this to modify our code as follows.
// Needed packages
package!("std");
use_rule!("std");
include!("std/printf.oak");
include!("std/rand_inter.oak");
// Preprocessor definitions
let low! = 0;
let high! = 10;
let main() -> i32
{
let chosen = rand(low!, high!);
printf!(
"Guess a number between % and %.\n",
low!,
high!
);
0
}
Next, let's create the loop wherein the user will guess.
// Needed packages
package!("std");
use_rule!("std");
include!("std/printf.oak");
include!("std/rand_inter.oak");
// Preprocessor definitions
let low! = 0;
let high! = 10;
let num_guesses! = 5;
let main() -> i32
{
let chosen = rand(low!, high!);
printf!(
"Guess a number between % and %.\n",
low!,
high!
);
// This for loop has been broken over multiple lines, but
// they can be written within one line as well.
for (let current_guess = 0;
current_guess < num_guesses!;
current_guess += 1)
{
// Do guessing here
}
0
}
We can read an integer from the terminal using the get_i128
function. We will store this in a variable called guess
.
// Needed packages
package!("std");
use_rule!("std");
include!("std/printf.oak");
include!("std/rand_inter.oak");
// Preprocessor definitions
let low! = 0;
let high! = 10;
let num_guesses! = 5;
let main() -> i32
{
let chosen = rand(low!, high!);
printf!(
"Guess a number between % and %.\n",
low!,
high!
);
for (let current_guess = 0;
current_guess < num_guesses!;
current_guess += 1)
{
let guess = to_i32(get_i128());
if (guess < chosen)
{
// Guess is too low
}
else if (guess > chosen)
{
// Guess is too high
}
else
{
// Guess is correct
}
}
0
}
We will use another variable to shore whether or not the user
has guessed correctly. If this is the case, the for
loop
should halt. In any case, we should update the user on the
validity of their guess.
// Needed packages
package!("std");
use_rule!("std");
include!("std/printf.oak");
include!("std/rand_inter.oak");
// Preprocessor definitions
let low! = 0;
let high! = 10;
let num_guesses! = 5;
let main() -> i32
{
let chosen = rand(low!, high!);
let has_chosen_incorrectly = true;
printf!(
"Guess a number between % and %.\n",
low!,
high!
);
let current_guess: i32;
for (current_guess = 0;
current_guess < num_guesses! && has_chosen_incorrectly;
current_guess += 1)
{
print("Guess: ");
let guess = to_i32(get_i128());
if (guess < chosen)
{
print("Too low...\n");
}
else if (guess > chosen)
{
print("Too high...\n");
}
else
{
print("You guessed it!\n");
has_chosen_incorrectly = false;
}
}
0
}
Let's try it out. Compiling via the command
acorn /PATH/TO/MAIN.oak -o guessing_game.out
creates the
executable guessing_game.out
, which can then be run via
the command ./guessing_game.out
. Here's the output from an
example run.
[user@hostname guessing_game]$ ./a.out
Guess a number between 0 and 10.
Guess: 5
Too high...
Guess: 2
Too low...
Guess: 4
Too high...
Guess: 3
You guessed it!
It works!
Difficulty: Easy
This example covers:
- Terminal output
- Command line arguments
- Regular expressions
- File input and output
- Panics
- Colored terminal output
std/bool
and thebool
rule
This example requires:
- An understanding of
Oak
syntax - A Linux terminal
- An
acorn
compiler installation - A
boost
C++
library installation - A basic understanding of regular expressions
Let's create an Oak
program which will take in a filename and
a regular expression as command line arguments, and output every
line in that file which matches the expression. This is similar
to the grep
utility in POSIX
systems. A typical call to this
program might look something like this:
./oak_grep.out file.txt "regex+ pat?tern h*ere"
Let's start by reading in the two command line arguments.
package!("std");
use_rule!("std");
include!("std/panic_inter.oak");
include!("std/string.oak");
include!("std/printf.oak");
let main(argc: i32, argv: ^str) -> i32
{
// If argc != 3, panic w/ the given message.
assert!(
argc == 3,
"Please provide exactly 3 command line arguments."
);
// Get the command line arguments
let filepath: string = argv.Get(1);
let pattern: string = argv.Get(2);
printf!(
"Got filepath '%' and pattern '%'\n",
filepath,
pattern
);
0
}
[user@host dir]$ acorn tests/oak_grep_ex.oak -o oak_grep.out
[user@host dir]$ ./oak_grep.out file.txt "regex+ pat?tern h*ere"
Got filepath 'file.txt' and pattern 'regex+ pat?tern h*ere'
It works! Now, lets open the given filepath as a i_file
. This
requires the use of the std/file_inter.oak
file, so we will
also add that to our list of inclusions. Let's also switch to
the standard Oak
dialect's notation for include!
macros.
Doing this, we get the following code.
package!("std");
use_rule!("std");
include "std/panic_inter.oak", "std/file_inter.oak",
"std/string.oak";
let main(argc: i32, argv: ^str) -> i32
{
// If argc != 3, panic w/ the given message.
assert!(
argc == 3,
"Please provide exactly 3 command line arguments."
);
// Get the command line arguments
let filepath: string = argv.Get(1);
let pattern: string = argv.Get(2);
// Open filepath
let file: i_file;
file.open(filepath.c_str());
// Line variable to read the file into, line by line
let line: string;
// Read the first line of the file
line = file.getline(to_u128(512));
// Close the file
file.close();
0
}
Now we have the file, but we have no way to iterate over it! To
do this, we will include the std/bool.oak
file in order to
use the bool
rule, which adds the not
keyword (as well as
boolean negation in general). Using this, we can use the
i_file
's eof
function, which returns true if the end of the
file has been reached.
package!("std");
use_rule!("std");
include "std/panic_inter.oak", "std/file_inter.oak",
"std/string.oak";
include "std/bool.oak";
use "bool"; // STD Oak dialect notation for use_rule!("bool")
let main(argc: i32, argv: ^str) -> i32
{
// If argc != 3, panic w/ the given message.
assert!(
argc == 3,
"Please provide exactly 3 command line arguments."
);
// Get the command line arguments
let filepath: string = argv.Get(1);
let pattern: string = argv.Get(2);
// Open filepath
let file: i_file;
let line: string;
file.open(filepath.c_str());
while not file.eof()
{
line = file.getline(to_u128(512));
}
file.close();
0
}
Now we can iterate over the line! However, we still need to do
the actual RegEx searching. For this, let's include the
extra/regex_inter.oak
file, and use the regex
struct and
regex_search
function.
package!("std");
use_rule!("std");
include "std/panic_inter.oak", "std/file_inter.oak",
"std/string.oak", "extra/regex_inter.oak";
include "std/bool.oak";
use "bool";
let main(argc: i32, argv: ^str) -> i32
{
// If argc != 3, panic w/ the given message.
assert!(
argc == 3,
"Please provide exactly 3 command line arguments."
);
// Get the command line arguments
let filepath: string = argv.Get(1);
let pattern: string = argv.Get(2);
let pattern_reg: regex = pattern;
// Open filepath
let file: i_file;
let line: string;
file.open(filepath.c_str());
while not file.eof()
{
line = file.getline(to_u128(512));
if line.regex_search(pattern_reg)
{
print(line);
}
}
file.close();
0
}
It's looking great! But we can be a little fancier. Let's print
the lines which match in green text using the std/color.oak
file. Using this, you just have to call the .green()
method
on any string to turn it green (and .red()
for red, etc.).
While we're here, let's count the number of matches we find,
and output that when we finish.
package!("std");
use_rule!("std");
include "std/panic_inter.oak", "std/file_inter.oak",
"std/string.oak", "std/printf.oak",
"extra/regex_inter.oak", "std/color.oak";
include "std/bool.oak";
use "bool";
let main(argc: i32, argv: ^str) -> i32
{
// If argc != 3, panic w/ the given message.
assert!(
argc == 3,
"Please provide exactly 3 command line arguments."
);
// Get the command line arguments
let filepath: string = argv.Get(1);
let pattern: string = argv.Get(2);
let pattern_reg: regex = pattern;
let num_matches = 0;
// Open filepath
let file: i_file;
let line: string;
file.open(filepath.c_str());
while not file.eof()
{
line = file.getline(to_u128(512));
if line.regex_search(pattern_reg)
{
print(line.green());
num_matches += 1;
}
}
file.close();
print("\n");
if num_matches == 0
{
print("No matches!\n".red());
}
else
{
begin_green_bold();
printf!("% lines matched.\n", num_matches);
end_effects();
}
0
}
Let's try it out! (Note: This document is plaintext, so the terminal colors will not appear here)
[user@host oak]$ acorn oak_grep.oak -o oak_grep.out
[user@host oak]$ ./oak_grep.out oak_grep.oak "print"
"std/string.oak", "std/printf.oak",
print(line.green());
print("\n");
print("No matches!\n".red());
printf!("% lines matched.\n", num_matches);
5 lines matched.
It works!
Difficulty: Moderate
This example covers:
- Terminal output
- Recursive functions
This example requires:
- An understanding of
Oak
syntax - A Linux terminal
- An
acorn
compiler installation - An undergraduate understanding of mathematics
- Understanding of advanced mathematical notation
For this example, we will write a short program to yield the
Collatz sequence of a given positive integer
For a given positive integer
- If
$n_i$ is even,$n_{i + 1} = \frac{n}{2}$ . - If
$n_i$ is odd,$n_{i + 1} = 3n + 1$
These rules can be translated from english to math as below.
This process is iterated until the output is equal to 1. The Collatz sequences of the first few positive integers are shown below.
- 1
- 2, 1
- 3, 10, 5, 16, 8, 4, 2, 1
- 4, 2, 1
- 5, 16, 8, 4, 2, 1
- 6, 3, 10, 5, 16, 8, 4, 2, 1
- 7, 22, 11, 34, 17, 52, 26, 13, 40, 20, 10, 5, 16, 8, 4, 2, 1
- 8, 4, 2, 1
- 9, 28, 14, 7, 22, 11, 34, 17, 52, 26, 13, 40, 20, 10, 5, 16, 8, 4, 2, 1
- 10, 5, 16, 8, 4, 2, 1
The Collatz conjecture states that, for all positive integers, there exists some finite number of repeated iterations which will reduce said number to 1. That is to say, that all Collatz sequences are of finite length. Representing this claim mathematically,
We will designate the mathematical function mapping some Oak
below.
package!("std");
use_rule!("std");
include!("std/math.oak");
// Takes in n_i and returns n_{i + 1}
let collatz(n: u128) -> u128
{
let out: u128;
out = n;
if (even(n))
{
out /= to_u128(2);
}
else
{
out *= to_u128(3);
out += to_u128(1);
}
out
}
We can subsequently create the following function to print the Collatz sequence of a given number.
let print_collatz_sequence(n: u128) -> void
{
print(n);
print(" ");
if (n > to_u128(1))
{
let next: u128;
next = collatz(n);
print_collatz_sequence(next);
}
else
{
print("\n");
}
}
As a tail recursive function, this can easily be rewritten iteratively to conserve stack space- this is left as an exercise for the reader.
Our full program is as follows.
package!("std");
use_rule!("std");
include!("std/math.oak");
// Takes in n_i and returns n_{i + 1}
let collatz(n: u128) -> u128
{
let out: u128;
out = n;
if (even(n))
{
out /= to_u128(2);
}
else
{
out *= to_u128(3);
out += to_u128(1);
}
out
}
let print_collatz_sequence(n: u128) -> void
{
print(n);
print(" ");
if (n > to_u128(1))
{
let next: u128;
next = collatz(n);
print_collatz_sequence(next);
}
else
{
print("\n");
}
}
let main() -> i32
{
for (let i: u128 = 1; i < to_u128(10); i += 1)
{
print_collatz_sequence(i);
}
0
}
The following is the output from this program.
1
2 1
3 10 5 16 8 4 2 1
4 2 1
5 16 8 4 2 1
6 3 10 5 16 8 4 2 1
7 22 11 34 17 52 26 13 40 20 10 5 16 8 4 2 1
8 4 2 1
9 28 14 7 22 11 34 17 52 26 13 40 20 10 5 16 8 4 2 1
This is correct.
Difficulty: Hard
This example covers:
- The
Oak
mangler - The
link!
compiler macro - Interfacial files
This example requires:
- An understanding of
Oak
syntax - An understanding of
C++
syntax - An understanding of
makefile
s. - A Linux terminal
- An
acorn
compiler installation - A
boost
C++
library installation - A basic understanding of regular expressions
In this example, we will be creating a version of the Oak
extra/regex_inter.oak
interfacial file. This will allow us to
access the C++
boost
RegEx library from within Oak
, and
give us a better understanding of how to create interfacial
files and libraries in Oak
.
Let's begin by creating a folder to contain our interfacial
library. We will name our library re
, for Regular Expressions.
The acorn
compiler conveniently has a command for creating a
new library- acorn -w NAME
, where NAME
is the name of the
package. For our purposes, we will navigate to the folder where
we want our library to be, and call acorn -w re
. This will
create the re
folder, where we will do the rest of this
exercise.
acorn -w re # Creates the package
cd re # Moves into the new dir
ls # Shows the files acorn made
You can fill in your details in re/oak_package_info.txt
if you
want; This step would be required if you wanted to publish this
package. Additionally, it initializes a git repo and creates a
linkage file re.oak
. We will need three more files here: the
two halves of the interface, and a makefile. We will use the
following command to create these.
# From the re folder
touch re_inter.oak re_inter.cpp Makefile # Create files
Let's explore the boost
RegEx functions we will be interfacing
with. First, the regex match function. We will focus on two
instances of this; The two-argument and the three-argument
versions.
The two-argument boost::regex_match
function takes a string
for the text to search, and a boost::regex
object representing
the pattern.
The three-argument boost::regex_match
function takes a string
for the text, a boost::smatch
object to record the results
into, and a boost::regex
for the pattern.
Both of these functions return booleans. Clearly, we will need
an Oak
wrapper struct for the boost::regex
and
boost::smatch
classes, as well as wrapper functions for each
of these functions. We will use the built-in std/string.oak
implementation of a string. Let's begin constructing our C++
file.
// re/re_inter.cpp
#include <oak/std_oak_header.hpp>
#include <boost/regex.hpp>
#include <oak/string.c>
// An Oak wrapper class for boost::regex
struct regex
{
boost::regex *re; // Uses dynamically allocated memory
// To get to 32 bytes
char padding[32 - sizeof(boost::regex *)];
};
// An Oak wrapper class for boost::smatch
struct regex_smatch
{
boost::smatch m;
// To get to 96 bytes
char padding[96 - sizeof(boost::smatch)];
};
// Ensures the C++ compiler will not mangle symbol names
extern "C"
{
// Interfaces with the two-argument boost::regex_match
bool regex_match(string *text, regex *pattern)
{
;
}
// Interfaces with the three-argument boost::regex_match
bool regex_match(string *text,
regex *pattern,
regex_smatch *into)
{
;
}
}
This is a solid start, but we will need to do some legwork
before we are fully able to interface. The boost
libraries
require a C++
std::string
rather than the custom Oak
string
which is passed as a parameter. We can avoid this by
using the underlying data
struct member, which is a
contiguous C
-style string. This means we can construct a
std::string
from this raw C
-string and pass that in. For our
interfacial structs, we will simply pass the underlying member.
This leaves us with the following code.
// re/re_inter.cpp
#include <oak/std_oak_header.hpp>
#include <boost/regex.hpp>
#include <oak/string.c> // Oak strings
#include <string> // C++ strings
// An Oak wrapper class for boost::regex
struct regex
{
boost::regex re;
char padding[16]; // To get to 32 bytes
};
// An Oak wrapper class for boost::smatch
struct regex_smatch
{
boost::smatch m;
char padding[16]; // To get to 96 bytes
};
extern "C"
{
// Interfaces with the two-argument boost::regex_match
bool regex_match(string *text, regex *pattern)
{
return boost::regex_match(std::string(text->data), pattern->re);
}
// Interfaces with the three-argument boost::regex_match
bool regex_match(string *text,
regex *pattern,
regex_smatch *into)
{
return boost::regex_match(std::string(text->data),
into->m,
pattern->re);
}
}
Now we have successfully wrapped the boost
functions, but we
have one more hurdle: The Oak
mangler. Because Oak
translates into C
for compilation, it must be able to
disambiguate between functions using only their names. Note how
both of our functions are called regex_match
: If we didn't do
anything to these names, C
would become confused. Thus, the
Oak
compiler mangles the symbol names to include their
full types. The mangler can be unintuitive, but close inspection
of the std
interfacial files should be enough to understand
it.
For a function which in Oak
would be written
let regex_match(text: ^string, pattern: ^regex) -> bool;
The corresponding mangled symbol names is
regex_match_FN_PTR_string_JOIN_PTR_regex_MAPS_bool
This mangled name, not the original name, is what the C
linker will look for. Thus, we must modify our function names
to align with the mangler. Note that struct / enum names are not
mangled by Oak
, and thus remain unmodified. The following code
demonstrates this.
// re/re_inter.cpp
#include <oak/std_oak_header.hpp>
#include <boost/regex.hpp>
#include <oak/string.c> // Oak strings
#include <string> // C++ strings
// An Oak wrapper class for boost::regex
struct regex
{
boost::regex re;
char padding[16]; // To get to 32 bytes
};
// An Oak wrapper class for boost::smatch
struct regex_smatch
{
boost::smatch m;
char padding[16]; // To get to 96 bytes
};
extern "C"
{
// Interfaces with the two-argument boost::regex_match
bool regex_match_FN_PTR_string_JOIN_PTR_regex_MAPS_bool(
string *text, regex *pattern)
{
return boost::regex_match(std::string(text->data), pattern->re);
}
// Interfaces with the three-argument boost::regex_match
bool regex_match_FN_PTR_string_JOIN_PTR_regex_JOIN_PTR_regex_smatch_MAPS_bool(
string *text,
regex *pattern,
regex_smatch *into)
{
return boost::regex_match(std::string(text->data),
into->m,
pattern->re);
}
}
Now we are finished with the C++
portion of our interface for
these two functions. We can now create an Oak
file to
interface with it. Note that this is separate from the
re/re.oak file that acorn
created for us. Linkage files
should not be interfaces. Let's begin by creating the Oak
versions of the interfacial structs.
// re/re_inter.oak
package!("std");
use_rule!("std");
// This is an alternative to the include! macro
// This notation comes in the "std" rule
// The string file is needed here
include "std/string.oak";
let regex: struct
{
,
}
let regex_smatch: struct
{
,
}
It is very important that we match the size of our interfacial
structs to their C++
counterparts. This is why we used padding
to make their sizes a nice number. The std/interface.oak
file
provides immutable / hidden memory block types of varying size.
These blocks are used to build interfaces like these, where the
modification of internal data by an Oak
program would cause
the C++
struct to stop functioning. Using these padding
blocks, we arrive at the following code.
// re/re_inter.oak
package!("std");
use_rule!("std");
include "std/string.oak";
include "std/interface.oak";
let regex: struct
{
__a: hidden_32_bytes,
}
let regex_smatch: struct
{
__a: hidden_64_bytes,
__b: hidden_32_bytes,
}
The double underscore prefix is recommended for interfacial
padding members such as these. Now, we can add the function
signatures which will be filled in at link-time by the C++
implementations.
// re/re_inter.oak
package!("std");
use_rule!("std");
include "std/string.oak";
include "std/interface.oak";
let regex: struct
{
__a: hidden_32_bytes,
}
let regex_smatch: struct
{
__a: hidden_64_bytes,
__b: hidden_32_bytes,
}
let regex_match(text: ^string, pattern: ^regex) -> bool;
let regex_match(text: ^string, pattern: ^regex, into: ^smatch)
-> bool;
Note that Oak
allows at most 96 non-whitespace characters per
line. However, its complete disregard for all whitespace after
lexing means that all code can be split across any number of
lines.
Now, we must add a way for acorn
to know to include the C++
implementations at link-time. This takes the form of a
link!
macro.
// re/re_inter.oak (excerpt)
package!("std");
use_rule!("std");
include "std/string.oak";
include "std/interface.oak";
// More shorthand for compiler macros via the "std" rule
link "re/re_inter.o";
Now we have a basic interfacial library! However, for it to be
useful to use, we must have ways of accessing and modifying the
data in our interfacial structs safely. Since Oak
only allows
zero-argument constructors, we must initialize the regex
struct via the Copy
(=
) operator. Additionally, there are
several fields within smatch
which we need to be able to
access. We will also create the function signatures for these
here.
// re/re_inter.oak
package!("std");
use_rule!("std");
include "std/string.oak";
include "std/interface.oak";
link "re/re_inter.o";
let regex: struct
{
__a: hidden_32_bytes,
}
let regex_smatch: struct
{
__a: hidden_64_bytes,
__b: hidden_32_bytes,
}
let regex_match(text: ^string, pattern: ^regex) -> bool;
let regex_match(
text: ^string,
pattern: ^regex,
into: ^regex_smatch) -> bool;
// To be externally defined in C++
let New(self: ^regex) -> void;
let Del(self: ^regex) -> void;
let Copy(self: ^regex, pattern: string) -> void;
let Copy(self: ^regex, pattern: str) -> void;
// To be externally defined in C++
let New(self: ^regex_smatch) -> void;
let size(self: ^regex_smatch) -> u64;
let str(self: ^regex_smatch) -> string;
let Get(self: ^regex_smatch, index: u32) -> string;
Now, let's create the C++
implementation of these functions.
#include <boost/regex.hpp>
#include <string>
#include "oak/std_oak_header.h"
#include "oak/std/string.c"
struct regex
{
boost::regex *re;
char padding[32 - sizeof(boost::regex *)]; // To get to 32 bytes
};
struct regex_smatch
{
boost::smatch m;
char padding[96 - sizeof(boost::smatch)]; // To get to 96 bytes
};
extern "C"
{
bool regex_match_FN_PTR_string_JOIN_PTR_regex_MAPS_bool(string *text, regex *pattern)
{
return boost::regex_match(std::string(text->data), *pattern->re);
}
bool regex_match_FN_PTR_string_JOIN_PTR_regex_JOIN_PTR_regex_smatch_MAPS_bool(string *text, regex *pattern, regex_smatch *into)
{
return boost::regex_match(std::string(text->data), into->m, *pattern->re);
}
//////////// Methods ////////////
void New_FN_PTR_regex_MAPS_void(regex *self)
{
memset(self, 0, sizeof(regex));
self->re = nullptr;
}
void Del_FN_PTR_regex_MAPS_void(regex *self)
{
if (self->re != nullptr)
{
delete self->re;
}
}
void Copy_FN_PTR_regex_JOIN_PTR_string_MAPS_void(regex *self, string *pattern)
{
if (self->re != nullptr)
{
delete self->re;
}
self->re = new boost::regex(pattern->data);
}
void Copy_FN_PTR_regex_JOIN_str_MAPS_void(regex *self, str pattern)
{
if (self->re != nullptr)
{
delete self->re;
}
self->re = new boost::regex(pattern);
}
void New_FN_PTR_regex_smatch_MAPS_void(regex_smatch *self)
{
memset(self, 0, sizeof(regex_smatch));
}
u64 size_FN_PTR_regex_smatch_MAPS_u64(regex_smatch *self)
{
return self->m.size();
}
string str_FN_PTR_regex_smatch_MAPS_string(regex_smatch *self)
{
string out;
out.size = self->m.str().size();
out.data = new i8[out.size + 1];
strcpy(out.data, self->m.str().c_str());
out.data[out.size] = '\0';
return out;
}
string Get_FN_PTR_smatch_JOIN_u32_MAPS_string(regex_smatch *self, u32 index)
{
string out;
std::string from = self->m[index].str();
out.size = from.size();
out.data = new i8[out.size + 1];
strcpy(out.data, from.c_str());
out.data[out.size] = '\0';
return out;
}
}
Now we've got the C++
and Oak
files in a functional state,
but we still need to give acorn
directions on how and what to
build upon installation. This is where the makefile
comes in.
We will add the following lines to our makefile
.
CC := clang++
FLAGS := -pedantic -Wall
LIBS := -lboost_regex
all: regex_inter.o
regex_inter.o: regex_inter.cpp
$(CC) $(FLAGS) -c -o regex_inter.o regex_inter.cpp $(LIBS)
Finally, we must modify the re/re.oak
linkage file so that it
includes the re/re_inter.oak
interfacial file.
// re/re.oak
include!("re/re_inter.oak");
This is all that is needed for installation. Now, we can run
the command acorn -S re
from the folder which contains the
package folder re
to install our package.
acorn -S re
Let's test it out! We'll create a test file first, and give it a skeleton main function.
touch re_test.oak # Create our test file
// re_test.oak
package!("std");
use_rule!("std");
import "re";
let main() -> i32
{
0
}
To compile this program, we will use the command
# Compile re_test.oak to re_test.out
acorn re_test.oak -o re_test.out
To run it after compilation, we will use
./re_test.out
Let's test our RegEx interface by attempting a few RegEx
matches. We will test the pattern abra*ca?dabra+
against the
strings abrcadabraaa
, abraaaacdabra
, abrcdabr
, and
abarcdabrar
. The first two strings should be a match, and the
second two should be a fail. Here's what our test file should
look like with these tests implemented.
// Use the standard Oak dialect
package!("std");
use_rule!("std");
import "re";
include "std/string.oak";
let main() -> i32
{
let pattern: regex = "abra*ca?dabra+";
let string_one: string = "abrcadabraaa";
let string_two: string = "abraaaacdabra";
let string_three: string = "abrcdabr";
let string_four: string = "abarcdabrar";
// Test string 1
if regex_match(string_one, pattern)
{
print("String one matched.\n");
}
else
{
print("String one did not match.\n");
}
// Test string 2
if regex_match(string_two, pattern)
{
print("String two matched.\n");
}
else
{
print("String two did not match.\n");
}
// Test string 3
if regex_match(string_three, pattern)
{
print("String three matched.\n");
}
else
{
print("String three did not match.\n");
}
// Test string 4
if regex_match(string_four, pattern)
{
print("String four matched.\n");
}
else
{
print("String four did not match.\n");
}
print("Done with everything!\n");
0
}
Now, lets test it out!
acorn tests/regex_ex.oak -o regex_ex.out
./regex_ex.out
<Output>
String one matched.
String two matched.
String three did not match.
String four did not match.
Done with everything!
<End output>
It works!
The techniques used herein can easily be adapted to create a wide variety of interfacial files.
Difficulty: Hard
This example covers:
- Terminal input and output
- Regular expressions
- File input and output
- Socket programming
This example requires:
- An understanding of
Oak
syntax - A Linux terminal
- An
acorn
compiler installation - A
boost
C++
library installation - A basic understanding of regular expressions
- A basic understanding of socket programming
In this example, we will be using the Oak
standard library to
create a file server and client. The client will be able to make
requests to the server, and the server will send back some file
matching the request. Let's start by creating the necessary
files.
# Create Oak files for the file server and client
touch file_server.oak file_client.oak
We'll add a skeleton into each.
// file_server.oak
package!("std");
use_rule!("std");
let main() -> i32
{
0
}
// file_client.oak
package!("std");
use_rule!("std");
let main() -> i32
{
0
}
We want our file server to take in a request for a given file
in the form of a regular expression, and send back the first
file which matches it. Thus, it will need the string
,
file_inter
, regex_inter
, and sock_inter
Oak
files, all
of which are found in the std
package (although not included
by default).
The client program will only need to make requests and save
files, so it will only need string
, file
, and sock_inter
.
Additionally, we will include std/printf.oak
in both files for
convenient printing. Now, the skeleton files should look like
the following.
// file_server.oak
package!("std");
use_rule!("std");
include "std/string.oak",
"std/file_inter.oak",
"extra/regex_inter.oak",
"std/sock_inter.oak",
"std/printf.oak";
let main() -> i32
{
0
}
// file_client.oak
package!("std");
use_rule!("std");
include "std/string.oak",
"std/file_inter.oak",
"std/sock_inter.oak",
"std/printf.oak";
let main() -> i32
{
0
}
Now, let's focus on the server side of things. When not connected to a client, the server should listen for one. When one arrives, it should wait for a command from it. Upon hearing a command from the client, it should search for a file matching the given regular expression. If not found, it should send back an error message. Otherwise, it should send the file.
When sending a file, we will first have the server send the size
of the file. The client will echo back this size to indicate
that it has received it. Then, the server will send the file.
The Oak
socket interface is always TCP, so the file will be
broken apart automatically. Requests will work in the same way-
size, then message.
An error opening the file will be indicated by sending a size of zero.
The above outline roughly translates to the code below.
// file_server.oak
package!("std");
use_rule!("std");
include!("std/string.oak",
"std/file_inter.oak",
"extra/regex_inter.oak",
"std/sock_inter.oak",
"std/printf.oak",
"std/conv_extra.oak");
// Sizes will be sent using this many chars
let size_chars! = to_u128(8);
// Handle a single request
let do_request(server: ^server_sock, request: string) -> void;
let main() -> i32
{
let is_listening = true;
// Create empty server socket
let server: server_sock;
// Initialize server socket to localhost on port 1234
// Save result of initialization into result
let result = server = ("127.0.0.1", to_u16(1234));
let size_str: string;
let size: u128;
let request: string;
while is_listening
{
// Listen for a connection
// save connection status in result
result = server.listen();
size_str = server.recv(size_chars!);
size = to_u128(to_i128(size_str));
if size != to_u128(0)
{
request = server.recv(size);
if request == "quit"
{
is_listening = false;
}
else
{
do_request(server, request);
}
}
else
{
is_listening = false;
}
}
0
}
Great! Now we've got the basic outline of our server. However,
we still need to define do_request
. A request should be a
regular expression, and our do_request
function should send
the first file it finds matching this description over the
server socket. Once we have the exact filename to send, this
won't be so hard. However, how do we find the first filename
that matches the given regular expression? Let's do this the
"hacky" way, using the std/sys_inter.oak
file.
We will tell the system to save a list of all local files to a
temporary .txt
file, then read that file in. From this, we can
iterate through until we find a filename which matches the given
regular expression. Then, we will load the matched file into
memory and send it over the socket. After this, we must remember
to delete our temporary .txt
file. Let's create a skeleton of
our do_request
function.
// file_server.oak
// (code above hidden)
// This line will be integrated into the earlier include! block
include!("std/sys_inter.oak");
// Handle a single request
let do_request(server: ^server_sock, request: string) -> void
{
// Make system call to list all files in this directory
// Open file where the above output was saved
// Iterate over lines in this file
// If the current line matches request
// Load this file
// Send this file
// Break
// If no matches were found
// Send error signal
// Delete temporary text file
}
// (code below hidden)
The Linux
command we will be using to save a list of all local
files to a temporary text file is ls > temp.txt
. After this,
temp.txt
will contain all the files in the current directory.
// file_server.oak
// (code above hidden)
include!("std/panic_inter.oak");
include!("std/bool.oak");
use_rule!("bool");
// Handle a single request
// Assumes the server is running on Linux
let do_request(server: ^server_sock, request: string) -> void
{
// Make system call to list all files in this directory
let result = sys("ls > temp.txt");
assert!(result == 0);
let pattern: regex = request;
// Open file where the above output was saved
let file: i_file;
file.open("temp.txt");
let line: string;
let found: bool = false;
// Iterate over lines in this file
while !file.eof() && !found
{
// Read a line of at most 256 chars
line = file.getline(256);
// If the current line matches request
if (line.regex_match(pattern))
{
// Load this file
let to_send: i_file;
to_send.open(line);
let contents: string;
contents = to_send.read(to_send.size());
// Send this file
server.send();
server.send(contents);
to_send.close();
// Break
found = true;
}
}
file.close();
// If no matches were found
if !found
{
// Send error signal
}
// Delete temporary text file
sys("rm temp.txt");
}
// (code below hidden)
acorn
and the Oak
are free and open source software. They
are available with the GPLv3 at github.com/jorbDehmel/oak.
The following table maps a compile-mode acorn call return
value to its meaning. If acorn
executes its result via the
-E
flag, it will return the return value of that execution, no
matter what.
Code | Meaning |
---|---|
0 | Success |
2 | Runtime error caught |
3 | Unknown error caught |
7 | Failed to create test suite file |
10 | Aborted Acorn update or install |
Most compile-time errors will have meaningful descriptions in their outputs, and thus do not return different values.
Error Class | Meaning |
---|---|
runtime_error | Miscellaneous error not attributable below |
sequencing_error | Syntax error within symbol tree construction |
generic_error | Error instantiating or entering a template |
package_error | Error loading a package |
rule_error | Error processing rule |
parse_error | Error parsing token stream (depreciated) |
unknown | An unknown bug- issue w/ acorn |
Note: Unknown errors represent bugs in acorn
, and should
be reported as such. Errors without reasons are unacceptable.
The Oak
programming language outlined here bears no relation
nor resemblance to any other programming languages, frameworks
or prototypes of the same name. Unique names are increasingly
difficult to find for programming languages. The Oak
referenced here was developed independently from any
similarly-names items.
Oak
is protected by the GPLv3.