/LITE

LISP INTERPRETER and TEXT EDITOR

Primary LanguageCMIT LicenseMIT

LISP INTERPRETER AND TEXT EDITOR

LITE

LITE is an extensible text editor with a built-in, dynamic programming language to alter the editor’s functionality: LITE LISP.

LITE may mean anything that abbreviates the letters; listed here are a few that are especially significant.

  • LISP INTERPRETER and TEXT EDITOR
  • LITE IS TOTALLY EMACS
  • LISP IMBUED with TONS of ECCENTRICITIES

The goal of LITE is to create a modern, cross-platform tool for text editing and general scripting that is easy to port.

Initially, development followed this LISP interpreter tutorial (archive).

One glaring difference between Emacs and LITE is that LITE does not use a Gap Buffer data structure to edit text; rather, it uses a Rope.

NOTE: Any and all shell commands assume a working directory of the base of this repository.

Usage

If you have the LITE executable, run it with the --help argument for usage information.

LITE GFX

By default, LITE is compiled with a GFX backend, resulting in an executable GUI application.

Run it the way that you would normally run a graphical application on your OS. Most of the time this means double-clicking the executable from a file explorer, or launching the executable from a shell.

NOTE: For ongoing development, it is best to run builds of LITE with a working directory of this repository, so it can find the most up-to-date LISP sources, fonts, etc.

The executable may be obtained by building from local source.

LITE REPL

LITE may be built as a terminal-only “read, evaluate, print, loop” program (a REPL).

To enter the REPL, run the following:

./bin/LITE

On Windows:

.\bin\LITE

Any and all arguments following a -- argument are treated as file paths and are attempted to be loaded as LITE LISP source files.

Building

CMake is used as the cross-platform build system.

To see a list of available build systems, use the following command:

cmake -G

I recommend Ninja, it is modern, fast, and designed for use in conjunction with CMake.

First, generate an out-of-source build tree:

cmake -G <build system> -B bld -DCMAKE_BUILD_TYPE=Release

Replace <build system> with one of the available build systems

Optionally, build LITE with graphical user interface capabilities by adding the flag -DLITE_GFX=ON when generating the build tree. Ensure to read the README in the ~gfx~ subdirectory, if doing so.

Once the build tree is generated, invoke it to generate an executable.

cmake --build bld

If no fatal errors occur, the bin subdirectory of the repository will be populated with the LITE executable.

In order for LITE to find the standard library at runtime, it is necessary to install LITE. This creates a directory in which LITE will look during runtime for certain resources like fonts or lisp. On Linux, this is located at $HOME/lite. On Windows, it’s at %APPDATA%/lite.

cmake --install bld

LITE LISP

LITE LISP is the language that LITE interprets.

Basic Syntax

Every LISP program is made up of atoms. An atom is nearly any sequence of bytes, except for whitespace, commas, or backslashes. Atoms are NOT case sensitive.

Here are some valid LITE LISP atoms.

foo-bar
foo/bar
*foobar*
<*-/im-an-atom/-*>
-420
69
interrupt80

A list is a sequence of atoms and/or other lists surrounded by parentheses.

(foo-bar foo/bar *foobar*)
(   im a    list (in a list ))
(sun mon tue wed thu fri sat)

Strings are any sequence of bytes surrounded by double quotes.

"I am a string."
" al;sdn  (!*^(*%)#!^)  \033  \n  lasnk  \ a     \a   \t \f "
"Information is easy to fake but hard to smell."

Comments begin with a semi-colon and stop at the first newline.

; I'm a comment
;;;; And I as well!

(print "hello, friends!") ; print to stdout

Function calls are represented as a list with a symbol as the first element, and any arguments passed are subsequent elements.

(print "hello friends!")
(abs -69420)
(define foo 42)

The first element in a list that is to be evaluated is referred to as the operator.

Atoms

Every object in LISP is called an Atom. Every Atom has a type, a value, a docstring, and a generic allocation pointer associated with it.

The value is a union with multiple value types, and the type field designates which value within the union to use, and how to treat it.

The docstring is a string containing information about the atom, i.e. documenting it.
This could range from a function’s usage to a variables meaning. \ Access docstrings using the docstring special form: (docstring <atom>).

The generic allocation pointer is a linked list of allocated memory that may be freed when the atom is garbage collected. This allows the LITE interpreter to allocate memory as needed and ensure it is freed /after/ using it.

Types

Here are the different types an Atom may have in LITE LISP:

Nil
This is the definition of false, nothing, etc.
Pair
A recursive pair, containing a left-hand Atom and a right-hand Atom.

A pair has special terminology for the two sides; the left is referred to as car, while the right is referred to as cdr.

A list is a pair with a value on the left, and another pair, or nil, on the right.

Symbol
A sequence of bytes that may be bound in the environment.

All symbols are located in the symbol table with no duplicates.

String
A sequence of bytes, usually denoting human readable text.
Integer
An integer number, like 1, -420, or 69.
BuiltIn
A function implemented in LITE source code that is able to be called from LITE LISP.
Closure
A function implemented in LITE LISP; a lambda.
Macro
A closure with unevaluated arguments that creates an expression that is then evaluated.
Buffer
An opened file that may be edited in LITE.

Environment, Variables, and QUOTE

Variables are stored in an environment. An environment is a key/value dictionary, where the keys are a symbol, and the values are atomic LISP objects. When evaluating a symbol, it is first checked if there is a binding in any accessible environment. If so, that value is used in place of the symbol, when evaluated.

To bind a symbol to a value in the local environment, use the DEFINE special form.

(define new-variable 42)

NOTE: DEFINE will first attempt to find the symbol in any parent environment; if found, it will override that binding’s value instead of creating a new one in the immediate environment. This allows for DEFINE to set the value of parameters, LET arguments, etc.

To bind a symbol to a value in the global environment, use the SET special form.

(set new-variable 42)

new-variable is now a symbol bound in the environment. Following occurences of the bound symbol will be evaluated to the defined value, 42.

Sometimes, it is useful to not evaluate a variable. This can be done using the QUOTE operator.

(quote new-variable) ; returns the symbol "new-variable"

As quoting is a very common necessity in LISP, there is a special short-hand for it: a preceding single-quote. This short-hand means the following to be equivalent to the QUOTE just above.

'new-variable ; returns the symbol "new-variable"

When defining any variable, it is possible to define a docstring for it by specifying it as a third argument:

(define new-variable 42 "The meaning of life, the universe, and everything.")

The docstring may be accessed with a builtin, like so:

(docstring new-variable)

The standard library includes a macro to help re-define a docstring:

(set-docstring new-variable "The meaning of your mom.")

This allows for everything in LITE LISP to self-document it’s use.

Functions

The standard library includes the DEFUN macro to help define named functions.

(defun NAME ARGUMENT DOCSTRING BODY-EXPRESSION(S))

Here is a simple factorial implementation that works for small, positive numbers:

(defun fact (x) "Get the factorial of integer X." (if (= x 0) 1 (* x (fact (- x 1)))))

To call a named function, put the name of the function in the operator position, and any arguments following. Arguments are evaluated before being bound and the body being executed.

(fact 6)

Assuming FACT refers to the function defined just above, this would result in the integer 720, as 6 was bound to the symbol X during the execution of the functions body.

As arguments are evaluated before being bound, we can also pass expressions. The result of the expression will be bound to the argument symbol.

(fact (fact 3))

In this case, (fact 3) will be evaluated before the outer FACT call, so that we can bind the result of it to X. Once evaluating, we will get the integer result 6, which will then be bound to X in the outer (left-most) FACT call, resulting in 720.

Lambda/Closure

A lambda is a function with no name.

Currently, lambdas may be defined with the following special form:

(lambda ARGUMENT BODY-EXPRESSION(S))

ARGUMENT is a symbol or a list of symbols denoting arguments to be bound when the function is called.

BODY-EXPRESSION(S) is a sequence of expressions that will be executed with arguments bound when the lambda is called. The result of the last expression in the body is the return value of the lambda.

This means the identity lambda may be written like so:

(lambda (x) x)

As a real world example, here is the factorial implementation from above written as a lambda:

(lambda (x) (if (= x 0) 1 (* x (fact (- x 1)))))

To call a lambda, put it in the operator position just like the name of a named function. Pass any arguments as subsequent values in the list, just as you would a named function.

((lambda (x) (if (= x 0) 1 (* x (fact - x 1)))) 6)

Evaluating the above would result in the integer value 720, as 6 was bound to X and the lambda body was executed.

Variadic Arguments

There is also support for variadic arguments using an improper list. The syntax for an improper list is as follows:

(1 2 3 . 4)

In the context of a lambda, here is how to define a function with two positional arguments followed by a varying number of arguments.

(lambda (argument1 argument2 . the-rest) BODY-EXPRESSION(S))

After all fixed arguments are given, the rest are passed as a list to the function. If no variadic arguments are given, nil is passed.

To create a function that may take any amount of arguments, put a symbol in the ARGUMENT position, as seen in this re-definition of the + operator in the standard library:

(let ((old+ +))
  (lambda ints (foldl old+ 0 ints)))

Macros

A macro may be created with the MACRO operator. A macro is like a lambda, except it will return the result of evaluating it’s return value, rather than it’s return value being the result. This allows for commands and arguments to be built programatically in LISP.

In order to ease the making of macros, there is quasiquotation. It is similar to regular quotation, but it is possible to unquote specific atoms so as to evaluate them before calling the returned expression.

While it is possible to call the quasiquotation operators manually, there are short-hand special forms built in to the parser.

  • ’`’ – QUASIQUOTE
  • ’,’ – UNQUOTE
  • ’,@’ – UNQUOTE-SPLICING

These special forms allow macro definitions to look more like the expressions they produce.

A simple example that mimics the QUOTE operator:

(macro my-quote (x) "Mimics the 'QUOTE' operator." `(quote ,x))

The QUASIQUOTE special-form at the beginning will cause the QUOTE symbol to pass through without being evaluated. The UNQUOTE special-form before the X symbol will cause it to be evaluated, replacing ,x with the passed argument.

For example, calling (my-quote a) will eventually expand to (QUOTE A), which will result in the symbol A being returned upon evaluation.

For a more real-world example that is actually useful, let’s take a look at DEFUN from the standard library.

(macro defun (name args docstring . body)
  "Define a named lambda function with a given docstring."
  `(define ,name (lambda ,args ,@body) ,docstring))

As you can see, this macro takes 3 fixed arguments followed by any number of arguments following passed as a list bound to BODY. The first argument, name, is within a quasiquoted expression, but contains an unquote special-form operator. This causes it to be evaluated during macro expansion, resulting in the passed argument. The same thing happens with ARGS and DOCSTRING. When it comes to BODY, though, things change. As BODY is a list, and a function body is not a list, but a sequence, we must transform it somehow. This is where the UNQUOTE-SPLICING operator comes into play, as it will take each element of a given list and splice it into a sequence.

,BODY  = ((print a) (print b) (print c))
,@BODY = (print a) (print b) (print c)

This allows the LAMBDA body argument to be a valid sequence of expressions that can be evaluated properly.

When including the standard library, DEFMACRO operates exactly the same as MACRO.

When the environment variable DEBUG/MACRO is non-nil, extra output concerning macros is produced.

Special Forms

Special forms are hard-coded symbols that go in the operator position. They are the most fundamental building blocks of how LITE LISP operates.

Here is a list of all of the special forms currently in LITE LISP.

QUOTE
Pass one and only argument through without evaluating it.

There is also a short-form built in to the parser: ' (single quote). This allows code to be written much faster, as quoting is something that happens quite often in the land of LISP.

'X == (QUOTE X)
    
DEFINE and SET
Bind a symbol to a given atomic value within the LISP environment.

(DEFINE SYMBOL VALUE [DOCSTRING])

(SET SYMBOL VALUE [DOCSTRING])

DEFINE first checks all parent environments for a binding of the symbol, and will override that one if it finds it. If SYMBOL is not bound in any parent environment, DEFINE binds it in the local environment. That is, the environment DEFINE was called from.

SET only operates on the global environment. This environment is the top level environment that is carried between evaluations, whereas local environments tend to go away after evaluation completes.

LAMBDA
Create a closure from the given expected arguments and body.

(LAMBDA ARGS BODY)

This is an expression which returns a closure. A closure is just like a function, except that it retains a pointer to the environment that it was created within, allowing any variable accesses to be resolved as expected.

This closure can be placed directly in the operator position and called. Any arguments following the operator position are arguments to the given operator. An error will be reported if the number of arguments does not match, unless making use of an improper list to gather all remaining arguments into one.

((identity (x) x) 42)

IF
A conditional expression.

(IF CONDITION THEN OTHERWISE)

Evaluate the given condition. If result is non-nil, evaluate the second argument given. Otherwise, evaluate the third argument.

WHILE
A conditional loop.

(WHILE CONDITION BODY)

Evaluate condition. If result is non-nil, evaluate BODY one time. Repeat each time body is evaluated.

Extra information regarding WHILE loops is output when the DEBUG/WHILE debug flag is set to a non-nil value.

PROGN
Evaluate sequence of expressions, returning result of last expression.

This is mainly used within IF to be able to evaluate multiple expressions within the THEN or OTHERWISE singular expression argument.

MACRO
Create a closure, except the passed arguments are not evaluated, and the value returned from the macro is evaluated, then that return value is the result.

(MACRO SYMBOL ARGS DOCSTRING BODY)

One of the most useful features of macros is quasiquoting, which is just a fancy word meaning evaluating only some arguments while passing others through quoted. See the section on macros for more details.

EVALUATE
Return the result of the given argument after evaluating it.

This is mostly used in macros to evaluate certain arguments more than once.

ENV
Return the current environment.

The first element (car (env)) is the parent environment. If the parent environment is nil, that indicates the environment is the global environment.

NOTE: This is a copy of the environment. Changes to it are not reflected in the environment itself.

ERROR
Print the given message to standard out after an error indicator. Returns the given message. Halts evaluation.
QUIT-COMPLETELY
Exit LITE entirely. Shut down the program.

Return STATUS code from program, or 0 if one isn’t given.

(QUIT-COMPLETELY [STATUS])

Default binding: CTRL-ALT-Q

AND
Return nil as soon as one of the arguments evaluates to nil. Otherwise return T.
OR
Return T as soon as one of the arguments evaluates non-nil. Otherwise return nil.

Structures

Structures are defined in the standard library, and can not be used unless it is included.

In LITE LISP, structures are basically an associative list with stricter rules.

Each association within the structure is referred to as a member.

Each member must be a pair with a symbol on the left side. This symbol is the member’s identifier, or ID.

Let’s look at how to define a new structure:

(defstruct my-struct
  "my docstring"
  ((my-member 0)))

Here, we have a structure, my-struct, with a single member, my-member.

It should be noted that the syntax for defining members matches let exactly, at least on the surface. One important thing to note is that initial values given to members are not evaluated, and so must be a self-evaluating value (a literal). For example, attempting to put the name of a function as an initial value does not work (at least not as expected). The member will be bound to the symbol that matches the name of the function, not the function itself.

To access the value of any given member within a structure, use get-member:

(get-member my-struct my-member)

This will return the value of the member with an ID of my-member within my-struct. If one does not exist, it will return nil. Because we gave the member an initial value of zero, that is what is returned.

set-member can be used to update a member’s value.

(set-member my-struct 'my-member 42)

To define a member to a function, you must first define the structure. Afterwards, use set-member, which evaluates the value argument:

(set-member my-struct 'my-member +)

At this point, my-member of my-struct has a value of the closure which was bound to the symbol +.

We can now call this member function using the call-member macro:

(call-member my-struct my-member 34 35)

Any arguments after the structure symbol and member ID are passed through to the called function.

As you may already be thinking, you don’t always want to use structures in the way shown above, where the actual structure definition is the mutable data. In most cases, it is preferable to define a structure once, and have multiple instances of the defined. This is possible with the make macro:

(defstruct vector3
  "A vector of three integers, X, Y, and Z."
  ((x 0) (y 0) (z 0)))

;; Create an instance of a defined structure.
(set my-coordinates (make vector3))
;; Setting member values.
(set-member my-coordinates 'x 24)
(set-member my-coordinates 'y 34)
(set-member my-coordinates 'z 11)
;; Print the instance of the structure to standard out.
(print my-coordinates)
;; Access all the members of a struct using the `ACCESS` macro.
;; It is like `LET`, except it binds all of a structure's arguments
;; to their values, then evaluates the given body.
(access my-coordinates
        (print x)
        (print y)
        (print z))
;; Accessing member IDs and values as separate lists.
(let ((coordinate-members (map car my-coordinates))
      (coordinate-values (map cadr my-coordinates)))
  (print coordinate-members)
  (print coordinate-values))
;; Print the sum of all of the values in the structure.
(print (foldl + 0 (map cadr my-coordinates)))

Misc

  • Buffer Table

    Get the current buffer table with the BUF operator.

  • Symbol Table

    Get the current symbol table with the SYM operator.

    Alternatively, visualize the environment by setting DEBUG/ENVIRONMENT to any non-nil value.

  • Closure Environment Syntax

    Currently, closures are stored in the environment with the following syntax:

    (ENVIRONMENT (ARGUMENT ...) BODY-EXPRESSION)
        
  • Escape Sequences within Strings

    Currently, strings have a double-backslash escape sequence.

    The following escape sequences are recognized within strings:

    • \\_ -> nothing
    • \\r -> \r (0xd)
    • \\n -> \n (0xa)
    • \\" -> "
  • Debug Environment Variables

    There are environment variables that cause LITE to report output extra information regarding the topic the variable pertains to when non-nil.

    For a list of all debug variables that LITE internally responds to, see the file that enables all of them at once, lisp/dbg.lt.