/language-dlg

A scripting language with parsing and expression evaluation for narratives in games.

Primary LanguageOCamlMIT LicenseMIT

language-dlg

DLG is a scripting language bundled with expression evaluation & code interactivity for narratives in games.

What is it for ?

Building

This project depends on the following modules :

  • menhir
  • dune
  • ANSITerminal
  • dlist
  • lwt
  • lambda-term

You can install them by typing :

$$ opam install menhir dune ANSITerminal dlist lwt lambda-term

You can build the project by going to the root folder and using :

$$ make

DLG language

Syntax

DLG compiler

Compiler optimisations

The compiler yields optimisations that do not change the meaning of the program, but reduce the size of the bytecode (by reducing the size of the AST).

Inside programs

The followings optimisations are done by the compiler in a program :

  • a group of nop instructions will be collapsed into one
  • wait instructions with no event specified and a duration of 0 will be removed

The followings optimisations are done by the compiler in conditions (? expr blocks) :

  • Branches appearing after a wildcard pattern will be removed

    ? Player.Stats.Charisma
      -> n when n > 10
        nop
      -> _
        nop
      -> n when n = 0 ; this branch will be unused, hence removed
        nop
    
  • Unreachable branches will be removed

  • If there is only one branch in a condition corresponding to a wildcard pattern or an all-capturing expression (->n when true), the condition testing will be ignored

Inside expressions

When possible, expressions will be evaluated by the compiler. For instance, the expressions vec2(2*0, 3*4) and 2*(3+3) will be respectively replaced by vec2(0, 12) and 12. Other optimisations include :

  • Boolean operators with one static absorbing argument can be reduced;
    • true||expr and expr||true => true
    • false&&expr and expr&&false => false
  • Trivial conditions will be reduced to the concerned branch (̀true?=a:b => a)
  • Property access will be reduced in trivial cases (vec2(1,-1).[x] => 1.0)
  • Typecasts will be reduced in trivial cases ((int)3.0 => 3)

Type system

In the DLG language, there are two type checking procedures at play when you compile and run a script.

  • static type checking, which takes place at compilation
  • dynamic type checking, which takes place at runtime

Static type checking

In a DLG program, you can have two kinds of variables :

  • script variables that have a scope limited to the DLG interpreter :
    set local var x
    set global var x
    
  • extern variables that are accessing directly objects from your game :
    set extern var x
    

Because of that, the compiler cannot always know for sure that a variable is of a given type. For instance, in the expression f() * 3, the compiler cannot know which type the resulting expression will be, because f is not defined in the context of a DLG script. It follows :

  • a local variable will always have a type, because it must have been declared at some point in the program
  • a global variable will not always have a type, because it can be set in another script. However, if it is set in the current script, the type will be assumed by the compiler.
  • an extern valiable will never have a type, unless declared. If not, the compiler gives it a "no assumption type", and type checking is done at runtime.

In order to assert that a global variable or an extern symbol (a function, a variable, ..) is of a given type, you can use the following instruction to tell the compiler the type of the symbol.

declare extern My.extern.variable type
declare extern My.extern.function type (arg1 arg2 arg3)
declare extern myfunc (arg1 arg2 arg3)

declare global myvariable type

The goal of type checking is to avoid errors that might be hard for the programmer to notice, so the static type checking of the DLG parser uses a little trick :

  • a literal has an actual type that the compiler can recognize for sure
  • a local variable have to be defined before it is used, so the compiler can infer the type of the variable : it will also have an actual type
  • global and external variables cannot have a known type at compilation, so the compiler will give them an expected type, unless they're explicitely declared. Then they will have an actual type.
  • a function invocation will also have an expected type because of its external nature, unless it's explicitely declared. Then it will have an actual type.

This ensures a few things :

  • operations that only use identifiers known by the compiler will be fully typed
  • operations that includes external identifiers can be declared and be typed statically by the compiler; else, they will have a type inferred from their use when possible, and if not misuse won't be caught by the static compiler

Whenever an expression is left unchecked by the static compiler, it will be typed by the dynamic type checking system at runtime to ensure type correctness. The data computed by the static compiler will be transferred to the dynamic type compiler to avoid recomputation.

Type inference

Interpreter

The interpreter is an interface in a given programming language that implements all semantic functions of the language and a bytecode interpreter, which allows a programmer to virtually run the DLG language everywhere. DLG files (*.dlg) are then compiled into DLGP bytecode (*.dlgp).

A DLG interpreter has four big parts :

  • a stack, where data, instructions and commands will be pushed
  • an environment, used to store all the variables and their scope, be they global or local
  • a memory case containing one value that can be memorized (MEM) from the top of the stack or duplicated and pushed on top of the stack (DUPL)
  • a program buffer, which is usually a stream or an array containing the compiled bytecode that can do the following operations :
    • get the next byte (returns a byte) and moves the cursor forward
    • seek a position in a file and set the cursor on it
    • peek the next character from a file; that is, get the next byte without moving the cursor
    • pos returns the posiiton of the cursor in the file
    • check if we're at the eof (end of file)

Bytecode

The opcode is the first byte read from an instruction ; the additional data is a buffer found right after the opcode.

Endianness of additional data is stored in little-endian.

Stack & environment management

name opcode additional data effect category
EOD 0x00 None Indicates the end of some data (usually a string) Special markers
MEM 0x01 None Copies the token from the top of the stack in memory Memory management
DUPL 0x02 None Duplicate the token in memory and push it on the stack Memory management
Deepen Scope 0x03 None Deepens the scope. The variables declared after this statement will be only accessible in this scope level block Stack management
Raise Scope 0x04 None Raises the scope. The variables created at the previous scope level will be destroyed Stack management

Control flow

name opcode additional data effect
Skip if not 0x10 n=int64 (8 bytes) Pulls a token from the stack. If this is a boolean valued at false, skip n bytes of program
Skip 0x11 n=int64 (8 bytes) Skips n bytes of program

Instructions

name opcode additional data effect
Set 0x20 None Pulls an identifier, and a value from the stack, and bind the value to the identifier in the environment
Ifnset 0x21 None Pulls an identifier, and a value from the stack, and bind the value to the identifier in the environment if and only if the variable was never bound
Init 0x22 None Pulls an identifier, and a value from the stack, and bind the value to the identifier in the environment if and only if the variable was never bound. Fail if identifier was already bound
Message 0x23 f=msg_flags (1 byte) Pulls a string value from the stack and display it
Wait 0x24 None
Speed 0x25 None
Invoke 0x26 None
Send 0x27 None
Choice 0x28 None Pulls an integer n from the stack representing the number of choices, pull n string values, and waits for user to push on the stack an integer value 0<=i<n representing the choice taken
Nop 0x80 None Does nothing

When an instruction is reached, the data in the stack will be used as the parameters for it. For instance, with the set instruction :

                  PROGBUF                                  STACK
_______________________________________________   __________________________
[0x90 0x06 0x00 0x00 0x00] 0x62 0x6E 0x00 0x20  ⊢  ∅
[0x62 0x6E 0x00] 0x20                           ⊢  Int(6)
[0x20]                                          ⊢  Id(local x) Int(6)

When 0x20 is reached, two tokens are pulled from the stack. If they are respectively an identifier and an int, they're used as parameters for the set instruction evaluated by the interpreter.

Identifiers

name opcode additional data effect
Extern identifier 0x60 Null-terminated string representing identifier's name An extern identifier
Global identifier 0x61 Null-terminated string representing identifier's name A global identifier
Local identifier 0x62 Null-terminated string representing identifier's name A local identifier

When an identifier is reached, the bytes in the progbuf must be read until a 0x00 is encountered in order to have a string representing its name. For instance, with a local identifier :

                  PROGBUF                                  STACK
_______________________________________________   __________________________
[0x62] 0x6E 0x00                                ⊢  ∅
[0x62] [0x6E 0x00]                              ⊢  ∅
∅                                               ⊢  Id(local {0x68})
∅                                               ⊢  Id(local "x")

Expressions

name opcode additional data effect
Extern variable 0x81 Null-terminated string representing identifier's name Access to an extern (not contained in the environment) variable
Global variable 0x82 Null-terminated string representing identifier's name Access to a global (shared between all scripts) variable
Local variable 0x83 Null-terminated string representing identifier's name Access to a local (accessible only in this script) variable
--- --- --- ---
Int literal 0x90 n=int32 (4 bytes) An int value
Float literal 0x91 f=float (4 bytes, IEEE-754 floating-point) An float value
Bool literal 0x92 0x00 if false, 0xFF if true A boolean value
String literal 0x93 Null-terminated string A string value
Enum literal 0x94 None ?
2D vector literal 0x95 None Pulls two float x,y values from the stack, and build a 2D vector from them
3D vector literal 0x96 None Pulls three float values x,y,z from the stack, and build a 3D vector from them
--- --- --- ---
Inline 0x9F None Pulls a value from the stack, turns it into a string, and pushes the result in the stack
Operator + 0xA0 0x00 Pulls two values from the stack, add them, and pushes the result in the stack
Operator - 0xA0 0x01 Pulls two values from the stack, subtract them, and pushes the result in the stack
Operator * 0xA0 0x02 Pulls two values from the stack, multiply them, and pushes the result in the stack
Operator / 0xA0 0x03 Pulls two values from the stack, divide them, and pushes the result in the stack
Operator && 0xA0 0x04 Pulls two boolean values from the stack, and them, and pushes the result in the stack
Operator || 0xA0 0x05 Pulls two boolean values from the stack, or them, and pushes the result in the stack
Operator == 0xA0 0x06 Pulls two values from the stack, check for their equality, and pushes the boolean result in the stack
Operator != 0xA0 0x07 Pulls two values from the stack, check for their non-equality, and pushes the boolean result in the stack
Operator <= 0xA0 0x08 Pulls two numbers from the stack, compare them, and pushes the boolean result in the stack
Operator >= 0xA0 0x09 Pulls two numbers from the stack, compare them, and pushes the boolean result in the stack
Operator < 0xA0 0x0A Pulls two numbers from the stack, compare them, and pushes the boolean result in the stack
Operator > 0xA0 0x0B Pulls two numbers from the stack, compare them, and pushes the boolean result in the stack
Ternary condition 0xA1 None Pulls a boolean and two values of the same type from the stack, checks the boolean value, pushes the first one on the stack if true, else the second one
Function call 0xA2 None For a function with n arguments, pull an external identifier, n values from the stack, and pushes the result value
Cast 0xA3 1 byte representing the type to cast to (equal to the corresponding literal instruction between 0x90 and 0x96) Pulls a value from the stack, converts it to type, and pushes it on the stack
Access 0xA4 None Pulls a value from the stack, and try to access one of its properties (usually x, y or z in case of vectors)

Flags

There are different flags used in the additional data for some opcodes (for instance, in Message). Here is the list of the flags available.

Message options
name bit value flag value effect
norush 0 0x01 Indicates that the user cannot fast-forward to end of message by pressing a key
noack 1 0x02 Indicates that once displayed, the message will not wait for a keypress in order to read the next instruction

Dynamic type checking

When an operation/instruction pulls data (value of identifier) from the stack, two things can happen :

  • the stack is empty, meaning the program is ill formed. There is a one to one correspondance between RPN (reverse polish notation, which is the form that the bytecode takes) and a syntax tree. If the stack is popped while empty, it means there is a broken branch in the implied AST.

  • the popped token is not of the right type; it means the program is badly typed. An accurate error should be reported.

The static type checker will try to catch all the badly typed programs it can, but since some data can come from external sources (outside of the DLG environment), asserting that the typing is correct cannot be fully asserted at compile-time, and has to be caught at runtime.

To give more information to the static type checker, use declare instructions.