/Elephant

Advanced struct/array serialization for GameMaker Studio 2.3

Primary LanguageGame Maker LanguageMIT LicenseMIT

Elephant 1.3.0

Advanced struct/array serialization for GameMaker Studio 2.3

@jujuadams

Chat about Elephant on the Discord server

 

 

 

Introduction

Elephant is a struct/array serialization system that offers extended functionality beyond the typical JSON functions:

  • Serialization of arrays, structs, and scalar datatypes
  • Circular references are stored and recreated correctly
  • Structs made with constructors are recreated using the constructor
  • Constructed structs can have schemas to control which variables are serialized and how
  • Constructed structs can have read/write callbacks

 

When using Elephant, some considerations must be taken into account:

  • Constructors must be in global scope i.e. in a script
  • Whilst static methods in constructors will persist, non-static methods cannot be serialized
  • Constructor schemas are shallow without nesting/recursion, and arrays cannot have schemas
  • Upon deserialization, structs are rebuilt by new-ing the constructor with zero arguments
  • Arrays are limited to 65534 elements and structs are limited to 65533 member variables

 

Arrays are assumed to have flexible typing, though arrays that are found to have a consistent datatype throughout are optimised automatically when serializing. Preferably, constructors should only set default variable values and structs shouldn't alter state outside of their scope on instantiation.

N.B. When using Elephant it is very important to ensure constructor methods are static. A non-static method cannot be serialized and will instead be set to undefined upon deserialization.

 

Elephant introduces a handful of macros that are useful for interacting with the library. These are explained in further detail later in the document.

Schema definition for constructors:

  • ELEPHANT_SCHEMA
  • ELEPHANT_FORCE_VERSION
  • ELEPHANT_VERSION_VERBOSE
  • ELEPHANT_VERBOSE_EXCLUDE

Custom datatypes that can be used with Elephant schemas:

  • buffer_any
  • buffer_array
  • buffer_struct
  • buffer_undefined

Callbacks, and callback state:

  • ELEPHANT_PRE_WRITE_METHOD
  • ELEPHANT_POST_WRITE_METHOD
  • ELEPHANT_PRE_READ_METHOD
  • ELEPHANT_POST_READ_METHOD
  • ELEPHANT_SCHEMA_VERSION
  • ELEPHANT_IS_DESERIALIZING

 

 

 

Functions

Elephant has five public functions that can be used:

  • ElephantWrite(target, [buffer])

    • Serializes the given target data and writes it to the given buffer, starting at the buffer_tell() position. This function uses buffer_write() and will move the buffer head as it writes. If no buffer is provided then a new buffer is created that fits the serialized data. This function calls ELEPHANT_PRE_WRITE_METHOD and ELEPHANT_POST_WRITE_METHOD for constructed structs, and ELEPHANT_IS_DESERIALIZING is set to false. ELEPHANT_SCHEMA_VERSION will contain the constructor schema version that Elephant is using to serialize data.
  • ElephantExportString(target)

    • As above, but returns a base64 encoded version of the buffer. This function also performs compression on the buffer.
  • ElephantRead(buffer)

    • Deserializes Elephant data from a buffer, starting at the buffer_tell() point. This function uses buffer_read() and will move the buffer head as it reads data. This function calls ELEPHANT_PRE_READ_METHOD and ELEPHANT_POST_READ_METHOD for constructed structs, and ELEPHANT_IS_DESERIALIZING is set to true. ELEPHANT_SCHEMA_VERSION will contain the constructor schema version that Elephant found in the source data.
  • ElephantImportString(string)

    • As above, but takes a string rather than a buffer. This string should have been created by ElephantExportString().
  • ElephantDuplicate(target)

    • Makes an identical copy of the target. Unlike ElephantWrite(), this function ignores schemas and will copy all member variables and non-static methods. This function will recreate constructed structs appropriately and will also correctly duplicate circular references.

 

 

 

Schemas

Schemas may be defined for constructors by using the macro ELEPHANT_SCHEMA to define a struct literal. This struct literal contains schema versions as the top-level keys, and member variables names with associated datatype as second-level keys.

If no schema is defined then all member variables for the struct will be serialized using the generic buffer_any datatype. This typically leads to large buffers and is much slower to both serialize and deserialize and should generally be avoided. Try to declare a schema whenever you can.

Schemas must be defined by setting ELEPHANT_SCHEMA in a constructor e.g.

function Example() constructor
{
	x = 0;
	y = 0;
	
	ELEPHANT_SCHEMA
	{
		v1 : {
			x : buffer_f64,
			y : buffer_f64,
		},
	}
	
	static SetPosition = function(_x, _y)
	{
		x = _x;
		y = _y;
	}
}

Top-level keys in a struct delineate schema versions. Versioning is critical for writing robust code that will work as your project develops and changes. Schema versions must start with a lowercase v and must be followed by a positive integer from 1 to 127 inclusive.

N.B. It is very important that you do not ever remove schema versions! If you remove a schema version then any old files that use the old schema version cannot be recovered, which is very likely to break your project.

Variables defined in a schema can take any of the following datatypes, partially shared with GameMaker's native constants that are used for buffer access.

N.B. Elephant does no type checking for scalar values in the interests of speed. Please ensure that the value you're serializing matches the datatype in the schema.

Value Name Description
1 buffer_u8 Unsigned 8-bit integer, a positive value from 0 to 255
2 buffer_s8 Signed 8-bit integer, a positive or negative value from -128 to 127
3 buffer_u16 Unsigned 16-bit integer, a positive value from 0 - 65535
4 buffer_s16 Signed 16-bit integer, a positive or negative value from -32,768 to 32,767
5 buffer_u32 Unsigned 32-bit integer, a positive value from 0 to 4,294,967,295
6 buffer_s32 Signed 32-bit integer, a positive or negative value from -2,147,483,648 to 2,147,483,647
7 buffer_f16 16-bit float
8 buffer_f32 32-bit float
9 buffer_f64 64-bit float
10 buffer_bool Boolean value, can only be 0 or 1
11 buffer_string String of any size, with a null terminator
12 buffer_u64 An unsigned 64-bit integer
13 buffer_text String of any size, with a null terminator (there is no difference between buffer_text and buffer_string)
14 buffer_any Datatype can be any serializable data. This is the default when serializing content in arrays or structs that have no schema
15 buffer_array Data is an array. Array elements themselves can be any datatype, though Elephant will optimise arrays with a consistent datatype. Arrays are limited to 65534 elements
16 buffer_struct Data is a struct, either anonymous or created by a constructor. Structs are limited to 65533 member variables
17 buffer_undefined Undefined value, using GameMaker's datatype. This is equivalent to null in JavaScript

 

 

 

Schema Extensions

Whilst Elephant will default to choosing the latest version number for serialization, the schema version to be used can be forced by setting ELEPHANT_FORCE_VERSION in the base ELEPHANT_SCHEMA struct e.g.

function Example() constructor
{
	x = 0;
	y = 0;
	
	ELEPHANT_SCHEMA
	{
		ELEPHANT_FORCE_VERSION : 1, //Force Elephant to use schema v1 rather than v2
		
		v1 : {
			x : buffer_f64,
			y : buffer_f64,
		},
		
		v2 : {
			x : buffer_f32,
			y : buffer_f32,
		},
	}
	
	static SetPosition = function(_x, _y)
	{
		x = _x;
		y = _y;
	}
}

One of the main advantages of using schemas is that filesizes can be reduced, and performance increased, by storing variables without contextual information in the outputted binary data (context is instead infered by reading the schema). The trade-off is that once a schema is set up variables name and datatype cannot change.

During the early development phase of your game, it's likely that the filesize and performance advantages of strict schemas are not preferable and you'd instead like to store data more loosely. By setting ELEPHANT_VERSION_VERBOSE to true in a schema definition, Elephant will instead store variables with all contextual data so that it can be more reliably read upon deserialization.

function Example() constructor
{
	x = 0;
	y = 0;
	
	ELEPHANT_SCHEMA
	{
		v1 : {
			ELEPHANT_VERSION_VERBOSE : true, //Store data with 1) its datatype and 2) the variable name
			x : buffer_f64,
			y : buffer_f64,
		},
	}
	
	static SetPosition = function(_x, _y)
	{
		x = _x;
		y = _y;
	}
}

For quick development, it's useful to not use schemas at all and instead specify what you don't want to save. Defining ELEPHANT_VERBOSE_EXCLUDE as an array that contains unwanted variable names (as strings) will instruct Elephant to ignore those names when saving without a schema, or when a schema version is set to verbose (see ELEPHANT_VERSION_VERBOSE above).

function Example() constructor
{
	startHP = 10;
	hp = startHP;
	
	ELEPHANT_SCHEMA
	{
		ELEPHANT_VERBOSE_EXCLUDE : ["startHP"], //Don't serialize the starting HP
	}
	
	static Damage = function(_damage)
	{
		hp -= _damage;
	}
}

 

 

 

Callbacks

Elephant allows for the definition of callback methods per constructor. These are executed as follows:

Method Macro Timing
ELEPHANT_PRE_WRITE_METHOD Executed immediately before serialization
ELEPHANT_POST_WRITE_METHOD Executed immediately after serialization
ELEPHANT_PRE_READ_METHOD Executed immediately before deserialization
ELEPHANT_POST_READ_METHOD Executed immediately after deserialization

During the execution of callbacks, two macros can be read: ELEPHANT_SCHEMA_VERSION and ELEPHANT_IS_DESERIALIZING. ELEPHANT_SCHEMA_VERSION contains the schema version that is being used, whereas ELEPHANT_IS_DESERIALIZING will be either true or false. Both variables are set to undefined outside of serialization/deserialization.

function Example() constructor
{
	x = 0;
	y = 0;
	
	//Distance to the centre of the room
	distance = point_distance(x, y, room_width/2, room_height/2);
	
	ELEPHANT_SCHEMA
	{
		v1 : {
			x : buffer_f64,
			y : buffer_f64,
			distance : buffer_f64,
		},
		v2 : {
			x : buffer_f64,
			y : buffer_f64,
		}
	}
	
	ELEPHANT_POST_READ_METHOD
	{
		//After deserializing the struct, update the distance to the centre of the room
		//We only need to run this code for v2 schemas because v1 serializes distance
		if (ELEPHANT_SCHEMA_VERSION == 2)
		{
			distance = point_distance(x, y, room_width/2, room_height/2);
		}
	}
	
	static SetPosition = function(_x, _y)
	{
		x = _x;
		y = _y;
		distance = point_distance(x, y, room_width/2, room_height/2);
	}
}

 

 

 

Binary Format

Elephant uses a custom binary format to encode data, the details of which are described below. There are two key concepts that allow Elephant to handle circular references and constructors.

Elephant serializes/deserializes circular references by associating a unique integer ID with every struct and array that gets created. Structs and arrays share the same "pool" of IDs such that no struct and array can ever share the same ID. IDs start at 0 for the first struct/array that is seen and increases by 1 for each additional struct/array. When a struct or array is deserialized, this unique integer ID can then be used to rebuild circular references.

Constructor indexes work in a similar way. Each constructor is given an ID when it is first seen. If a later struct uses the same constructor then the constructor index can be translated into the correct constructor function without having to repeat the construcor name for every struct.

 

Wrapper

Datatype Name Description
buffer_u32 header 0x454C4550 a.k.a. UTF-8/ASCII string ELEP. If this is missing then the data is invalid
buffer_u32 version The version number of Elephant used to create the data. This is calculated by ((majorVersion << 16) + (minorVersion << 8) + patchVersion)
buffer_any content The root value
buffer_u32 footer 0x48414E54 a.k.a. UTF-8/ASCII string HANT. If this is missing then the data is invalid

 

buffer_any

Datatype Name Description
buffer_u8 datatype Indicates the datatype of content to follow. Matches the list of constants laid out above (buffer_array, buffer_u8, buffer_string etc.)
Varies content Content that this datapoint describes. For scalar data, this is the value itself stored using the datatype

 

Scalar datatype (buffer_string, buffer_f32, buffer_u8 etc.):

Datatype Name Description
Varies value The value itself stored using the datatype

 

buffer_array

Datatype Name Description
buffer_u16 length Number of elements in the array. If this value is 0 then no datatype nor content follows. If the length is 65535 (0xFFFF) then special behaviour should be executed, see below
buffer_u8 datatype Datatype to use to deserialize following data. This can be any of the constants laid out above, including buffer_any
As above value 0 Value for the 0th element
etc.

 

buffer_array circular reference, length = 65535 (0xFFFF)

Datatype Name Description
buffer_u16 length 0xFFFF. This indicates that the struct/array has already been seen before and that this struct/array reference should be duplicated
buffer_u16 reference index Index of the struct/array to use

 

buffer_struct

Datatype Name Description
buffer_u16 length Number of member variables for this struct. If this value is 0 then no key/value pairs follow. If the length is 65535 or 65534 (0xFFFF or 0xFFFE) then special behaviour should be executed, see below
buffer_string variable name 0 Name of the 0th member variable as a null-terminated string
buffer_any value 0 The value of the 0th member variable
etc.

 

buffer_struct circular reference, length = 65535 (0xFFFF)

Datatype Name Description
buffer_u16 length 0xFFFF. This indicates that the struct/array has already been seen before and that this struct/array reference should be duplicated
buffer_u16 reference index Index of the struct/array to use

 

buffer_struct constructor with schema, length = 65534 (0xFFFE)

Datatype Name Description
buffer_u16 length 0xFFFE. This indicates that the struct was instantiated using a constructor
buffer_u16 constructor index Index of the constructor that was used to create the struct
(buffer_string) (constructor name) (If the constructor index is new then the name of the constructor function follows as a string)
buffer_u8 version & verbose The schema version that was used to serialize the content that follows. The most significant bit determines whether the struct was serialized in verbose mode. This byte should always be greater than 0
Varies value 0 Value for the 0th member variable, the name and datatype of which is determined by the schema
etc.

 

buffer_struct verbose constructor, length = 65534 (0xFFFE)

Datatype Name Description
buffer_u16 length 0xFFFE. This indicates that the struct was instantiated using a constructor
buffer_u16 constructor index Index of the constructor that was used to create the struct
(buffer_string) (constructor name) (If the constructor index is new then the name of the constructor function follows as a string)
buffer_u8 version & verbose 0x80. This indicates that variable data was serialized verbosely and without a schema
buffer_string variable name 0 Name of the 0th member variable as a null-terminated string
buffer_any value 0 Value for the 0th element
etc.