Rayson is a serializer for JSON data, which produces very small binary output with minimal overhead. It was designed for serializing data with predefined, known structure. The main idea behind algorithm is to extract values from key-value pairs, encode them and store in topological order in byte array. Later when you pass this binary data to unserializer, you will have to provide empty template with key names, which will be filled with decoded values to reconstruct original JSON. This way data can be stored with only one byte overhead per key!
- optimized for small output - only 1 byte per key overhead (for values/arrays with length below 64 units)
- safe storage format - keeps tree structure with parent-child relations, whole data is checksumed
- flexible decoder - fetches only this entries which you actually need
Let's say you have some personal data of your customers in object literal:
var customers = {
"update": "2013-04-11 07:03:31",
"data": [
{
"age": 34,
"name": "Luz Larson",
"address": "693 Arkansas Drive, Fairforest, North Dakota, 2492"
},
{
"age": 30,
"name": "Hall Jefferson",
"address": "432 Dupont Street, Clarktown, Georgia, 1922"
},
{
"age": 32,
"name": "Camacho Beard",
"address": "367 Cyrus Avenue, Hondah, Michigan, 7013"
},
( ... )
],
"accepted": [ 0, 2 ]
};
As you may have noticed, some of the keys repeating all over the place. When data
array is large, serializing it by simple JSON.stringify or even MessagePack would be an overkill. As long as we know order and structure of data
's descendants we can get rid of its keys with Rayson.
var customers_serialized = rayson.serialize(customers, {
root: rayson.type.str,
age: rayson.type.int8,
accepted: rayson.type.int32
});
rayson.serialize
gives you binary representation of JSON passed as its first argument. Second argument is a list of keys of serialized literal with types of data they hold. Rayson treats JSON as tree structure with root node at the very top of it. When you set type for any internal node, actually you set it for full current subtree (the node with all of its descendants). E.g. setting root
to rayson.type.str
, tells serializer that all entries ( root
is an ancestor for all of them ) in your JSON are unicode strings.
Now customers_serialized
stores binary data of our customers. Take a look how easily we can decode it.
var result = rayson.unserialize(customers_serialized, {
update: rayson.type.str,
customers_data: [{
age: rayson.type.int8,
first_last_name: rayson.type.str
}],
accepted: [ rayson.type.int32 ],
warnings: []
});
To decode customers_serialized
we need to provide template as second argument of rayson.unserialize
. Out template doesn't ideally match structure of initial customers
literal (it's still perfectly valid), to show flexibility of Rayson templating engine.
The result would be:
{
"update": "2013-04-11 07:03:31",
"customers_data": [
{
"age": 34,
"first_last_name": "Luz Larson"
},
{
"age": 30,
"first_last_name": "Hall Jefferson"
},
{
"age": 32,
"first_last_name": "Camacho Beard"
},
(...)
],
"accepted": [ 0, 2 ]
}
Rayson tracks number of entries on each level ( between {}
) of your JSON and allows you to trim some of the records from the end of it (during unserialization process). You may also add some additional entries to template which haven't existed in original data. In practise it means that if only you append new data at the end of levels it guarantees backward compatibility with all previous versions of your templates (probably in both ways).
Arrays in templates always contains only one element which defines template for all of the array's children. Samples:
- multidimensional (also single dimension) array of integers
[ rayson.type.int32 ]
- indie bands each with variable number of albums:
[ { name: rayson.type.str, albums: [ { year: rayson.type.str, name: rayson.type.str } ] } ]
Whenever [ type ]
in template coresponds to internal node in serialized data, the unserializer will treat full subtree as (multi-) nested array of type
elements. When in thrid sample above we replace template with [ rayson.type.str ]
the sample result might be:
[
'Foster the People,
[
[ '2011', 'Torches'],
[ '2014', 'Supermodel']
]
]
To use specific type prepend it with rayson.type
namespace.
bool
- boolean true/false, or any of truthy/falsy values
int32
- 32bit un/signed integer
int8
- 8bit un/signed integer
raw
- array of bytes
str
- standard javascript unicode string
You can add your own data type by implementing simple interface. Basicly type object should implement encode and decode methods. Frist of them takes data for encoding and returns array of bytes, second one should be able to reverse this operation. Take a look at lib/type directory for some working examples.
Rayson's tests can be run in browser with jasmine-html-runner, or in console with jasmine-node script. To run all integration & unit tests in console:
$ jasmine-node ./tests
To run tests in browser first build a realese:
$ grunt release:my-release
Then open tests/runner/runner.html#my-release
, to run all tests.
Chunk | Flag | Uleb128 | Uleb128 octet | Uleb128 octet | value ( when isn't internal ) | ||
---|---|---|---|---|---|---|---|
Type | is internal node | has more octets | size | has more octets | size | (...) | encoded value |
Size | 1bit | 1bit | 6bit | 1bit | 7bit | 8bit | concatenation of sizes from octets |
Tree nodes are stored in topological order, which means that each parent comes before all of its children.
- support for 64bit floats