/mbon

Marked Binary Object Notation

Primary LanguageRust

api-docs

Marked Binary Object Notation

mbon is a binary notation that is inspired by the NBT format.

It is formed of a sequence of strongly typed values. Each made up of two parts: a mark which defines the type and size of the data, followed by the data. Marks can be different in size and so a single byte prefix is used to differenciate between types.

This format is self-describing which means that it is able to know if the data is not formatted correctly or a different type was stored than what was expected. Another feature of the self-describing nature of the format is that you can skip values in the data without the need to parse the complete item, e.g. A 1GB value can be easily skipped by only reading the mark.

Usage

Dumping

You can dump binary data using the dumper::Dumper struct. You can write values directly or use serde's serialize to write more complex data.

use mbon::dumper::Dumper;

let a = 32;
let b = "Hello World";
let c = b'a';

let mut dumper = Dumper::new();
dumper.write_int(a).unwrap();
dumper.write(&b).unwrap();
dumper.write(&c).unwrap();

let output = dumper.writer();
assert_eq!(output, b"i\x00\x00\x00\x20s\x00\x00\x00\x0bHello Worldca");

Parsing

You can parse binary data using the parser::Parser struct. You can parse Value's directly, but it is recommended to use serde to parse data.

use mbon::parser::Parser;
use mbon::data::Value;

let data = b"i\x00\x00\x00\x20s\x00\x00\x00\x0bHello Worldca";

let mut parser = Parser::from(data);

let a = parser.next_value().unwrap();
let b: String = parser.next().unwrap();
let c: u8 = parser.next().unwrap();

if let Value::Int(a) = a {
    assert_eq!(a, 32);
} else {
    panic!("a should have been an int");
}

assert_eq!(b, "Hello World");
assert_eq!(c, b'a');

Embedded Objects

If you are wanting to embed a predefined object inside the format, you can impl object::ObjectDump/object::ObjectParse. Keep in mind that you will need to call write_obj()/parse_obj() to take advantage of it.

use mbon::parser::Parser;
use mbon::dumper::Dumper;
use mbon::error::Error;
use mbon::object::{ObjectDump, ObjectParse};

#[derive(Debug, PartialEq, Eq)]
struct Foo {
    a: i32,
    b: String,
    c: char,
}

impl ObjectDump for Foo {
    type Error = Error;

    fn dump_object(&self) -> Result<Vec<u8>, Self::Error> {
        let mut dumper = Dumper::new();

        dumper.write(&self.a)?;
        dumper.write(&self.b)?;
        dumper.write(&self.c)?;

        Ok(dumper.writer())
    }
}

impl ObjectParse for Foo {
    type Error = Error;

    fn parse_object(object: &[u8]) -> Result<Self, Self::Error> {
        let mut parser = Parser::new(object);

        let a = parser.next()?;
        let b = parser.next()?;
        let c = parser.next()?;

        Ok(Self { a, b, c })
    }
}

let foo = Foo { a: 32, b: "Hello World".to_owned(), c: '🫠' };
let mut dumper = Dumper::new();

dumper.write_obj(&foo).unwrap();

let buf = dumper.writer();
let mut parser = Parser::from(&buf);

let new_foo: Foo = parser.next_obj().unwrap();

assert_eq!(foo, new_foo);

Async Implementations

If you want to parse data asynchronously, you may want to use the provided wrappers: async_wrapper::AsyncDumper, async_wrapper::AsyncParser.

use futures::io::{AsyncWriteExt, Cursor};

use mbon::async_wrapper::{AsyncDumper, AsyncParser};

let writer = Cursor::new(vec![0u8; 5]);
let mut dumper = AsyncDumper::from(writer);

dumper.write(&15u32)?;
dumper.flush().await?;

let mut reader = dumper.writer();
reader.set_position(0);

let mut parser = AsyncParser::from(reader);

let val: u32 = parser.next().await?;

assert_eq!(val, 15);

Grammar

Below is a grammar for the binary format. Note that all numbers are stored in big-endian form.

Value  ::= long | int | short | char | float | double | null | bytes
         | str | object | enum | array | list | dict | map;
Mark   ::= Mlong | Mint | Mshort | Mchar | Mfloat | Mdouble | Mnull
         | Mbytes | Mstr | Mobject | Menum | Marray | Mlist | Mdict | Mmap;
Data   ::= Dlong | Dint | Dshort | Dchar | Dfloat | Ddouble | Dnull
         | Dbytes | Dstr | Dobject | Denum | Darray | Dlist | Ddict | Dmap;

u32 ::= <u8> <u8> <u8> <u8>;
i64 ::= <u8> <u8> <u8> <u8> <u8> <u8> <u8> <u8>;
i32 ::= <u8> <u8> <u8> <u8>;
i16 ::= <u8> <u8>;
i8  ::= <u8>;
f64 ::= <u8> <u8> <u8> <u8> <u8> <u8> <u8> <u8>;
f32 ::= <u8> <u8> <u8> <u8>;


Mlong   ::= 'l';
Mint    ::= 'i';
Mshort  ::= 'h';
Mchar   ::= 'c';
Mfloat  ::= 'f';
Mdouble ::= 'd';
Mnull   ::= 'n';
Mbytes  ::= 'b' u32;
Mstr    ::= 's' u32;
Mobject ::= 'o' u32;
Menum   ::= 'e' Mark#enum;
Marray  ::= 'a' Mark#item u32;
Mlist   ::= 'A' u32;
Mdict   ::= 'm' Mark#key Mark#value u32;
Mmap    ::= 'M' u32;

Dlong   ::= i64;
Dint    ::= i32;
Dshort  ::= i16;
Dchar   ::= i8;
Dfloat  ::= f32;
Ddouble ::= f64;
Dnull   ::= ;
Dbytes  ::= u8 Dbytes |;
Dstr    ::= u8 Dbytes |;
Dobject ::= u8 Dbytes |;
Denum   ::= u32 Data#enum;
Darray  ::= Data#item Darray |;
Dlist   ::= Value Dlist |;
Ddict   ::= Data#key Data#value Ddict |;
Dmap    ::= Value Value Dmap |;

long   ::= Mlong Dlong;
int    ::= Mint Dint;
short  ::= Mshort Dshort;
char   ::= Mchar Dchar;
float  ::= Mfloat Dflaot;
double ::= Mdouble Ddouble;
null   ::= Mnull;
bytes  ::= Mbytes Dbytes;
str    ::= Mstr Dstr;
object ::= Mobject Dbytes;
enum   ::= Menum Denum;
array  ::= Marray Darray;
list   ::= Mlist Dlist;
dict   ::= Mdict Ddict;
map    ::= Mmap Dmap;

X#name means that it is related to any other Y#name, e.g. Mark#item in Marray relates to Data#item.

Specification

Name Description
Long 64 bit integer
Int 32 bit integer
Short 16 bit integer
Char 8 bit integer
Float 32 bit IEEE-754 float
Double 64 bit IEEE-754 float
Null Only the mark
Bytes Unencoded string of bytes
Str UTF-8 encoded string
Object Embeded preformatted data
Enum u32 Variant, embed data
Array len list of item data
List list of values
Dict len list of key-value pairs
Map list of key-value pairs

Numbers

Every number is defined only by their mark. There is no additional data stored in a number's mark.

All numbers are stored in a big endian binary form. Integers are internally considered signed, however, there is no requirement that they need to be signed, so it is possible to read an unsigned integer as a signed integer.

Strings

Strings will store their type marker followed by a u32 for their length e.g. b"s\x00\x00\x00\x05" would indicate a string that is 5 bytes long. Strings must be UTF-8 encoded; If you do not want this behaviour, you can use Bytes which behave in the same way as Str, but without the UTF-8 requirement.

Object

If you wanted to embed binary data, you can use an object value. This is similar to the bytes value, but it uses an unsigned int for the length. It is meant for storing binary data with a predetermined format.

Enum

An enum is meant to be compatible with Rust's enum serialization. It is defined by a variant id followed by an embedded Value. To make the enum self-describing, the mark for the embedded value is placed within the enum's mark.

Null

Null is only uses its mark, there is no data associated with it.

Array

Sequences can be stored in two forms; The Array being more strict than a list. If a sequence cannot be stored as an array, it will be stored as a list.

An array is a sequence of items that all share the same mark. This means that all elements must be the same size: A vector of u32's will always be stored as an array, while a vector of strings can only be stored as an array if each string is the same length.

The Array Mark contains the mark of the contained item followed by then the number of items in a u16.

List

The list is the more lenient way to store sequences. It simply holds a sequence of all the items. The mark of a list simply holds the number of bytes in the list as a u32.

Dict

A dict is similar to an array, but it stores key-value pairs instead. All keys must share the same mark and all values must share the same mark.

The Dict mark contains the key mark, followed by the value mark, and finally the number of items in the dict.

Map

The map, similar to the list stores any value types, but in a key-value format. This is the fallback format if a value cannot be stored as a dict.