ASDF is a cache-oriented, string-based JSON representation. It is also a convenient JSON library for D that gets out of your way. ASDF is specially geared towards transforming high volumes of JSON dataframes, either to new JSON objects or to custom data types.
❗️: Currently all ASDF method names and all UDAs are in DRAFT state; we might want to make them simpler. Please submit an issue if you have input.
- ASDF is fast. It can be really helpful if you have gigabytes of JSON line separated values.
- ASDF is simple. It uses D's modelling power to make you write less boilerplate code.
- ASDF is tested and used in production for real-world JSON generated by millions of web clients (we call it the great fuzzer).

See also github.com/tamediadigital/je, a tool for fast extraction of JSON properties into CSV/TSV.
- define your struct
- call `serializeToJson` (or `serializeToJsonPretty` for pretty printing!)
- profit!
```d
import asdf;

struct Simple
{
    string name;
    ulong level;
}

void main()
{
    auto o = Simple("asdf", 42);
    string data = `{"name":"asdf","level":42}`;
    assert(o.serializeToJson() == data);
    assert(data.deserialize!Simple == o);
}
```
See ASDF API and Specification.
- Reading JSON line separated values and parsing them to ASDF - 300+ MB per second (SSD).
- Writing ASDF range to JSON line separated values - 300+ MB per second (SSD).
Dub is D's package manager. You can create a new project with:

```
dub init <project-name>
```

Now edit `dub.json`: add `asdf` as a dependency and set `targetType` to `executable`.
```json
{
    ...
    "dependencies": {
        "asdf": "~><current-version>"
    },
    "targetType": "executable",
    "dflags-ldc": ["-mcpu=native"]
}
```
Now you can create a main file in the `source` directory and run your code with

```
dub
```

The flags `--build=release` and `--compiler=ldmd2` can be added for a performance boost:

```
dub --build=release --compiler=ldmd2
```

`ldmd2` is a shell on top of LDC (LLVM D Compiler). `"dflags-ldc": ["-mcpu=native"]` allows LDC to optimize ASDF for your CPU.

Instead of using `-mcpu=native`, you may specify an additional instruction set for a target with `-mattr`, for example `-mattr=+sse4.2`. ASDF has specialized code for the [SSE4.2](https://en.wikipedia.org/wiki/SSE4#SSE4.2) instruction set.
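As a sketch of the `-mattr` alternative, the `dub.json` fragment might look like the following. It only swaps the flag named above into the same `dflags-ldc` entry; the version placeholder is kept as in the original and must be filled in by you.

```json
{
    "dependencies": {
        "asdf": "~><current-version>"
    },
    "targetType": "executable",
    "dflags-ldc": ["-mattr=+sse4.2"]
}
```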
- LDC (LLVM D Compiler) >= `1.1.0-beta2` (recommended compiler).
- DMD (reference D compiler) >= `2.072.1`.
| UDA | Function |
|---|---|
| `@serializationKeys("bar_common", "bar")` | tries to read the data from either property; saves it to the first one |
| `@serializationKeysIn("a", "b")` | tries to read the data from `a`, then `b`; the last one occurring in the JSON wins |
| `@serializationKeyOut("a")` | writes it to `a` |
| `@serializationMultiKeysIn(["a", "b", "c"])` | tries to get the data from a sub-object; performance is not yet optimal if you use more than one `serializationMultiKeysIn` in an object |
| `@serializationIgnore` | ignore this property completely |
| `@serializationIgnoreIn` | don't read this property |
| `@serializationIgnoreOut` | don't write this property |
| `@serializationScoped` | Dangerous! Non-allocating strings: data can vanish if the underlying buffer is removed |
| `@serializedAs!string` | call `to!string` |
| `@serializationTransformIn!fin` | call function `fin` to transform the data |
| `@serializationTransformOut!fout` | run function `fout` on serialization, different notation |
| `@serializationFlexible` | be flexible about the data type on reading, e.g. read `long`s that are wrapped as strings |
| `@serializationRequired` | force the deserializer to throw `AsdfException` if the field was not found in the input |
Please also look into the docs or unittests for concrete examples!
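As one illustration of the table, the sketch below exercises `@serializationRequired`. The `Account` struct and its fields are hypothetical names chosen for the example. The table states that an `AsdfException` is thrown; the check here only assumes that some `Exception` is thrown, so it does not depend on the exact exception class.

```d
import std.exception : assertThrown;
import asdf;

// Hypothetical type for illustration; only `id` must be present in the input.
struct Account
{
    @serializationRequired string id;
    int visits;
}

void main()
{
    // Both fields present: deserialization succeeds.
    assert(`{"id":"x","visits":2}`.deserialize!Account == Account("x", 2));
    // Required field missing: deserialization is expected to throw
    // (an AsdfException according to the table above).
    assertThrown(`{"visits":2}`.deserialize!Account);
}
```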
```d
import std.algorithm;
import std.stdio;
import asdf;

void main()
{
    auto target = Asdf("red");
    File("input.jsonl")
        // Use at least 4096 bytes for real-world apps
        .byChunk(4096)
        // 32 is the minimal value for the internal buffer. The buffer can be reallocated to get more memory.
        .parseJsonByLine(4096)
        .filter!(object => object
            // opIndex accepts an array of keys: {"key0": {"key1": { ... {"keyN-1": <value>}... }}}
            ["colors"]
            // iterates over an array
            .byElement
            // Comparison with an ASDF value is a little bit faster
            // than comparison with a string.
            .canFind(target))
            //.canFind("red"))
        // Formatting uses an internal buffer to reduce system delegate and system function calls
        .each!writeln;
}
```
Input, a single object per line (the 4th and 5th lines are broken):

```json
null
{"colors": ["red"]}
{"a":"b", "colors": [4, "red", "string"]}
{"colors":["red"],
"comment" : "this is broken (multiline) object"}
{"colors": "green"}
{"colors": "red"]}}
[]
```

Output:

```json
{"colors":["red"]}
{"a":"b","colors":[4,"red","string"]}
```
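The same `["colors"]` lookup and `byElement` iteration can be tried on a single JSON string instead of a file. This is a minimal sketch, assuming `parseJson` accepts a string directly; `hasRed` is a hypothetical helper name, not part of ASDF.

```d
import std.algorithm : canFind;
import asdf;

// Hypothetical helper: true if the "colors" array of a JSON object contains "red".
bool hasRed(string line)
{
    auto object = line.parseJson;
    return object["colors"].byElement.canFind(Asdf("red"));
}

void main()
{
    assert(hasRed(`{"a":"b", "colors": [4, "red", "string"]}`));
    assert(!hasRed(`{"colors": ["green"]}`));
}
```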
```d
struct S
{
    string a;
    long b;
    private int c; // private fields are ignored
    package int d; // package fields are ignored
    // all other fields in JSON are ignored
}
```
```d
struct S
{
    // ignored
    @serializationIgnore int temp;
    // can be formatted to json
    @serializationIgnoreIn int a;
    // can be parsed from json
    @serializationIgnoreOut int b;
}
```
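A short round trip makes the three ignore attributes concrete. This is a sketch under the semantics described above; the struct name `Counter` and the sample values are hypothetical.

```d
import asdf;

// Hypothetical type mirroring the struct above.
struct Counter
{
    // never read from or written to JSON
    @serializationIgnore int temp;
    // written to JSON, but not read from it
    @serializationIgnoreIn int a;
    // read from JSON, but not written to it
    @serializationIgnoreOut int b;
}

void main()
{
    auto c = `{"temp":1,"a":2,"b":3}`.deserialize!Counter;
    assert(c.temp == 0); // ignored on input
    assert(c.a == 0);    // ignored on input
    assert(c.b == 3);    // parsed from JSON

    // Only `a` is written on output.
    assert(Counter(7, 8, 9).serializeToJson == `{"a":8}`);
}
```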
```d
struct S
{
    // key is overridden to "aaa"
    @serializationKeys("aaa") int a;
    // overloads multiple keys for parsing
    @serializationKeysIn("b", "_b")
    // overloads key for generation
    @serializationKeyOut("_b_")
    int b;
}
```
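The effect of the key attributes can be checked with a round trip. A minimal sketch, assuming the behaviour described in the UDA table; the struct name `Renamed` is hypothetical.

```d
import asdf;

// Hypothetical type mirroring the struct above.
struct Renamed
{
    // read from and written to "aaa"
    @serializationKeys("aaa") int a;
    // read from "b" or "_b", written to "_b_"
    @serializationKeysIn("b", "_b")
    @serializationKeyOut("_b_")
    int b;
}

void main()
{
    // Either input key fills field `b`.
    assert(`{"aaa":1,"_b":2}`.deserialize!Renamed == Renamed(1, 2));
    assert(`{"aaa":1,"b":2}`.deserialize!Renamed == Renamed(1, 2));
    // Output uses the overridden keys.
    assert(Renamed(1, 2).serializeToJson == `{"aaa":1,"_b_":2}`);
}
```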
```d
struct DateTimeProxy
{
    DateTime datetime;
    alias datetime this;

    static DateTimeProxy deserialize(Asdf data)
    {
        string val;
        deserializeScopedString(data, val);
        return DateTimeProxy(DateTime.fromISOString(val));
    }

    void serialize(S)(ref S serializer)
    {
        serializer.putValue(datetime.toISOString);
    }
}
```
```d
// serialize a doubly linked list into an array
struct SomeDoublyLinkedList
{
    @serializationIgnore DList!(SomeArr[]) myDll;
    alias myDll this;

    // no template but a function this time!
    void serialize(ref AsdfSerializer serializer)
    {
        auto state = serializer.arrayBegin();
        foreach (ref elem; myDll)
        {
            serializer.elemBegin;
            serializer.serializeValue(elem);
        }
        serializer.arrayEnd(state);
    }
}
```
```d
struct S
{
    @serializedAs!DateTimeProxy DateTime time;
}
```
```d
@serializedAs!ProxyE
enum E
{
    none,
    bar,
}

// const(char)[] doesn't reallocate ASDF data.
@serializedAs!(const(char)[])
struct ProxyE
{
    E e;

    this(E e)
    {
        this.e = e;
    }

    this(in char[] str)
    {
        switch(str)
        {
            case "NONE":
            case "NA":
            case "N/A":
                e = E.none;
                break;
            case "BAR":
            case "BR":
                e = E.bar;
                break;
            default:
                throw new Exception("Unknown: " ~ cast(string)str);
        }
    }

    string toString()
    {
        if (e == E.none)
            return "NONE";
        else
            return "BAR";
    }

    E opCast(T : E)()
    {
        return e;
    }
}

unittest
{
    assert(serializeToJson(E.bar) == `"BAR"`);
    assert(`"N/A"`.deserialize!E == E.none);
    assert(`"NA"`.deserialize!E == E.none);
}
```
If you need to do additional calculations or ETL transformations that depend on the deserialized data, use the `finalizeDeserialization` method.
```d
struct S
{
    string a;
    int b;

    @serializationIgnoreIn double sum;

    void finalizeDeserialization(Asdf data)
    {
        auto r = data["c", "d"];
        auto a = r["e"].get(0.0);
        auto b = r["g"].get(0.0);
        sum = a + b;
    }
}

assert(`{"a":"bar","b":3,"c":{"d":{"e":6,"g":7}}}`.deserialize!S == S("bar", 3, 13));
```
```d
static struct S
{
    @serializationFlexible uint a;
}

assert(`{"a":"100"}`.deserialize!S.a == 100);
assert(`{"a":true}`.deserialize!S.a == 1);
assert(`{"a":null}`.deserialize!S.a == 0);
```