Encode / Decode local buffer before sending to VM
tqwewe opened this issue · 9 comments
The current JSON serializer implementation uses the serde_json::to_writer and serde_json::from_reader functions for encoding and decoding. This means each "token" in a JSON message is sent to the VM through an individual call to the lunatic::host::api::message::write_data function.
For example, the JSON {"name":"John Doe","age":22} would be sent as 15 separate calls to write_data:
{
"
name
"
:
"
John Doe
"
,
"
age
"
:
22
}
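The effect is easy to reproduce outside the VM by serializing through a writer that counts its write calls. A standalone sketch (plain Rust; the CountingWriter type here is made up for the demonstration):

use std::io::{self, Write};

use serde_json::json;

/// Counts how many times serde_json calls `write` on the underlying writer.
struct CountingWriter {
    calls: usize,
}

impl Write for CountingWriter {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.calls += 1;
        Ok(buf.len())
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

fn main() {
    let value = json!({ "name": "John Doe", "age": 22 });
    let mut writer = CountingWriter { calls: 0 };
    serde_json::to_writer(&mut writer, &value).unwrap();
    // serde_json does no buffering of its own, so this prints one call per
    // token, matching the breakdown above.
    println!("write called {} times", writer.calls);
}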
It would probably be much more efficient to use serde_json::to_vec, and then MessageRw {}.write(...).
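Concretely, the proposed encode would look something like this sketch (MessageRw and EncodeError as in the existing serializer; write_all rather than a bare write so the whole buffer is guaranteed to be written):

use std::io::Write;

fn encode<M: serde::Serialize>(message: &M) -> Result<(), EncodeError> {
    // Serialize into a single local buffer first...
    let data = serde_json::to_vec(message)?;
    // ...then hand the whole buffer to the VM in one call.
    MessageRw {}.write_all(&data).unwrap();
    Ok(())
}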
I doubt it, but without benchmarks it's hard to tell.
serde_json::to_vec will allocate a buffer of 128 bytes and then still do all 15 separate calls to write into that buffer, creating (possibly unnecessary) copies of the data. If the message is bigger than 128 bytes, it might regrow the vector, creating additional allocations and copies. And then it will again write everything to the host.
Just doing 15 host calls should have less overhead than this. And I assume that memory access from a host call is faster because it can eliminate bounds checks in some cases.
I see. In the serde_json docs it says:
When reading from a source against which short reads are not efficient, such as a File, you will want to apply your own buffering because serde_json will not buffer the input. See std::io::BufReader.
I had assumed host calls were a little expensive, but if this is not the case then it's probably fine as is.
Yeah, async host calls have some overhead because of how Wasmtime handles them, and they include a memory allocation. The rest should be close to a Wasm function call.
And reading from I/O is a bit specific. Some devices are slow to execute the read, and it can be much better to grab more data at once. Buffered readers will perform better in this case. I don't think this is a big issue for us, though; we are reading straight from a memory region in the host.
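A possible middle ground would be to keep the streaming serializer but batch the host calls through std::io::BufWriter. A minimal, untested sketch, assuming MessageRw and EncodeError from lunatic's serializer module:

use std::io::{BufWriter, Write};

fn encode<M: serde::Serialize>(message: &M) -> Result<(), EncodeError> {
    // BufWriter coalesces the per-token writes into its local buffer
    // (8 KiB by default), so the host sees only a handful of calls.
    let mut writer = BufWriter::new(MessageRw {});
    serde_json::to_writer(&mut writer, message).map_err(|err| err.into())?;
    // Push any bytes still sitting in the buffer to the host in one call.
    writer.flush().expect("flush failed");
    Ok(())
}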
I did a benchmark with both.
There are 3 benchmarks:
- Encode then decode a small amount of data (just an i32)
- Encode then decode a medium amount of data (login payload with email, password, remember_me)
- Encode then decode a large amount of data (user information with 7 fields)
Benchmark code
#[derive(Serialize, Deserialize)]
struct Age(i32);

#[derive(Serialize, Deserialize)]
struct Login {
    username: String,
    password: String,
    remember: bool,
}

#[derive(Serialize, Deserialize)]
struct User {
    username: String,
    password: String,
    remember: bool,
    address: String,
    login_count: i32,
    slug: String,
    profile: (String, i32),
}

fn serialize_benchmark(c: &mut Criterion) {
    c.bench_function("encode_small", |b| {
        b.iter(|| {
            // Spawn task and wait for it to finish.
            let _ = spawn_link!(@task || {
                Json::encode(&Age(10)).unwrap();
                let _: Age = Json::decode().unwrap();
            })
            .result();
        })
    });
    c.bench_function("encode_medium", |b| {
        b.iter(|| {
            // Spawn task and wait for it to finish.
            let _ = spawn_link!(@task || {
                Json::encode(&Login {
                    username: "johndoe@gmail.com".to_string(),
                    password: "JohnTheGod".to_string(),
                    remember: true,
                }).unwrap();
                let _: Login = Json::decode().unwrap();
            })
            .result();
        })
    });
    c.bench_function("encode_large", |b| {
        b.iter(|| {
            // Spawn task and wait for it to finish.
            let _ = spawn_link!(@task || {
                Json::encode(&User {
                    username: "johndoe@gmail.com".to_string(),
                    password: "JohnTheGod".to_string(),
                    remember: true,
                    address: "123 Baker Street, NZ".to_string(),
                    login_count: 132342,
                    slug: "/john-doe123".to_string(),
                    profile: ("Johnny".to_string(), 1221),
                }).unwrap();
                let _: User = Json::decode().unwrap();
            })
            .result();
        })
    });
}
Current approach, with serde_json::to_writer
encode_small time: [147.99 µs 148.88 µs 149.98 µs]
encode_medium time: [164.36 µs 165.20 µs 166.17 µs]
encode_large time: [180.40 µs 181.00 µs 181.62 µs]
Proposed approach, with serde_json::to_vec
encode_small time: [156.84 µs 157.98 µs 159.16 µs]
change: [+3.9552% +5.1817% +6.4773%] (p = 0.00 < 0.05)
Performance has regressed.
encode_medium time: [165.73 µs 167.04 µs 168.48 µs]
change: [-0.6475% +0.4607% +1.5071%] (p = 0.41 > 0.05)
No change in performance detected.
encode_large time: [165.69 µs 166.77 µs 168.00 µs]
change: [-7.9640% -5.9076% -2.3000%] (p = 0.00 < 0.05)
Performance has improved.
It seems like it could be beneficial for large amounts of data, but for simple messages it's faster with the current approach.
Thanks for benchmarking this!
Could you add a bit more extreme benchmark, like 100 fields? Just to see how much it impacts performance in this case.
And do it without spawning a task, which I assume is the majority of the waiting time. If the compiler optimises it out, maybe use a black box.
I think something like this is a better benchmark:
c.bench_function("encode_large", |b| {
b.iter(|| {
black_box(Json::encode(&User {
username: "johndoe@gmail.com".to_string(),
password: "JohnTheGod".to_string(),
remember: true,
address: "123 Baker Street, NZ".to_string(),
login_count: 132342,
slug: "/john-doe123".to_string(),
profile: ("Johnny".to_string(), 1221),
}).unwrap(;);
});
});
You probably also need to call send to a non-existent process, just to clear the buffer after serializing for the next message. Something like:
c.bench_function("encode_large", |b| {
b.iter(|| {
unsafe {
lunatic::host::api::message::create_data(0, 0);
}
black_box(
Json::encode(&User {
username: "johndoe@gmail.com".to_string(),
password: "JohnTheGod".to_string(),
remember: true,
address: "123 Baker Street, NZ".to_string(),
login_count: 132342,
slug: "/john-doe123".to_string(),
profile: ("Johnny".to_string(), 1221),
})
.unwrap(),
);
unsafe {
lunatic::host::api::message::send(1337);
}
});
});
EDIT: And create a new buffer before sending it.
Benchmark code
#![allow(clippy::let_unit_value)]
use criterion::{black_box, criterion_group, criterion_main, Criterion};
use lunatic::serializer::{Json, Serializer};
use serde::{Deserialize, Serialize};
#[derive(Serialize, Deserialize)]
struct Age(i32);
#[derive(Serialize, Deserialize)]
struct Login {
    username: String,
    password: String,
    remember: bool,
}
#[derive(Serialize, Deserialize)]
struct User {
    a: String,
    b: i32,
    c: bool,
    d: String,
    e: i32,
    f: bool,
    h: String,
    i: i32,
    j: bool,
    k: String,
    l: i32,
    m: bool,
    n: String,
    o: i32,
    p: bool,
    q: String,
    r: i32,
    s: bool,
    t: String,
    u: i32,
    v: bool,
    w: String,
    x: i32,
    y: bool,
    z: String,
    aa: i32,
    bb: bool,
    cc: String,
    dd: i32,
    ee: bool,
    ff: String,
    gg: i32,
    hh: bool,
    ii: String,
    jj: i32,
    kk: bool,
    ll: String,
    mm: i32,
    nn: bool,
    oo: String,
    pp: i32,
    qq: bool,
    rr: String,
    ss: i32,
    tt: bool,
    uu: String,
    vv: i32,
    ww: bool,
    xx: String,
    yy: i32,
    zz: bool,
    aaa: String,
    aab: i32,
    aac: bool,
    aad: String,
    aae: i32,
    aaf: bool,
    aah: String,
    aai: i32,
    aaj: bool,
    aak: String,
    aal: i32,
    aam: bool,
    aan: String,
    aao: i32,
    aap: bool,
    aaq: String,
    aar: i32,
    aas: bool,
    aat: String,
    aau: i32,
    aav: bool,
    aaw: String,
    aax: i32,
    aay: bool,
    aaz: String,
    aaaa: i32,
    aabb: bool,
    aacc: String,
    aadd: i32,
    aaee: bool,
    aaff: String,
    aagg: i32,
    aahh: bool,
    aaii: String,
    aajj: i32,
    aakk: bool,
    aall: String,
    aamm: i32,
    aann: bool,
    aaoo: String,
    aapp: i32,
    aaqq: bool,
    aarr: String,
    aass: i32,
    aatt: bool,
    aauu: String,
    aavv: i32,
    aaww: bool,
    aaxx: String,
    aayy: i32,
    aazz: bool,
}
fn serialize_benchmark(c: &mut Criterion) {
c.bench_function("encode_small", |b| {
b.iter(|| {
unsafe {
lunatic::host::api::message::create_data(0, 0);
}
black_box(Json::encode(&Age(10)).unwrap());
unsafe {
lunatic::host::api::message::send(1337);
}
});
});
c.bench_function("encode_medium", |b| {
b.iter(|| {
unsafe {
lunatic::host::api::message::create_data(0, 0);
}
black_box(
Json::encode(&Login {
username: "johndoe@gmail.com".to_string(),
password: "JohnTheGod".to_string(),
remember: true,
})
.unwrap(),
);
unsafe {
lunatic::host::api::message::send(1337);
}
});
});
c.bench_function("encode_large", |b| {
b.iter(|| {
unsafe {
lunatic::host::api::message::create_data(0, 0);
}
black_box(
Json::encode(&User {
a: "a".to_string(),
b: 123,
c: true,
d: "d".to_string(),
e: 123,
f: true,
h: "h".to_string(),
i: 123,
j: true,
k: "k".to_string(),
l: 123,
m: true,
n: "n".to_string(),
o: 123,
p: true,
q: "q".to_string(),
r: 123,
s: true,
t: "t".to_string(),
u: 123,
v: true,
w: "w".to_string(),
x: 123,
y: true,
z: "z".to_string(),
aa: 123,
bb: true,
cc: "cc".to_string(),
dd: 123,
ee: true,
ff: "ff".to_string(),
gg: 123,
hh: true,
ii: "ii".to_string(),
jj: 123,
kk: true,
ll: "ll".to_string(),
mm: 123,
nn: true,
oo: "oo".to_string(),
pp: 123,
qq: true,
rr: "rr".to_string(),
ss: 123,
tt: true,
uu: "uu".to_string(),
vv: 123,
ww: true,
xx: "xx".to_string(),
yy: 123,
zz: true,
aaa: "aaa".to_string(),
aab: 123,
aac: true,
aad: "aad".to_string(),
aae: 123,
aaf: true,
aah: "aah".to_string(),
aai: 123,
aaj: true,
aak: "aak".to_string(),
aal: 123,
aam: true,
aan: "aan".to_string(),
aao: 123,
aap: true,
aaq: "aaq".to_string(),
aar: 123,
aas: true,
aat: "aat".to_string(),
aau: 123,
aav: true,
aaw: "aaw".to_string(),
aax: 123,
aay: true,
aaz: "aaz".to_string(),
aaaa: 123,
aabb: true,
aacc: "aacc".to_string(),
aadd: 123,
aaee: true,
aaff: "aaff".to_string(),
aagg: 123,
aahh: true,
aaii: "aaii".to_string(),
aajj: 123,
aakk: true,
aall: "aall".to_string(),
aamm: 123,
aann: true,
aaoo: "aaoo".to_string(),
aapp: 123,
aaqq: true,
aarr: "aarr".to_string(),
aass: 123,
aatt: true,
aauu: "aauu".to_string(),
aavv: 123,
aaww: true,
aaxx: "aaxx".to_string(),
aayy: 123,
aazz: true,
})
.unwrap(),
);
unsafe {
lunatic::host::api::message::send(1337);
}
});
});
}
criterion_group!(benches, serialize_benchmark);
criterion_main!(benches);
Using writer
fn encode(message: &M) -> Result<(), EncodeError> {
    serde_json::to_writer(MessageRw {}, message).map_err(|err| err.into())
}
encode_small time: [192.36 ns 193.59 ns 194.81 ns]
encode_medium time: [2.9816 µs 2.9975 µs 3.0167 µs]
encode_large time: [76.951 µs 77.308 µs 77.749 µs]
To vec, then write_data
fn encode(message: &M) -> Result<(), EncodeError> {
    let data = serde_json::to_vec(message)?;
    MessageRw {}.write_all(&data).unwrap();
    Ok(())
}
encode_small time: [243.12 ns 244.26 ns 245.59 ns]
change: [+26.851% +27.802% +28.877%] (p = 0.00 < 0.05)
Performance has regressed.
encode_medium time: [637.66 ns 641.33 ns 644.90 ns]
change: [-78.638% -78.274% -77.796%] (p = 0.00 < 0.05)
Performance has improved.
encode_large time: [9.4231 µs 9.5321 µs 9.6328 µs]
change: [-87.954% -87.697% -87.432%] (p = 0.00 < 0.05)
Performance has improved.
The performance benefit of doing this is huge for messages with multiple fields. I assume this is going to be the majority of cases. Now I think we should definitely change to serde_json::to_vec here.
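The decode side should benefit in the same way. A sketch of the counterpart, assuming MessageRw also implements Read (its use with serde_json::from_reader suggests it does) and that DecodeError converts from serde_json::Error:

use std::io::Read;

fn decode<M: serde::de::DeserializeOwned>() -> Result<M, DecodeError> {
    // Read the whole message out of the host into one local buffer...
    let mut buf = Vec::new();
    MessageRw {}.read_to_end(&mut buf).unwrap();
    // ...then parse from the slice instead of issuing a host call per read.
    serde_json::from_slice(&buf).map_err(|err| err.into())
}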
Turns out Bincode benefits from this too.
Bincode writer
fn encode(message: &M) -> Result<(), EncodeError> {
    bincode::serialize_into(MessageRw {}, message).map_err(|err| err.into())
}
encode_small time: [162.38 ns 162.80 ns 163.22 ns]
change: [-27.831% -27.273% -26.686%] (p = 0.00 < 0.05)
Performance has improved.
encode_medium time: [803.79 ns 805.28 ns 806.88 ns]
change: [+106.27% +107.58% +109.02%] (p = 0.00 < 0.05)
Performance has regressed.
encode_large time: [15.967 µs 16.005 µs 16.047 µs]
change: [+351.43% +354.25% +356.99%] (p = 0.00 < 0.05)
Performance has regressed.
Bincode to vec
fn encode(message: &M) -> Result<(), EncodeError> {
    let data = bincode::serialize(message)?;
    MessageRw {}.write_all(&data).unwrap();
    Ok(())
}
encode_small time: [219.88 ns 220.85 ns 221.98 ns]
change: [+34.337% +35.382% +36.428%] (p = 0.00 < 0.05)
Performance has regressed.
encode_medium time: [387.20 ns 388.66 ns 390.39 ns]
change: [-52.132% -51.783% -51.398%] (p = 0.00 < 0.05)
Performance has improved.
encode_large time: [3.4997 µs 3.5069 µs 3.5149 µs]
change: [-78.360% -78.223% -78.097%] (p = 0.00 < 0.05)
Performance has improved.
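A further variant that might be worth benchmarking: bincode can compute the exact serialized size up front, so the local buffer never has to regrow. An untested sketch (it makes two passes over the value, so the sizing pass may or may not pay for itself):

use std::io::Write;

fn encode<M: serde::Serialize>(message: &M) -> Result<(), EncodeError> {
    // bincode walks the value once to report the exact output size.
    let size = bincode::serialized_size(message)? as usize;
    let mut data = Vec::with_capacity(size);
    bincode::serialize_into(&mut data, message)?;
    // Hand the complete buffer to the VM in a single write.
    MessageRw {}.write_all(&data).unwrap();
    Ok(())
}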