krakjoe/stat

Binary Communication

Opened this issue · 5 comments

I went with JSON at first to make it easy to interface with stat from any language, so you could implement your UI in Node, Java, C#, or whatever.

JSON has rather a lot of overhead, not only in memory (additional characters) but also in instructions: having to build the JSON adds considerable complexity to the routine that dumps a sample to the stream. There is also overhead on the decoding side, which clearly limits the ability of any interfacing software to process samples.

Everything is fine when the interval is quite large (or normal for other profilers). When we get into the interesting range though, it's a struggle to retrieve data as quickly as stat can generate it.

Maybe it would be a good idea either to switch to a binary form of communication, or to offer it as an option (control) ...

I think I'm leaning in favour of dropping JSON altogether ... I will provide two implementations for decoding the stream: one internal implementation as part of this extension, and one Composer package independent of the extension (which will obviously be less efficient). I will also document the binary form so that it should still be possible to interface from any language, just with a little more legwork.

Any thoughts or dissent?

Wouldn't it be better to use an existing binary format like MessagePack?

Better than using JSON, probably ... but the aim here is to reduce instructions to the absolute bare minimum, and the only way to do that is with a custom format: the format of zend_stat_sample_t as it is in memory, or as close to it as possible ...
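To make the "as it is in memory" idea concrete, here is a minimal sketch of emitting a sample with a single copy. The struct `demo_sample_t`, its fields, and `demo_sample_encode` are hypothetical names invented for illustration; they are not the real `zend_stat_sample_t` layout, and the sketch assumes a sample made only of fixed-width fields.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fixed-width sample; NOT the real zend_stat_sample_t layout. */
typedef struct demo_sample {
    uint64_t pid;
    double   elapsed;
    uint64_t memory_used;
    uint32_t line;
    uint32_t type;
} demo_sample_t;

/* Pin the layout at compile time so writer and reader agree on the size. */
_Static_assert(sizeof(demo_sample_t) == 32, "unexpected padding in demo_sample_t");

/*
 * Copy the in-memory bytes of the sample into `out`.
 * Returns the number of bytes written, or 0 if `out` is too small.
 */
size_t demo_sample_encode(const demo_sample_t *sample, unsigned char *out, size_t cap) {
    if (cap < sizeof(*sample)) {
        return 0;
    }
    memcpy(out, sample, sizeof(*sample));
    return sizeof(*sample);
}
```

Encoding is one bounds check and one `memcpy`, with no per-field formatting. The cost is that the bytes reflect the writer's byte order and struct padding, so a reader on a different architecture or ABI would need the documented layout rather than a blind copy.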

nebez commented

What about using proto3 of Protocol Buffers? There are already deserializers in a lot of major languages, and .proto files are simple to both read and write.

I agree with you - I would ditch JSON entirely and opt for performance. JSON has the nice benefit of being both easily human readable and machine parseable. Human readability of the communication doesn't need to be a goal in this project.

rask commented

I'm not sure how C structs work internally, and how they differ when stored on heap/stack. But would the most performant approach be to just dump the frame's bytes from memory into the socket output, optionally with control/header bytes to denote what values and types are available? Or is this unsafe/fragile/something else and not doable?

If the struct is dumped as raw memory data, languages that can leverage C ABI (e.g. Rust) could just replicate the struct on their side and read the bytes into memory I presume?

I'm mainly a userland PHP developer so I am not in a position to offer expert opinions on this, but would really like to try out Stat and help out somehow.
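As a rough sketch of the reading side described above, a consumer can replicate the struct and copy the bytes into it, using a small header to reject frames it does not understand. Everything here is an assumption for illustration: `demo_header_t`, `demo_frame_t`, `DEMO_MAGIC`, and `demo_frame_decode` are hypothetical and do not describe stat's actual protocol.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_MAGIC 0x53544154u /* arbitrary marker chosen for this sketch */

/* Hypothetical control header sent before each frame. */
typedef struct demo_header {
    uint32_t magic; /* identifies the stream */
    uint32_t size;  /* number of payload bytes that follow */
} demo_header_t;

/* Hypothetical frame, replicated field for field on the reader's side. */
typedef struct demo_frame {
    uint64_t pid;
    uint64_t memory_used;
    uint32_t line;
    uint32_t type;
} demo_frame_t;

/*
 * Decode one header plus frame from `in`.
 * Returns the number of bytes consumed, or 0 if the input is too short
 * or the header does not match what this reader expects.
 */
size_t demo_frame_decode(const unsigned char *in, size_t len, demo_frame_t *out) {
    demo_header_t header;

    if (len < sizeof(header)) {
        return 0;
    }
    /* memcpy instead of a pointer cast: `in` may not be suitably aligned. */
    memcpy(&header, in, sizeof(header));

    if (header.magic != DEMO_MAGIC || header.size != sizeof(*out)) {
        return 0;
    }
    if (len - sizeof(header) < header.size) {
        return 0;
    }
    memcpy(out, in + sizeof(header), sizeof(*out));
    return sizeof(header) + sizeof(*out);
}
```

The fragile parts are the ones the header cannot fix on its own: compiler-inserted padding, byte order, and any pointer-valued field. The size check catches a layout mismatch that changes the struct size, but not one that merely reorders fields.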

But would the most performant approach be to just dump the frame's bytes from memory into the socket output, optionally with control/header bytes to denote what values and types are available? Or is this unsafe/fragile/something else and not doable?

It's complicated a little by strings, but that's basically how it will work ...

I've just not had the time to actually do it yet, I'm working on the arena related stuff right now ...
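To illustrate why strings complicate a raw dump: a struct that holds a `char *` contains an address, which means nothing to another process. One common workaround, shown here purely as an assumption and not as the format stat will use, is to write the fixed fields followed by a length-prefixed copy of the string bytes. The names `demo_symbol_t` and `demo_symbol_encode` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* In-memory form: holds a pointer, so its raw bytes are useless on the wire. */
typedef struct demo_symbol {
    uint32_t    line;
    const char *file; /* NUL-terminated file name */
} demo_symbol_t;

/*
 * Wire form (hypothetical): line (4 bytes), length (4 bytes),
 * then `length` bytes of file name without the trailing NUL.
 * Returns the number of bytes written, or 0 if `out` is too small.
 */
size_t demo_symbol_encode(const demo_symbol_t *sym, unsigned char *out, size_t cap) {
    uint32_t length = (uint32_t) strlen(sym->file);
    size_t   need   = sizeof(sym->line) + sizeof(length) + length;

    if (cap < need) {
        return 0;
    }
    memcpy(out, &sym->line, sizeof(sym->line));
    memcpy(out + sizeof(sym->line), &length, sizeof(length));
    memcpy(out + sizeof(sym->line) + sizeof(length), sym->file, length);
    return need;
}
```

Frames become variable-length under this scheme, so a reader has to parse the length before it knows where the next frame starts. That is the extra legwork compared with a fixed-size struct.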