Siphon is a library of highly optimized parsers for common protocol and data formats. The design goal is to give the caller as much control as possible while maintaining a minimal interface. It should also provide a flexible interface for bindings from other languages; most of this code was developed while binding it to LuaJIT's FFI.
Most of the parsers use a pull model: that is, the caller requests the next value from an input buffer. This allows the caller to manage the continuation of the parser as well as the memory used for the tokens.
The library aims to make few or no memory allocations. Instead, the caller maintains the input buffer until a value can be extracted from it. The parser does not extract the value itself; it reports the bounds of the value within the buffer.
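To illustrate the pull contract, here is a toy tokenizer for LF- or CRLF-terminated lines. It is hypothetical and not part of siphon's API (`next_line` and its parameters are invented for this sketch), but it follows the same return convention the HTTP example relies on for `sp_http_next`: a positive count of consumed bytes, 0 when more input is needed, and a negative value on error. The token is reported as a length relative to the start of the caller's buffer, so nothing is copied or allocated.

```c
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical pull-style tokenizer for LF- or CRLF-terminated lines.
 * On success, *tok_len receives the line length (terminator excluded);
 * the line itself is buf[0 .. *tok_len) and is never copied.
 * Returns: >0 bytes consumed (line plus terminator),
 *           0 if buf does not yet hold a complete line,
 *          <0 if the line is longer than max_len. */
ssize_t
next_line (const char *buf, size_t len, size_t max_len, size_t *tok_len)
{
    const char *lf = memchr (buf, '\n', len);
    if (lf == NULL) {
        /* allow one extra byte for a '\r' still waiting for its '\n' */
        return len > max_len + 1 ? -1 : 0;
    }
    size_t nl = (size_t)(lf - buf);
    size_t n = (nl > 0 && buf[nl - 1] == '\r') ? nl - 1 : nl;
    if (n > max_len) return -1;
    *tok_len = n;
    return (ssize_t)(nl + 1);
}
```

The caller drives it exactly like the HTTP loop: on 0, read more input and retry; on a positive result, use `buf + off` with `*tok_len` as the token, then advance `off` by the return value.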
Similarly, for plain byte sequences (HTTP bodies, MsgPack strings, binary and
extension values), siphon requires the caller to handle the value. This takes
a little more work, but it enables much more powerful usage. In particular, it
allows mixing zero-copy primitives with the parser. For example, a process
could use splice(2) to transfer a MsgPack string from a back-end parser to an
HTTP response body, or splice an incoming HTTP request body directly to disk.
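A minimal, Linux-only sketch of the second use case, assuming blocking file descriptors. Because a pull parser has usually already read the start of the body into its buffer, the hypothetical `copy_body` function first writes those buffered bytes with write(2) and then moves the remainder with splice(2), so it never passes through user space. splice(2) requires one side to be a pipe; here `in_fd` is assumed to already be a pipe (a socket would have to be routed through an intermediate pipe first).

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Writes len body bytes to out_fd: first whatever the parser already pulled
 * into the user-space buffer (up to nbuf bytes), then the remainder straight
 * from in_fd via splice(2). Assumes blocking fds and that in_fd is a pipe.
 * Returns 0 on success, -1 on error or premature end of input. */
int
copy_body (const char *buffered, size_t nbuf, int in_fd, int out_fd, size_t len)
{
    size_t n = nbuf < len ? nbuf : len;
    while (n > 0) {
        ssize_t w = write (out_fd, buffered, n);
        if (w < 0) {
            if (errno == EINTR) continue;
            return -1;
        }
        buffered += w;
        n -= (size_t)w;
        len -= (size_t)w;
    }
    while (len > 0) {
        ssize_t s = splice (in_fd, NULL, out_fd, NULL, len, SPLICE_F_MOVE);
        if (s < 0) {
            if (errno == EINTR) continue;
            return -1;
        }
        if (s == 0) return -1; /* input closed before the body finished */
        len -= (size_t)s;
    }
    return 0;
}
```

After a successful call, the caller still has to advance its own buffer offset past the buffered bytes it handed over (the smaller of `nbuf` and the body length).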
For a quick HTTP example:
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <siphon/siphon.h>
static int fd = STDIN_FILENO;
static char buf[8192]; // buffer to read into
static size_t end = 0; // last byte position in the buffer
static size_t off = 0; // offset into the buffer
// reads more from fd into buffer
static void read_more (void);
// reads raw body bytes from the buffer and/or fd
static void read_raw (size_t len);
int
main (void)
{
    SpHttp p;
    sp_http_init_request (&p);
    while (!sp_http_is_done (&p)) {
        ssize_t rc = sp_http_next (&p, buf+off, end-off);
        // TODO: handle parser errors gracefully
        if (rc < 0) sp_exit (rc, EXIT_FAILURE);
        // could not parse a token so read more
        if (rc == 0) {
            read_more ();
            continue;
        }
        // TODO: do something with the token
        sp_http_print (&p, buf+off, stdout);
        // mark the used range of the buffer
        off += rc;
        // handle body values
        if (p.type == SP_HTTP_BODY_START) {
            if (!p.as.body_start.chunked) {
                read_raw (p.as.body_start.content_length);
            }
        }
        else if (p.type == SP_HTTP_BODY_CHUNK) {
            read_raw (p.as.body_chunk.length);
        }
    }
    return 0;
}
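static void
read_more (void)
{
    if (off == end) {
        // nothing to move so reset
        end = off = 0;
    }
    else if (end > sizeof (buf) / 2) {
        // reclaim buffer space if more than half is used
        memmove (buf, buf+off, end-off);
        end -= off;
        off = 0;
    }
    // a single token cannot fit in the buffer
    if (end == sizeof (buf)) {
        fprintf (stderr, "token too large for buffer\n");
        exit (EXIT_FAILURE);
    }
    // read more at the end of the buffer
    ssize_t n = read (fd, buf+end, sizeof (buf) - end);
    if (n < 0) sp_exit (errno, EXIT_FAILURE);
    // input ended before the message was complete
    if (n == 0) {
        fprintf (stderr, "unexpected end of input\n");
        exit (EXIT_FAILURE);
    }
    // push out end position
    end += n;
}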
static void
read_raw (size_t len)
{
    while (len > 0) {
        // only process the amount in the buffer
        size_t amt = len;
        if (amt > end - off) amt = end - off;
        // read more if nothing is available
        if (amt == 0) {
            read_more ();
            continue;
        }
        // write out the raw bytes
        fwrite (buf+off, 1, amt, stdout);
        fflush (stdout);
        len -= amt; // update number of raw bytes remaining
        off += amt; // update buffer offset position
    }
}
Siphon currently uses cmake(1) for building and installing.
For a typical release build and install cycle, follow these steps:
cmake -H. -Bbuild/release -DCMAKE_BUILD_TYPE=Release
cmake --build build/release
cd build/release
ctest --output-on-failure
cd ../..
cmake --build build/release --target install
The default build tool for Unix systems is make(1). While the build is
reasonably quick, it is faster with a build tool like ninja. To use ninja,
change the configuration command to:
cmake -H. -Bbuild/release -DCMAKE_BUILD_TYPE=Release -G Ninja
This build assumes the current working directory is the root of the siphon
project. To work from a different directory, specify the path to the siphon
project root with the -H option.
For a different target directory, change the values of the -B and --build
options used in the various commands.
The default install prefix is /usr/local. To change this, define a value for
CMAKE_INSTALL_PREFIX in the configuration command. For example:
cmake -H. -Bbuild/release -DCMAKE_INSTALL_PREFIX=$HOME/opt -DCMAKE_BUILD_TYPE=Release
The debug build is preferable for development and more thorough testing.
These steps assume valgrind(1) is installed.
cmake -H. -Bbuild/debug -DCMAKE_BUILD_TYPE=Debug -DBUILD_TESTING=ON
cmake --build build/debug
cd build/debug
ctest --output-on-failure -D ExperimentalMemCheck
In addition to the test suite, there is a set of test tools that simply take
input on stdin and report success or failure. The fuzz tests use these tools
to try to produce input that causes a tool to either crash or hang.
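The tools themselves are not shown here. As a rough sketch of their general shape, assuming a hypothetical `check_fn` callback that wraps one of the parsers, a stdin-driven tool reads its entire input, runs the parser over it, and turns the result into an exit status. `run_tool` and `check_fn` are illustrative names, not siphon's actual test code.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parser wrapper: returns 0 if the whole input is valid. */
typedef int (*check_fn) (const char *buf, size_t len);

/* Reads all of `in` into memory, runs `check` over it, and maps the result
 * to EXIT_SUCCESS or EXIT_FAILURE. */
int
run_tool (FILE *in, check_fn check)
{
    size_t cap = 4096, len = 0;
    char *buf = malloc (cap);
    if (buf == NULL) return EXIT_FAILURE;
    for (;;) {
        if (len == cap) {
            /* grow geometrically until the whole input fits */
            char *tmp = realloc (buf, cap * 2);
            if (tmp == NULL) {
                free (buf);
                return EXIT_FAILURE;
            }
            buf = tmp;
            cap *= 2;
        }
        size_t n = fread (buf + len, 1, cap - len, in);
        len += n;
        if (n == 0) break; /* end of input or read error */
    }
    int ok = !ferror (in) && check (buf, len) == 0;
    free (buf);
    return ok ? EXIT_SUCCESS : EXIT_FAILURE;
}
```

A tool built on this would be little more than `int main (void) { return run_tool (stdin, check_http); }`, where `check_http` is a hypothetical wrapper around the HTTP parser loop. afl-fuzz then only has to watch for crashes and hangs.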
Additionally, the memory used for these tests is allocated with the allocator defined in lib/alloc.c. This allocator is very wasteful of memory, but it protects against buffer over-reads and over-writes: if either occurs, the process crashes and the fuzz tester logs the input that caused the error condition.
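lib/alloc.c is not reproduced here, so the following is only a minimal sketch of the general guard-page technique (the approach popularized by Electric Fence), not siphon's actual allocator. It assumes a POSIX system with mmap(2) and mprotect(2), and `guard_alloc`/`guard_free` are hypothetical names. Each block gets its own mapping and is right-aligned against a `PROT_NONE` page, so touching even one byte past the end faults immediately. Spending at least two pages per allocation is also what makes this kind of allocator so wasteful.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Bookkeeping stored just before the returned pointer. */
typedef struct {
    void *base;   /* start of the whole mapping */
    size_t total; /* mapping length, including the guard page */
} GuardHeader;

/* Allocates size bytes so that the byte after the block is the first byte
 * of an inaccessible page. Returns NULL on failure. */
void *
guard_alloc (size_t size)
{
    size_t page = (size_t)sysconf (_SC_PAGESIZE);
    if (size > SIZE_MAX - sizeof (GuardHeader) - 2 * page) return NULL;
    /* round the data region (header + block) up to whole pages */
    size_t data = (size + sizeof (GuardHeader) + page - 1) / page * page;
    size_t total = data + page;
    unsigned char *base = mmap (NULL, total, PROT_READ | PROT_WRITE,
                                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED) return NULL;
    /* make the final page a guard: any access to it faults */
    if (mprotect (base + data, page, PROT_NONE) != 0) {
        munmap (base, total);
        return NULL;
    }
    /* right-align the block against the guard page */
    unsigned char *ptr = base + data - size;
    GuardHeader h = { base, total };
    memcpy (ptr - sizeof h, &h, sizeof h); /* memcpy: ptr may be unaligned */
    return ptr;
}

/* Releases a block returned by guard_alloc. */
void
guard_free (void *ptr)
{
    if (ptr == NULL) return;
    GuardHeader h;
    memcpy (&h, (unsigned char *)ptr - sizeof h, sizeof h);
    munmap (h.base, h.total);
}
```

The trade-off in this sketch is that returned pointers are not aligned for arbitrary types, which is acceptable for byte buffers like parser input; it also catches overruns past the end but not underruns before the start.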
Fuzzing is a very slow process. A typical run for each fuzz test is at least 24 hours, but for critical changes each test is run for approximately 4 days.
The fuzz tool used here is afl-fuzz. If it is installed to a custom location,
make sure that both afl-fuzz and afl-gcc are in your $PATH.
Now to build the test programs:
cmake -H. -Bbuild/fuzz -DCMAKE_BUILD_TYPE=RelWithDebInfo -DCMAKE_C_COMPILER=$(which afl-gcc)
# or cmake -H. -Bbuild/fuzz -DCMAKE_BUILD_TYPE=RelWithDebInfo -DCMAKE_C_COMPILER=$(which afl-clang)
cmake --build build/fuzz
Set the test type as an environment variable:
export FUZZ=http
Currently, this can be one of http, json, msgpack, or uri.
Create a directory to capture the results:
mkdir -p build/fuzz/$FUZZ
The afl-fuzz tool can be run in parallel. To manage multiple shells, and to
be able to leave the session and return to it later, it is highly recommended
to use tmux or screen.
First, start up the main test process:
afl-fuzz -i test/fuzz/$FUZZ -o build/fuzz/$FUZZ -M fuzz01 ./build/fuzz/test-$FUZZ-input
The -M fuzz01 option identifies this as the main process. For each secondary
process, use the -S option instead, with a unique number:
afl-fuzz -i test/fuzz/$FUZZ -o build/fuzz/$FUZZ -S fuzz02 ./build/fuzz/test-$FUZZ-input
I'll typically run 4-8 of these processes during the test run.