title	order
MemoryWell	100

MemoryWell

Speed up synchronization and motion of data between computing threads: a well-built lock-free circular buffer.

See these docs:

How So? Pray tell

By using a circular buffer for both "allocation" of memory and synchronizing access to it between threads.

The overview explains the rationale behind this.

Code Example

Here is a simplified look at the steps involved:

/*
	Thread writing data into the circular buffer
*/
void *writer_thread(void *args)
{
	struct well *buffer = args;

	/* Reserve up any number of available blocks, up to 42 */
	size_t pos;
	size_t block_count = well_reserve(&buffer->tx, &pos, 42);

	/* Write into blocks.
	NOTE we access blocks one at a time: our set of reserved blocks could
		start at the end of the buffer and loop through the beginning.
	*/
	for (size_t j=0; j < block_count; j++) {
		void *block = well_access(pos, j, buffer);
		memset(block, 0x1, well_blk_size(buffer));
	}

	/* Release reservation into the other side of the buffer.
	We reserved from 'tx' and so release into 'rx'.
	*/
	well_release_single(&buffer->rx, res);
}

The "reader" thread would be identical, except it accesses the opposite "side" of buffer: rx instead of tx:

/*
	Thread reading data from circular buffer
*/
void *reader_thread(void *args)
{
	struct well *buffer = args;

	/* Reserve up any number of available blocks, up to 42 */
	size_t pos;
	size_t block_count = well_reserve(&buffer->rx, &pos, 42);

	/* Read from blocks */
	for (size_t j=0; j < block_count; j++)
		consume_data(well_access(pos, j, buffer));

	/* Release data back to 'tx' side (aka: unused, ready to be written).
	NOTE the buffer is **symmetrical**: we could also have written new data
		back into it, which the 'tx' side would then read.
	*/
	well_release_single(&buffer->tx, res);
}

Any of the programs in the test directory serves a documentation purpose in addition to validating correctness.

TODO: link to man pages.

Building and Installing

TODO: preamble to nix vs. non-nix; clean up both

Nix section

nix-build

Not all the dependencies (e.g. nonlibc at the time of this writing) may be in nix-pkgs - add a -I flag to nix-build pointing to (one or more) dirs containing the source repos for these packages:

memorywell$ nix-build
error: file ‘nonlibc’ was not found in the Nix search path
memorywell$ ls ~
nonlibc
memorywell$ nix-build -I ~

NOTE however that if nonlibc were in nix-pkgs then that version would override the one on your local disk.

Non-NIX section

You'll need Python ≥ 3.5.

Build and test on your machine by running:

./bootstrap.py

This project uses the Meson Build System.

For in-depth instructions on how to incorporate a Meson project in your build ecosystem, see the nonlibc library.

This project depends on nonlibc; Meson should handle that for you automatically.

NOTE that nonlibc is included as a Git Subtree; if you change anything please commit the nonlibc changes separately so they can be merged into the nonlibc repo upstream.

Benchmarks

After running boostrap.py, run benchmarks with:

cd build-release
ninja benchmark

Benchmarking is done for a varying counts of TX -> RX threads, and all combinations of the below:

Sync techniques

To test validity of the underlying algorithm and give comparative metrics, there is a compile-time choice between the following synch techniques:

WELL_DO_XCH : entirely implemented using C11 atomics
WELL_DO_MTX : pthread mutex
WELL_DO_SPL : naive spinlock using test_set and clear operations

Fail methods

The test routine being used for benchmarking, well_test.c, allows a compile-time choice of actions when a reserve() or release() call is unsuccessful (no blocks available):

SPIN : loop until the call is successful
COUNT : atomically increment a global waits counter
YIELD : call sched_yield()
SLEEP : call nanosleep() (currently inactive: takes forever)
BOUNDED : spinlock a few iterations and sched_yield() if still failing

Support

Communication is always welcome, feel free to send a pull request or drop me a line at sirio.bm@gmail.com.

Acknowledgements

A big Thank You to:

Tony for code, benchmarks, discussions, and pairing on the _release_multi() bug.
Brook Z of Massive Alliance for very good advice.
Robert Anderson of RPA Consultants for code and time spent going over the memory model.

What's with the name tho

All the cliché titles like lock-free, atomic, supercritical and cherenkov were predictably taken, and adding to the roughly 400 search results for circular buffer would have been a bit of a buzz-kill.

Hence MemoryWell, your friendly neighborhood circular buffer:

a well from whence to draw memory
a well-implemented circular buffer

siriobalmelli/memorywell