/nsync

nsync is a C library that exports various synchronization primitives, such as mutexes

Primary LanguageCApache License 2.0Apache-2.0

nsync is a C library that exports various synchronization primitives:
	locks
	condition variables
	run-once initialization
	waitable counter (useful for barriers)
	waitable bit (useful for cancellation, or other conditions)

It is not an offical Google product.

nsync may be desirable in place of pthread primitives in some cases:
- nsync locks are reader-writer locks (but are as efficient as mutexes).
- nsync locks and condition variables occupy only two words each.
- nsync works on Unix-like systems and Windows.  It should be portable to
  other platforms straightforwardly.
- nsync provides conditional critical sections.  These fill the same role
  as condition variables, but are usually easier to use, and in most common
  cases are comparable in speed.  They can be easier to use in two ways:
  - it's not necessary to surround the "wait" operation in a while loop;
    instead the condition is passed to the call as a function and arbitrary
    pointer argument.
  - it's not necessary to wake or signal explicitly when the condition(s)
    become true; they are checked automatically.
  The primary downsides are:
  - they are not available in most other common synchronization APIs, and so
    they may be unfamiliar (even though they date back to the 1960s), and
  - if threads routinely wait on many distinct, false conditions
    associated with the same lock, they may be slower than condition variables.
    In this case, clients can use condition variables in the normal way;
    conditional critical sections and condition variables can be used with
    the same lock.
- nsync waits can be cancelled via an object passed to the wait calls, unlike
  the pthread model in which threads are cancelled.  This difference can be
  useful if the computation needs multiple threads, or if cancellation affects
  only sub-operations within a larger operation by the thread.
See the section "Extensions to locks and condition variables" below.

Portability
===========
The library is intended to be portable, and to be compilable on a system with
only a C90 compiler, assuming atomic operations are available from the
compiler, operating system, or assembler routines.  It is able to use C11 or
C++11 atomic operations if they are available.
It can be compiled with a C++ compiler, and in its own C++ name space, if
desired, though no attempt has been made to present a class-based interface.

Building
========
The builds/ directory may already contain a subdirectory that
matches your platform.   For example, if you're on an x86_64,
running Linux, using gcc, you might pick "x86_64.linux.gcc".

If there is an appropriate subdirectory, in that subdirectory type:
	make depend test
which will calculate dependencies, build the library and its tests, and then
run them.  (On Windows, using Visual Studio ("x86_64.win32.msvc")
use "nmake" instead of "make".)

If there is no suitable subdirectory, on most Unix-like systems you can create
one with
	tools/mkmakefile.sh

The main reason it might fail is if it cannot find a suitable implementation of
atomic operations on the platform.  Atomic operations may be provided by
- compiler-dependent interfaces (currently, gcc and clang)
  These are auto detected by mkmakefile.sh.
- language-specific standards (currently, C11 and C++11)
  Selected in mkmakefile.sh via "-atomic c11" or  "-atomic c++11".
- operating system-dependent libraries (e.g., NetBSD, MacOS, Windows)
  Selected in mkmakefile.sh via "-atomic os".
- architecture-dependent libraries (e.g., x86_64, x86_32, aarch64,
  arm, mips, alpha)
  Selected in mkmakefile.sh via "-atomic asm"; file should be
  named platforms/<architecture>/src/nsync_atm_<architecture>.[csS]
  to be found by mkmakefile.sh.
If none of these match your platform, you may need to provide
an assembly language implementation.

Other possible issues:
- Some platforms put clock_gettime() in the realtime library.
  Give "-lrt" to mkmakefile.sh.
- The version identifier of "clang" can vary by installation,
  and so it may not be identified if invoked as "cc".
  Give "-cc clang" to mkmakefile.sh, if clang is not detected automatically.
- Some CPU architectures have many variants, making it
  difficult to rely on a single identifier.
  Give "-arch <architecture>" to mkmakefile.sh to specify
  a particular string.

mkmakefile.sh recognises a couple of special cases:
- MacOS doesn't provide clock_gettime(); a compatibility routine
  is found in platform/posix/src/clock_gettime.c
  See builds/x86_64.macos.clang/Makefile
- OpenBSD and Irix do not provide thread-local storage, which is accommodated
  by adding -I../../platform/gcc_no_tls to the include path.
  See, for example,  builds/x86_64.openbsd.gcc/Makefile.

Further customization is possible by editing the Makefile, directly.
For Unix-like systems is typically only a few lines long.
For example, compare
	builds/x86_64.linux.g++/Makefile
with
	builds/x86_64.linux.gcc/Makefile
to see how to compile the entire library in C++, rather than C.

CMake
-----

CMake can also be used to build:

    $ mkdir out
    $ cd out/
    $ cmake ..
    $ make
    $ make install

The C library will be called libnsync and C++ is libnsync_cpp.

Tests can be disabled with the CMake option: -DNSYNC_ENABLE_TESTS=0.
To build shared libraries instead of static use: -DBUILD_SHARED_LIBS=ON.

CMake version >= 3.0 is strongly recommended.

Code structure
==============
public/		Public header files for library.
builds/*/	Platform-dependent build directories, each with Makefile.
internal/	Platform-independent library source code, and Makefile fragment.
platform/*/	Platform-dependent source code.
testing/	Platform-independent testing source code is in "testing".
tools/		Optional tools that can be used to create Makefile
		dependencies and run tests.

Where possible, the code avoids conditional compilation (#if, etc.),
to avoid becoming a mess of C-preprocessor directives.
The platform-dependent Makefiles set the appropriate include
paths and specify platform-dependent modules where needed.

The build directories of the various platforms are kept separate to allow
multiple platforms to be accommodated in one shared file system.

Differences from pthread locks and condition variables
======================================================

Conditional critical sections
-----------------------------
Consider the following use of a condition variable:
	/* variable declarations */
	nsync_mu mu = NSYNC_MU_INIT;  /* protects i */
	int i = 1;
	nsync_cv cv = NSYNC_CV_INIT;  /* signalled when i reaches 0 */

	...

	/* Waiter */
	nsync_mu_lock (&mu);
	while (i != 0) {
		nsync_cv_wait (&cv, &mu);
	}
	/* i is zero ... */
	nsync_mu_unlock (&mu);

	...

	/* Decrementer */
	nsync_mu_lock (&mu)
	i--;
	if (i == 0) {
		nsync_cv_broadcast (&cv);
	}
	nsync_mu_unlock (&mu);

With conditional critical sections, the equivalent is:
	/* variable declarations */
	nsync_mu mu = NSYNC_MU_INIT;  /* protects i */
	int i = 1;

	/* Condition */
	int int_is_zero (void *v) {
		return (*(int *)v == 0);
	}

	...

	/* Waiter */
	nsync_mu_lock (&mu);
	nsync_mu_wait (&mu, &int_is_zero, &i)
	/* i is zero ... */
	nsync_mu_unlock (&mu);

	...

	/* Decrementer */
	nsync_mu_lock (&mu)
	i--;
	nsync_mu_unlock (&mu);

For the cost of writing a function that evaluates the desired
condition, the waiter's while-loop, and the decrementer's
signalling are handled by the implementation.

In most cases, this makes code easier to write and debug.

The primary cost is that the implementation must check whether any waiters'
conditions have become true when releasing the lock.  This cost becomes most
noticable when threads wait on many distinct, false conditions.  In such cases,
some or all of the conditions can be converted to use condition variables and
explicit signalling.

C++ users may be tempted to wrap this functionality in a way that uses
lambda expressions for the conditions.  This will work, but may be less
efficient, because C++ does not provide a means to detect whether two lambda
expressions evaluate the same function.  This may force the implementation to
evaluate the same false condition many more times than it otherwise might.

Reader/writer locks
-------------------
There is no particular reason why a reader/writer lock need be
significantly slower than a simple mutex.  In both cases, the lock can be
acquired or released with a single atomic read-modify-write sequence.
Thus, the type nsync_mu is a reader/writer lock.  Locks with reader-sections
can be used with condition variables and conditional critical sections
without affecting correctness.

Cancellation
------------
The pthread API allows the cancellation of individual threads, 
and once a thread has been cancelled, it is expected to terminate soon.  This
can work well in some cases, but may not be convenient if an activity is
associated with many threads, or if threads routinely act on behalf of multiple
activities.

In nsync, cancellation involves an object separate from the thread, called an
nsync_note.  An nsync_note is conceptually a boolean that makes a single
transition from false to true: it starts off "unnotified", can be notified:
- by an explicit nsync_note_notify() call,
- due to a timeout, or
- due to the transition of an optional parent nsync_note.
So, for example, in a network server, a request with a deadline
might have an nsync_note associated with it.  Activities associated with
that request might each have a child nsync_note, possibly with shorter
deadlines.   A cancellation request from the original caller
might cancel the parent, which would cancel all the children.

The calls nsync_cv_wait_with_deadline() and nsync_mu_wait_with_deadline() take
both a deadline and a pointer to an nsync_note, and will wake when the awaited
condition becomes true, when the deadline (if any) expires, or when the
nsync_note becomes notified.  The return value indicates which of these
occurred.