nsync is a C library that exports various synchronization primitives: locks condition variables run-once initialization waitable counter (useful for barriers) waitable bit (useful for cancellation, or other conditions) It is not an offical Google product. nsync may be desirable in place of pthread primitives in some cases: - nsync locks are reader-writer locks (but are as efficient as mutexes). - nsync locks and condition variables occupy only two words each. - nsync works on Unix-like systems and Windows. It should be portable to other platforms straightforwardly. - nsync provides conditional critical sections. These fill the same role as condition variables, but are usually easier to use, and in most common cases are comparable in speed. They can be easier to use in two ways: - it's not necessary to surround the "wait" operation in a while loop; instead the condition is passed to the call as a function and arbitrary pointer argument. - it's not necessary to wake or signal explicitly when the condition(s) become true; they are checked automatically. The primary downsides are: - they are not available in most other common synchronization APIs, and so they may be unfamiliar (even though they date back to the 1960s), and - if threads routinely wait on many distinct, false conditions associated with the same lock, they may be slower than condition variables. In this case, clients can use condition variables in the normal way; conditional critical sections and condition variables can be used with the same lock. - nsync waits can be cancelled via an object passed to the wait calls, unlike the pthread model in which threads are cancelled. This difference can be useful if the computation needs multiple threads, or if cancellation affects only sub-operations within a larger operation by the thread. See the section "Extensions to locks and condition variables" below. Portability =========== The library is intended to be portable, and to be compilable on a system with only a C90 compiler, assuming atomic operations are available from the compiler, operating system, or assembler routines. It is able to use C11 or C++11 atomic operations if they are available. It can be compiled with a C++ compiler, and in its own C++ name space, if desired, though no attempt has been made to present a class-based interface. Building ======== The builds/ directory may already contain a subdirectory that matches your platform. For example, if you're on an x86_64, running Linux, using gcc, you might pick "x86_64.linux.gcc". If there is an appropriate subdirectory, in that subdirectory type: make depend test which will calculate dependencies, build the library and its tests, and then run them. (On Windows, using Visual Studio ("x86_64.win32.msvc") use "nmake" instead of "make".) If there is no suitable subdirectory, on most Unix-like systems you can create one with tools/mkmakefile.sh The main reason it might fail is if it cannot find a suitable implementation of atomic operations on the platform. Atomic operations may be provided by - compiler-dependent interfaces (currently, gcc and clang) These are auto detected by mkmakefile.sh. - language-specific standards (currently, C11 and C++11) Selected in mkmakefile.sh via "-atomic c11" or "-atomic c++11". - operating system-dependent libraries (e.g., NetBSD, MacOS, Windows) Selected in mkmakefile.sh via "-atomic os". - architecture-dependent libraries (e.g., x86_64, x86_32, aarch64, arm, mips, alpha) Selected in mkmakefile.sh via "-atomic asm"; file should be named platforms/<architecture>/src/nsync_atm_<architecture>.[csS] to be found by mkmakefile.sh. If none of these match your platform, you may need to provide an assembly language implementation. Other possible issues: - Some platforms put clock_gettime() in the realtime library. Give "-lrt" to mkmakefile.sh. - The version identifier of "clang" can vary by installation, and so it may not be identified if invoked as "cc". Give "-cc clang" to mkmakefile.sh, if clang is not detected automatically. - Some CPU architectures have many variants, making it difficult to rely on a single identifier. Give "-arch <architecture>" to mkmakefile.sh to specify a particular string. mkmakefile.sh recognises a couple of special cases: - MacOS doesn't provide clock_gettime(); a compatibility routine is found in platform/posix/src/clock_gettime.c See builds/x86_64.macos.clang/Makefile - OpenBSD and Irix do not provide thread-local storage, which is accommodated by adding -I../../platform/gcc_no_tls to the include path. See, for example, builds/x86_64.openbsd.gcc/Makefile. Further customization is possible by editing the Makefile, directly. For Unix-like systems is typically only a few lines long. For example, compare builds/x86_64.linux.g++/Makefile with builds/x86_64.linux.gcc/Makefile to see how to compile the entire library in C++, rather than C. CMake ----- CMake can also be used to build: $ mkdir out $ cd out/ $ cmake .. $ make $ make install The C library will be called libnsync and C++ is libnsync_cpp. Tests can be disabled with the CMake option: -DNSYNC_ENABLE_TESTS=0. To build shared libraries instead of static use: -DBUILD_SHARED_LIBS=ON. CMake version >= 3.0 is strongly recommended. Code structure ============== public/ Public header files for library. builds/*/ Platform-dependent build directories, each with Makefile. internal/ Platform-independent library source code, and Makefile fragment. platform/*/ Platform-dependent source code. testing/ Platform-independent testing source code is in "testing". tools/ Optional tools that can be used to create Makefile dependencies and run tests. Where possible, the code avoids conditional compilation (#if, etc.), to avoid becoming a mess of C-preprocessor directives. The platform-dependent Makefiles set the appropriate include paths and specify platform-dependent modules where needed. The build directories of the various platforms are kept separate to allow multiple platforms to be accommodated in one shared file system. Differences from pthread locks and condition variables ====================================================== Conditional critical sections ----------------------------- Consider the following use of a condition variable: /* variable declarations */ nsync_mu mu = NSYNC_MU_INIT; /* protects i */ int i = 1; nsync_cv cv = NSYNC_CV_INIT; /* signalled when i reaches 0 */ ... /* Waiter */ nsync_mu_lock (&mu); while (i != 0) { nsync_cv_wait (&cv, &mu); } /* i is zero ... */ nsync_mu_unlock (&mu); ... /* Decrementer */ nsync_mu_lock (&mu) i--; if (i == 0) { nsync_cv_broadcast (&cv); } nsync_mu_unlock (&mu); With conditional critical sections, the equivalent is: /* variable declarations */ nsync_mu mu = NSYNC_MU_INIT; /* protects i */ int i = 1; /* Condition */ int int_is_zero (void *v) { return (*(int *)v == 0); } ... /* Waiter */ nsync_mu_lock (&mu); nsync_mu_wait (&mu, &int_is_zero, &i) /* i is zero ... */ nsync_mu_unlock (&mu); ... /* Decrementer */ nsync_mu_lock (&mu) i--; nsync_mu_unlock (&mu); For the cost of writing a function that evaluates the desired condition, the waiter's while-loop, and the decrementer's signalling are handled by the implementation. In most cases, this makes code easier to write and debug. The primary cost is that the implementation must check whether any waiters' conditions have become true when releasing the lock. This cost becomes most noticable when threads wait on many distinct, false conditions. In such cases, some or all of the conditions can be converted to use condition variables and explicit signalling. C++ users may be tempted to wrap this functionality in a way that uses lambda expressions for the conditions. This will work, but may be less efficient, because C++ does not provide a means to detect whether two lambda expressions evaluate the same function. This may force the implementation to evaluate the same false condition many more times than it otherwise might. Reader/writer locks ------------------- There is no particular reason why a reader/writer lock need be significantly slower than a simple mutex. In both cases, the lock can be acquired or released with a single atomic read-modify-write sequence. Thus, the type nsync_mu is a reader/writer lock. Locks with reader-sections can be used with condition variables and conditional critical sections without affecting correctness. Cancellation ------------ The pthread API allows the cancellation of individual threads, and once a thread has been cancelled, it is expected to terminate soon. This can work well in some cases, but may not be convenient if an activity is associated with many threads, or if threads routinely act on behalf of multiple activities. In nsync, cancellation involves an object separate from the thread, called an nsync_note. An nsync_note is conceptually a boolean that makes a single transition from false to true: it starts off "unnotified", can be notified: - by an explicit nsync_note_notify() call, - due to a timeout, or - due to the transition of an optional parent nsync_note. So, for example, in a network server, a request with a deadline might have an nsync_note associated with it. Activities associated with that request might each have a child nsync_note, possibly with shorter deadlines. A cancellation request from the original caller might cancel the parent, which would cancel all the children. The calls nsync_cv_wait_with_deadline() and nsync_mu_wait_with_deadline() take both a deadline and a pointer to an nsync_note, and will wake when the awaited condition becomes true, when the deadline (if any) expires, or when the nsync_note becomes notified. The return value indicates which of these occurred.