/pipe

A simple thread-safe FIFO in C.

Primary LanguageC

pipe.c
========

C has historically had minimal support for multithreading, and even less
for concurrency. pthreads was the first step to bringing threading
constructs to C, and it has served us well. However, threads and mutexes
alone aren't the only concurrency paradigm available to us. Very few
paradigms map nicely onto heavyweight threads and locking. For example,
the Actor model has no explicit locking at all.

Let's imagine a classic problem. We want to write a simple download
manager. First, we need to get the data from a high-latency internet
connection, then dump that data to a high-latency hard disk. In
classic C, waiting for one resource to finish transferring data is time
you can't spend on the other one! This is unacceptable since you're
dealing with two very slow connections. When the disk starts spinning
up, you don't want your download to slow down!

To solve this problem, we introduce a new data structure called a pipe.
We create two threads: one for receiving the data from the network,
and one for dumping the data to disk.

Whenever data is received, it will be pushed into the pipe instead of
directly to the disk. In the disk thread, it will read from the pipe,
dumping any data onto the disk *without blocking the networking thread*.
Since the two processes are decoupled by a fast in-memory pipe (which
is implemented with simple memcpy's), when one thread lags behind, the
other one is free to continue its operation. If the producer is lagging,
the consumer will block until there is data. If the consumer is lagging,
requests will just pile up in the queue to be handled "eventually". If
memory usage becomes a problem, we can limit the size of the pipe to
block the producer when the pipe gets too full.

Although this sample seems a bit trivial, the pipe is a lot more powerful
than it lets on. No matter how many producers or consumers you have, pipe
will behave predictably (although not necessarily fairly). Therefore, you
can use pipe for serialization, pipelining tasks, parallel processing,
futures, multiplexing, and so much more. It does this extremely quickly,
to the point where anything except trivial code will be the bottleneck,
not the pipe. It can do all this and more in under 1000 lines of C!

My true hope is that this library will spur development of concurrent
systems in all languages, and the creation of new ideas in the field.
Since C is so pervasive, I hope to see these concepts leaking into other
languages and libraries, hopefully being expanded upon and becoming a
building block of the next generation of concurrent programming.

Integration
------------

1. Copy pipe.c and pipe.h into your project's directory.
2. Get your compiler to build it (pipe.c needs -std=c99)
3. Use it!

Concurrency Specifications
---------------------------

For more details on this list, see:
http://www.1024cores.net/home/lock-free-algorithms/queues

Type of queue: Multi-producer/Multi-consumer
Underlying data structure: Array-based
Maximum size: Bounded/Unbounded can be selected at runtime
Overflow behavior: Block until there is room
Requirement for garbage collection: None
Support for priorities: No
Ordering guarantees: per-producer FIFO
Behavior on empty queue: Block until elements arrive

Compatibility
--------------

pipe.c has been tested on:

 * GNU/Linux i686
 * GNU/Linux x86-64
 * Windows 7 x64 (Cross-compiled from GNU/Linux with mingw64)
 * Windows 7 (Compiled with mingw)
 * Windows 7 i686 (Cygwin)
 * OSX 10.6

However, there's no reason it shouldn't work on any of the following
platforms. If you have successfully tested it on any of these, feel free
to let me know by dropping me an email <cg.wowus.cg@gmail.com>.

 * Any platform which supports pthreads (GNU/Linux, BSD, Mac)
 * Microsoft Windows NT -> Microsoft Windows 7

If you _must_ use Windows, try to only support windows Vista and
onwards. By using a recent operating system, you use the more robust
built-in condition variables instead of my hacky hand-rolled ones.

Supported Compilers:

Any compiler with C99 support, however, pipe.c has been tested with...

gcc 4.2.1
gcc 4.3.4 (Cygwin)
gcc 4.5.2
llvm-gcc 4.2.1
clang 2.8
clang 1.5 (Darwin)
icc 12.0
mingw 4.5.2
mingw 4.6.0

Unsupported Compilers:

msvc (any version) - No C99 support. The header will work, but the .c
                     itself will never compile on current versions.

The headers are supported by any compiler with ANSI C support and don't
do anything fancy. Therefore, if you want to use pipe.c with a compiler
such as msvc, you can compile the .c file with mingw and the rest of
your program with msvc, and link them all together without a hitch.

License
--------

The MIT License
Copyright (c) 2011 Clark Gaebel <cg.wowus.cg@gmail.com>

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.

Speed Tweaking
---------------

There are a few things you can do to get the most speed out of the pipe.

 1 Use a vectorizing compiler, and an SSE-enabled memcpy. It's a lot
   faster, trust me. I recommend icc.
 2 Tweak `MUTEX_SPINS` in pipe.c. Set it to a low value if you tend to
   push large quantities of data at once, and a high value if you tend to
   push small quantities of data. Feel free to set it to zero to disable
   spin locking altogether and just use the native lock implementation.
 3 Turn on your compiler's link-time optimization.
 4 Do profile-guided optimization.
 5 Large buffers are preferable to small ones, unless your use case
   dictates otherwise. Emptying the pipe frequently tends to be faster
   and use less memory.
 6 Avoid limiting the size of the pipe unless you get serious memory
   issues.
 7 If you are always pushing predictable amounts of data into the pipe
   (such as 1000 elements at a time), you should consider using
   `pipe_reserve` to keep the pipe from needlessly growing and
   shrinking.
 8 If you can combine a whole bunch of elements into a single struct, it
   will be faster to push and pop one large struct at a time rather than
   a push/pop of many small structs.
 9 Read through the source code, and find bottlenecks! Don't forget to
   submit changes upstream, so the rest of the world can be in awe of
   your speed-hackery. I have also left a couple optimization hints
   in-source as a reward for actually reading it.

API Overview
-------------

The API is fully documented in pipe.h

Contributing
-------------

This is github! Just fork away and send a pull request. I'll take a look
as soon as I can. You are also always free to drop me an email at
cg.wowus.cg@gmail.com. I won't ignore you. Promise!

Things I'd love to see are speed improvements (especially those which
reduce lock contention), support for more compilers/platforms, a better
makefile (the current one is pitiful), and any algorithmic improvements.

Essentially, any patches are welcome which fulfill the goals of pipe:
speed, cross-platform-ness, a simple API, and beautiful code.

If you have successfully built and tested the pipe on a
compiler/architecture not previously mentioned, it would be nice if you
could drop me an email with your success/failure story so I can update
this README accordingly.


Branch information
-------------------

Currently, there are two branches of the code.

master contains the fastest code. However, it is not necessarily the most
readable. It uses two mutexes: one for pushing, and one for popping. This
allows for excellent single-writer single-reader performance. It is "done"
in terms of the only new code going into it is bugfixes and performance
tweaks. If you want to use pipe in your code, you want to checkout this
branch.

single_mutex is the simplest version, and the easiest to read. It is "done"
in terms of the only new code going into it is bugfixes. It implements
the queue with a single god-mutex which is simple in practice, but is not
the most concurrent design decision. If you want to read over an initial
proof-of-concept (and very elegant C), you want to checkout this branch.