pipe.c ======== C has historically had minimal support for multithreading, and even less for concurrency. pthreads was the first step to bringing threading constructs to C, and it has served us well. However, threads and mutexes alone aren't the only concurrency paradigm available to us. Very few paradigms map nicely onto heavyweight threads and locking. For example, the Actor model has no explicit locking at all. Let's imagine a classic problem. We want to write a simple download manager. First, we need to get the data from a high-latency internet connection, then dump that data to a high-latency hard disk. In classic C, waiting for one resource to finish transferring data is time you can't spend on the other one! This is unacceptable since you're dealing with two very slow connections. When the disk starts spinning up, you don't want your download to slow down! To solve this problem, we introduce a new data structure called a pipe. We create two threads: one for receiving the data from the network, and one for dumping the data to disk. Whenever data is received, it will be pushed into the pipe instead of directly to the disk. In the disk thread, it will read from the pipe, dumping any data onto the disk *without blocking the networking thread*. Since the two processes are decoupled by a fast in-memory pipe (which is implemented with simple memcpy's), when one thread lags behind, the other one is free to continue its operation. If the producer is lagging, the consumer will block until there is data. If the consumer is lagging, requests will just pile up in the queue to be handled "eventually". If memory usage becomes a problem, we can limit the size of the pipe to block the producer when the pipe gets too full. Although this sample seems a bit trivial, the pipe is a lot more powerful than it lets on. No matter how many producers or consumers you have, pipe will behave predictably (although not necessarily fairly). Therefore, you can use pipe for serialization, pipelining tasks, parallel processing, futures, multiplexing, and so much more. It does this extremely quickly, to the point where anything except trivial code will be the bottleneck, not the pipe. It can do all this and more in under 1000 lines of C! My true hope is that this library will spur development of concurrent systems in all languages, and the creation of new ideas in the field. Since C is so pervasive, I hope to see these concepts leaking into other languages and libraries, hopefully being expanded upon and becoming a building block of the next generation of concurrent programming. Integration ------------ 1. Copy pipe.c and pipe.h into your project's directory. 2. Get your compiler to build it (pipe.c needs -std=c99) 3. Use it! Concurrency Specifications --------------------------- For more details on this list, see: http://www.1024cores.net/home/lock-free-algorithms/queues Type of queue: Multi-producer/Multi-consumer Underlying data structure: Array-based Maximum size: Bounded/Unbounded can be selected at runtime Overflow behavior: Block until there is room Requirement for garbage collection: None Support for priorities: No Ordering guarantees: per-producer FIFO Behavior on empty queue: Block until elements arrive Compatibility -------------- pipe.c has been tested on: * GNU/Linux i686 * GNU/Linux x86-64 * Windows 7 x64 (Cross-compiled from GNU/Linux with mingw64) * Windows 7 (Compiled with mingw) * Windows 7 i686 (Cygwin) * OSX 10.6 However, there's no reason it shouldn't work on any of the following platforms. If you have successfully tested it on any of these, feel free to let me know by dropping me an email <cg.wowus.cg@gmail.com>. * Any platform which supports pthreads (GNU/Linux, BSD, Mac) * Microsoft Windows NT -> Microsoft Windows 7 If you _must_ use Windows, try to only support windows Vista and onwards. By using a recent operating system, you use the more robust built-in condition variables instead of my hacky hand-rolled ones. Supported Compilers: Any compiler with C99 support, however, pipe.c has been tested with... gcc 4.2.1 gcc 4.3.4 (Cygwin) gcc 4.5.2 llvm-gcc 4.2.1 clang 2.8 clang 1.5 (Darwin) icc 12.0 mingw 4.5.2 mingw 4.6.0 Unsupported Compilers: msvc (any version) - No C99 support. The header will work, but the .c itself will never compile on current versions. The headers are supported by any compiler with ANSI C support and don't do anything fancy. Therefore, if you want to use pipe.c with a compiler such as msvc, you can compile the .c file with mingw and the rest of your program with msvc, and link them all together without a hitch. License -------- The MIT License Copyright (c) 2011 Clark Gaebel <cg.wowus.cg@gmail.com> Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. Speed Tweaking --------------- There are a few things you can do to get the most speed out of the pipe. 1 Use a vectorizing compiler, and an SSE-enabled memcpy. It's a lot faster, trust me. I recommend icc. 2 Tweak `MUTEX_SPINS` in pipe.c. Set it to a low value if you tend to push large quantities of data at once, and a high value if you tend to push small quantities of data. Feel free to set it to zero to disable spin locking altogether and just use the native lock implementation. 3 Turn on your compiler's link-time optimization. 4 Do profile-guided optimization. 5 Large buffers are preferable to small ones, unless your use case dictates otherwise. Emptying the pipe frequently tends to be faster and use less memory. 6 Avoid limiting the size of the pipe unless you get serious memory issues. 7 If you are always pushing predictable amounts of data into the pipe (such as 1000 elements at a time), you should consider using `pipe_reserve` to keep the pipe from needlessly growing and shrinking. 8 If you can combine a whole bunch of elements into a single struct, it will be faster to push and pop one large struct at a time rather than a push/pop of many small structs. 9 Read through the source code, and find bottlenecks! Don't forget to submit changes upstream, so the rest of the world can be in awe of your speed-hackery. I have also left a couple optimization hints in-source as a reward for actually reading it. API Overview ------------- The API is fully documented in pipe.h Contributing ------------- This is github! Just fork away and send a pull request. I'll take a look as soon as I can. You are also always free to drop me an email at cg.wowus.cg@gmail.com. I won't ignore you. Promise! Things I'd love to see are speed improvements (especially those which reduce lock contention), support for more compilers/platforms, a better makefile (the current one is pitiful), and any algorithmic improvements. Essentially, any patches are welcome which fulfill the goals of pipe: speed, cross-platform-ness, a simple API, and beautiful code. If you have successfully built and tested the pipe on a compiler/architecture not previously mentioned, it would be nice if you could drop me an email with your success/failure story so I can update this README accordingly. Branch information ------------------- Currently, there are two branches of the code. master contains the fastest code. However, it is not necessarily the most readable. It uses two mutexes: one for pushing, and one for popping. This allows for excellent single-writer single-reader performance. It is "done" in terms of the only new code going into it is bugfixes and performance tweaks. If you want to use pipe in your code, you want to checkout this branch. single_mutex is the simplest version, and the easiest to read. It is "done" in terms of the only new code going into it is bugfixes. It implements the queue with a single god-mutex which is simple in practice, but is not the most concurrent design decision. If you want to read over an initial proof-of-concept (and very elegant C), you want to checkout this branch.