Turbopump - An aspiring low-latency, extensible, distributed key value store
Distributed key value stores are tremendously flexible tools for building a wide variety of network applications. Mesh networks, file synchronization, email, distributed computing, web servers, etc -- anything that requires a predictable delegation of data distribution and / or processing seems well suited for the eventually consistent, hash-partitioned topology that underlies technologies like Amazon's Dynamo, or Riak.
However, specific implementations of distributed key value stores usually restrict themselves to one particular use case. This project is an experiment. Its purpose is to see if a single, small code base can provide high performance building blocks for many, seemingly unrelated kinds of distributed applications.
Written in C++11. Currently in beta: some stuff isn't implemented, and the rest probably isn't trustworthy yet.
- System dependencies
- Linux
- clang >= 3.4 or gcc >= 4.9 (need std::regex, among other C++11 features)
- cmake
- Library dependencies [apt-get install ...]
- libattr
- libboost-filesystem
- libmsgpack
- libudt
- git clone https://github.com/sz3/turbolib
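For example, on a debian-based distro, the apt-get invocation might look like this (the exact -dev package names are an assumption and may vary by release):
> sudo apt-get install libattr1-dev libboost-filesystem-dev libmsgpack-dev libudt-dev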
turbolib should be located adjacent to the turbopump code in the directory tree. That is:
- /home/user/code/turbopump
- /home/user/code/turbolib
... obviously, you could also modify CMakeLists.txt to do something else.
- Build commands
From within the "turbopump" directory, run:
> cmake .
> make
> make install
To run tests:
> ctest
To run the daemon:
> ./turbopumpd
> ./turbopumpd -h
...for some help.
The built "turbopumpd" executable has a few toggles for modes of operation:
- UDP or UDT for transit. The default is UDT.
- UDP mode does no flow control, cannot guarantee packet ordering, etc. In short, it is vanilla UDP. You can use it to saturate your local network. Or for testing purposes. If your key name + contents will always fit in a single UDP frame (1472 bytes: a 1500-byte ethernet MTU minus 20 bytes of IP header and 8 bytes of UDP header), and you can keep a burst of packets from melting down your switches / routers, this may be semi-viable.
- UDT is slightly slower, but does flow control, packet ordering, etc. It is a user-space data transfer protocol implemented atop UDP. In turbopump, it enables the transfer of large contents in a way that is reliable and non-destructive to the network.
- Partition vs Clone
- "Partition" mode. The default mode of operation is to partition nodes via consistent hashing. That is, each write in the system sets a value `mirrors` for a key, and only `mirrors` machines will receive a copy of the data. (A sketch of the idea follows this list.)
- "Clone" mode. There is a special value of `mirrors` that tells the system to distribute a key to all machines: 0. "Clone" mode is a special flag that says to treat all values of `mirrors` like this. Each member of the turbopump cluster will mirror all key / value pairs in this mode.
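To illustrate how partitioning might choose the `mirrors` machines for a key, here is a minimal consistent-hash ring sketch. It is illustrative only -- the class and function names are assumptions, not turbopump's actual implementation:

```cpp
// A minimal, illustrative consistent-hash ring -- NOT turbopump's actual code.
// Peers are hashed onto a ring; a key lives on the first `mirrors` distinct
// peers found clockwise from the key's own hash.
#include <functional>
#include <iostream>
#include <map>
#include <string>
#include <vector>

class HashRing
{
public:
	void addPeer(const std::string& peer)
	{
		_ring[std::hash<std::string>()(peer)] = peer;
	}

	// return up to `mirrors` peers responsible for `key`
	std::vector<std::string> locate(const std::string& key, unsigned mirrors) const
	{
		std::vector<std::string> peers;
		if (_ring.empty())
			return peers;
		auto it = _ring.lower_bound(std::hash<std::string>()(key));
		while (peers.size() < mirrors && peers.size() < _ring.size())
		{
			if (it == _ring.end())
				it = _ring.begin(); // wrap around the ring
			peers.push_back(it->second);
			++it;
		}
		return peers;
	}

private:
	std::map<size_t, std::string> _ring;
};

int main()
{
	HashRing ring;
	ring.addPeer("peer-one");
	ring.addPeer("peer-two");
	ring.addPeer("peer-three");

	// with mirrors=2, only these two peers receive a copy of key "foo"
	for (const std::string& peer : ring.locate("foo", 2))
		std::cout << peer << std::endl;
	return 0;
}
```

Dynamo-style systems usually place several virtual points per peer on the ring to even out the key distribution; this sketch keeps one point per peer for brevity.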
- File vs RAM vs ...
- File. The current default is to store values as files on the filesystem. However, turbopump does not currently deal well with high-latency disk drives -- its thread scheduling is naive -- so this is best considered a beta feature. Right now, preferred operation is to use RAM, by way of ramfs. e.g. on debian-based Linux distros, you might set your data directory to /run/shm/turbopump. SSD performance is best described as "survivable".
- RAM. A work in progress.
- ...
The storage interface in turbopump is meant to be generic. That is, whether it's sophia, sqlite, or another local database solution, the desire is that turbopump should only need a thin wrapper to use a local database for its local storage.
There are two notable requirements for these wrapper implementations (a sketch follows below):
- We expect to store multiple versions of a key+value simultaneously.
- We expect to store some metadata for each key+value.
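To make those requirements concrete, here is a hypothetical sketch of what such a wrapper could look like. The names and signatures are illustrative assumptions, not turbopump's actual API:

```cpp
// A hypothetical storage-wrapper interface -- names and signatures are
// illustrative assumptions, not turbopump's actual API.
#include <string>
#include <vector>

// per key+version bookkeeping (requirement #2)
struct Metadata
{
	std::string version;       // e.g. a serialized vector clock
	unsigned long long size;   // stored content length, in bytes
};

class IStore
{
public:
	virtual ~IStore() {}

	// store `contents` as one version of `key`, without clobbering other
	// versions that may exist concurrently (requirement #1)
	virtual bool write(const std::string& key, const std::string& version,
	                   const std::string& contents) = 0;

	// read back one specific version of a key
	virtual bool read(const std::string& key, const std::string& version,
	                  std::string& contents) const = 0;

	// enumerate the metadata of all versions currently stored for a key
	virtual std::vector<Metadata> versions(const std::string& key) const = 0;
};
```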
- Membership
- Cluster membership is currently initialized through a flat file. The first line in the file is the local turbopump's identity. Each successive line corresponds to a peer. Membership is symmetric -- for a machine to join the cluster, it needs to add a member of the cluster to its membership, and the cluster member in question needs to add the new machine to its own membership.
- While that is admittedly a bit clunky, the rest will happen auto-magically. Members will readily share their member lists with other members, so the new recruit will quickly be recognized by the entire group.
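For illustration only, such a membership file might look like this -- the first line naming the local node, each later line a peer. The host:port identity format shown is an assumption, not documented syntax:

```
localhost:1592
peer-one.example.com:1592
peer-two.example.com:1592
```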
- API
You can use HTTP(1) over TCP or unix domain sockets. By default, turbopump runs a TCP server on port 1592. It is not exactly a super robust server, so please don't put it on the public internet (by default, it binds to INADDR_LOOPBACK). Also, the API is not very mature and is guaranteed to change. :|
With all that said!
> curl localhost:1592/status
> echo "bytes" | curl -d @- localhost:1592/write?name=foo
> curl localhost:1592/read?name=foo
> curl localhost:1592/list-keys
> curl localhost:1592/membership
To try out the domain socket functionality (./turbopumpd -l /tmp/turbopump), you can use netcat:
> echo -e -n 'GET /status HTTP/1.1\r\n\r\n' | nc -U /tmp/turbopump
- TODO
- reliability...
- get RAM storage operational again.
- HTTP/2
- encryption of socket layer via libsodium.
- signed file writes. Only accept data changes if write is authenticated. This is too complicated to describe in a todo. It will be cool though.
- a "transient" mode, where keys are only kept until they have been successfully propagated to the destination.
- dynamic membership -- authenticated (signed) peer removal.
- libnice NAT traversal. Maybe. UDT is in the way.
- directory indexing. Gated on a concept I call "collections".
- windows support...?
- Done
- versioned (vector clock'd) writes
- key distribution/partitioning
- cross-peer key sync
- function callback hooks on write completion
- deleted key cleanup (age-out of deleted metadata)
- UDP, UDT internal communication
- simple domain socket or TCP (HTTP) command server
- basic load/latency testing
- concurrent writes
- large writes
- cross-peer sync
These are some of the things distributed key-value stores do.