An implementation of the AllConcur algorithm over both TCP/IP and InfiniBand Verbs [1].
- libev (version >= 4.15)
- GMP (version >= 6.1.0)
- MPFR (version >= 3.1.3)
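On Debian/Ubuntu-like systems, these can typically be installed from distribution packages (the package names below are an assumption and may differ on other distributions):

$ sudo apt-get install libev-dev libgmp-dev libmpfr-dev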
./configure --help
usage : $0 [options]
options: [--prefix=DIR] # Installation directory
[--with-ev=DIR] # ev installation directory
[--with-mpfr=DIR] # mpfr installation directory
[--with-gmp=DIR] # gmp installation directory
[--enable-ibv] # enable IBV network module
[--with-ibv=DIR] # IB verbs installation directory
[--with-rdmacm=DIR] # RDMA-CM installation directory
make benchmark
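For example, a build that picks up libev, GMP, and MPFR from non-standard locations might look as follows (the installation paths are placeholders):

$ ./configure --prefix=$HOME/allconcur --with-ev=$HOME/opt/libev --with-gmp=$HOME/opt/gmp --with-mpfr=$HOME/opt/mpfr
$ make benchmark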
$ ./bin/allconcur -h
Usage: ./bin/allconcur [OPTIONS]
OPTIONS:
-c <config-file> configuration file
[--tcp | --ibv] server-interconnection type (default=ibv)
[-n <size>] number of nodes (default=6)
[-r <k-nines>] required reliability (default=6)
[--join] join a group of servers
-s <hostname> [join] server's hostname
-p <port> [join] server's port
[--timer] get runtime estimates (default no)
[--latency] run latency benchmark
[--throughput] run peak throughput benchmark
[--failure] run failure benchmark
By default, each AllConcur server periodically generates requests; check the config file (e.g., allconcur.cfg) for the request size (i.e., req_size) and the request generation period (i.e., req_period). Other benchmarks:
- latency -- only one server A-broadcasts a single request;
- throughput -- each server batches multiple requests into a single message, which is then A-broadcast; max_batching (see the config file) gives the maximum batching factor;
- failure -- fault-tolerant agreement.
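For example, the following invocation (a sketch; the remaining servers are started analogously) runs one server of a 6-node group over TCP with the throughput benchmark, reading req_size, req_period, and max_batching from allconcur.cfg:

$ ./bin/allconcur -c allconcur.cfg --tcp -n 6 -r 6 --throughput

A server can later join an existing group through one of its members; the hostname and port below are placeholders:

$ ./bin/allconcur -c allconcur.cfg --tcp --join -s node0 -p 55000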
The benchmarks can also be started through the following MPI launcher, which automatically creates the configuration file:
$ ./bin/allconcur_launcher -h
Usage: ./bin/allconcur_launcher [OPTIONS]
OPTIONS:
-n <size> number of nodes
[--tcp | --ibv] server-interconnection type (default=ibv)
[-r <k-nines>] required reliability (default=6)
[-p <port>] inter-server port (default=55000)
[-s <size>] request size in bytes (default=64)
[--timer] get runtime estimates
[--treq] run request rate benchmark
[--ctreq] run constant request rate benchmark
[-P <req_period>] [treq|ctreq] request generation period in ms (default=1)
[--latency] run latency benchmark
[--throughput] run peak throughput benchmark (default)
[-b <batching factor>] [throughput] max batching factor (default=512)
[--failure] run failure benchmark
E.g., the following command starts the throughput benchmark:
mpirun -np 8 ./bin/allconcur_launcher -n 8 -s 1024 --throughput -b 512
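or, for example, a latency run over TCP:

$ mpirun -np 8 ./bin/allconcur_launcher -n 8 --tcp --latency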
- ./bin -- binaries
- ./lib -- libraries
- ./src -- source files
- allconcur.c -- main function
- ac_init.c -- initialization
- ac_algorithm.c -- the AllConcur algorithm
- ac_fd.c -- failure detector
- messages.c -- message queue (for pending and premature messages)
- request_pool.c -- request pool (circular buffer)
- g_array.c -- tracking digraphs
- kvs_ht.c -- key-value store (hash-table)
- ctrl_sm.c -- control state machine (for consistency of control data)
- client_listener.c -- listening for incoming client requests
- ./include -- header files
- ./loggp -- code for computing LogGP params (based on Netgauge [2])
- ./net -- network module
- ibv -- InfiniBand verbs
- tcp -- TCP sockets
- ./test -- scripts
- ./utils
- ft-digraph -- fault-tolerant digraph library
- rbtree -- red-black tree implementation (from the Linux kernel)
- ./tla -- TLA+ [3] specification and proofs
Avoid 200ms delay on TCP ([reduce TCP RTO on Linux](http://unix.stackexchange.com/questions/210367/changing-the-tcp-rto-value-in-linux))
By default, the minimum RTO on Linux is ~200ms. To see the RTO values of open TCP connections, use the following command:
ss -i
The RTO can be changed only per route; e.g.,
$ ip route
10.0.2.0/24 dev eth0 proto kernel scope link src 10.0.2.15
169.254.0.0/16 dev eth0 scope link metric 1002
default via 10.0.2.2 dev eth0
$ ip route change default via 10.0.2.2 dev eth0 rto_min 5ms
$ ip route
10.0.2.0/24 dev eth0 proto kernel scope link src 10.0.2.15
169.254.0.0/16 dev eth0 scope link metric 1002
default via 10.0.2.2 dev eth0 rto_min lock 5ms
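Newly established TCP connections on that route should then report the lower RTO; this can be checked again with ss (the grep pattern below is only a convenience for extracting the rto field):

$ ss -ti | grep -o 'rto:[0-9.]*'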
[1] M. Poke, T. Hoefler, and C. W. Glass: AllConcur: Leaderless Concurrent Atomic Broadcast (extended version), CoRR, abs/1608.05866, 2016.
[2] T. Hoefler, A. Lichei, and W. Rehm: Low-Overhead LogGP Parameter Assessment for Modern Interconnection Networks, IEEE IPDPS, 2007.
[3] L. Lamport: Specifying Systems: The TLA+ Language and Tools for Hardware and Software Engineers, Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 2002.