/krping

Kernel Mode RDMA Ping

Primary LanguageC

		Kernel Mode RDMA Ping Module
		   Steve Wise - 8/2009

---
Updated 8/2016
---

============
Introduction
============

The krping module is a kernel loadable module that utilizes the Open
Fabrics verbs to implement a client/server ping/pong program.  The module
was implemented as a test vehicle for working with the iwarp branch of
the OFA project.

The goals of this program include:

- Simple harness to test kernel-mode verbs: connection setup, send,
recv, rdma read, rdma write, and completion notifications.

- Client/server model.

- IP addressing used to identify remote peer.

- Transport independent utilizing the RDMA CMA service

- No user-space application needed.

- Just a test utility...nothing more.

This module allows establishing connections and running ping/pong tests
via a /proc entry called /proc/krping.  This simple mechanism allows
starting many kernel threads concurrently and avoids the need for a user
space application.

The krping module is designed to utilize all the major DTO operations:
send, recv, rdma read, and rdma write.  Its goal was to test the API
and as such is not necessarily an efficient test.  Once the connection
is established, the client and server begin a ping/pong loop:

Client				Server
---------------------------------------------------------------------
SEND(ping source buffer rkey/addr/len)

				RECV Completion with ping source info
				RDMA READ from client source MR
				RDMA Read completion
				SEND .go ahead. to client

RECV Completion of .go ahead.
SEND (ping sink buffer rkey/addr/len)	

				RECV Completion with ping sink info
				RDMA Write to client sink MR
				RDMA Write completion
				SEND .go ahead. to client

RECV Completion of .go ahead.
Validate data in source and sink buffers

<repeat the above loop>


============
To build/install the krping module
============

# git clone git://git.openfabrics.org/~swise/krping
# cd krping
<edit Makefile and set KSRC accordingly>
# make && make install
# modprobe rdma_krping

============
Using Krping
============

Communication from user space is done via the /proc filesystem.
Krping exports file /proc/krping.  Writing commands in ascii format to
/proc/krping will start krping threads in the kernel.  The thread issuing
the write to /proc/krping is used to run the krping test, so it will
block until the test completes, or until the user interrupts the write.

Here is a simple example to start an rping test using the rdma_krping
module.  The server's address is 192.168.69.127.  The client will
connect to this address at port 9999 and issue 100 ping/pong messages.
This example assumes you have two systems connected via IB and the
IPoverIB devices are configured on the 192.168.69/24 subnet accordingly.

Server:

# modprobe rdma_krping
# echo "server,addr=192.168.69.127,port=9999" >/proc/krping


The echo command above will block until the krping test completes,
or the user hits ctrl-c.

On the client:

# modprobe rdma_krping
# echo "client,addr=192.168.69.127,port=9999,count=100" >/proc/krping

Just like on the server, the echo command above will block until the
krping test completes, or the user hits ctrl-c.

The syntax for krping commands is a string of options separated by commas.
Options can be single keywords, or in the form: option=operand.

Operands can be integers or strings.

Note you must specify the _same_ options on both sides.  For instance,
if you want to use the server_invalidate option, then you must specify
it on both the server and client command lines.

Opcode		Operand Type	Description
------------------------------------------------------------------------
client		none		Initiate a client side krping thread.
server		none		Initiate a server side krping thread.
addr		string		The server's IP address in dotted 
				decimal format.  Note the server can
				use 0.0.0.0 to bind to all devices.
port		integer		The server's port number in host byte 
				order.
count		integer		The number of rping iterations to 
				perform before shutting down the test.  
				If unspecified, the count is infinite.
size		integer		The size of the rping data.  Default for 
				rping is 65 bytes.
verbose		none		Enables printk()s that dump the rping 
				data. Use with caution!
validate	none		Enables validating the rping data on
				each iteration to detect data 
				corruption.
mem_mode	string		Determines how memory will be 
				registered.  Modes include dma,
				and reg.  Default is dma.
server_inv 	none		Valid only in reg mr mode, this 
				option enables invalidating the
				client's reg mr via 
				SEND_WITH_INVALIDATE messages from
				the server.
local_dma_lkey	none		Use the local dma lkey for the source 
				of writes and sends, and in recvs
read_inv	none		Server will use READ_WITH_INV. Only
				valid in reg mem_mode.
				
============
Memory Usage:
============

The krping client uses 4 memory areas:

start_buf - the source of the ping data.  This buffer is advertised to
the server at the start of each iteration, and the server rdma reads
the ping data from this buffer over the wire.

rdma_buf  - the sink of the ping data.  This buffer is advertised to the
server each iteration, and the server rdma writes the ping data that it
read from the start buffer into this buffer.  The start_buf and rdma_buf
contents are then compared if the krping validate option is specified.

recv_buf  - used to recv "go ahead" SEND from the server.  

send_buf  - used to advertise the rdma buffers to the server via SEND
messages.

The krping server uses 3 memory areas:

rdma_buf  - used as the sink of the RDMA READ to pull the ping data
from the client, and then used as the source of an RDMA WRITE to
push the ping data back to the client.

recv_buf  - used to receive rdma rkey/addr/length advertisements from
the client.

send_buf  - used to send "go ahead" SEND messages to the client.


============
Memory Registration Modes:
============

Each of these memory areas are registered with the RDMA device using
whatever memory mode was specified in the command line. The mem_mode
values include: dma, and reg (aka fastreg).  The default mode, if not
specified, is dma.

The dma mem_mode uses a single dma_mr for all memory buffers.

The reg mem_mode uses a reg mr on the client side for the
start_buf and rdma_buf buffers.  Each time the client will advertise
one of these buffers, it invalidates the previous registration and fast
registers the new buffer with a new key.   If the server_invalidate
option is on, then the server will do the invalidation via the "go ahead"
messages using the IB_WR_SEND_WITH_INV opcode.   Otherwise the client
invalidates the mr using the IB_WR_LOCAL_INV work request.

On the server side, reg mem_mode causes the server to use the
reg_mr rkey for its rdma_buf buffer IO.  Before each rdma read and
rdma write, the server will post an IB_WR_LOCAL_INV + IB_WR_REG_MR
WR chain to register the buffer with a new key.  If the krping read-inv
option is set then the server will use IB_WR_READ_WITH_INV to do the
rdma read and skip the IB_WR_LOCAL_INV wr before re-registering the
buffer for the subsequent rdma write operation.

============
Stats
============

While krping threads are executing, you can obtain statistics on the
thread by reading from the /proc/krping file.  If you cat /proc/krping,
you will dump IO statistics for each running krping thread.  The format
is one thread per line, and each thread contains the following stats
separated by white spaces:

Statistic		Description
---------------------------------------------------------------------
Name			krping thread number and device being used.
Send Bytes		Number of bytes transferred in SEND WRs.
Send Messages		Number of SEND WRs posted
Recv Bytes		Number of bytes received via RECV completions.
Recv Messages		Number of RECV WRs completed.
RDMA WRITE Bytes	Number of bytes transferred in RDMA WRITE WRs.
RDMA WRITE Messages	Number of RDMA WRITE WRs posted.
RDMA READ Bytes		Number of bytes transferred via RDMA READ WRs.
RDMA READ Messages	Number of RDMA READ WRs posted.

Here is an example of the server side output for 5 krping threads:

# cat /proc/krping
1-amso0 0 0 16 1 12583960576 192016 0 0
2-mthca0 0 0 16 1 60108570624 917184 0 0
3-mthca0 0 0 16 1 59106131968 901888 0 0
4-mthca1 0 0 16 1 101658394624 1551184 0 0
5-mthca1 0 0 16 1 100201922560 1528960 0 0
#

============
EXPERIMENTAL
============

There are other options that enable micro benchmarks to measure
the kernel rdma performance.  These include:

Opcode		Operand Type	Description
------------------------------------------------------------------------
wlat		none		Write latency test
rlat		none		read latency test
poll		none		enable polling vs blocking for rlat
bw		none		write throughput test
duplex		none		valid only with bw, this
				enables bidirectional mode
tx-depth	none		set the sq depth for bw tests


See the awkit* files to take the data logged in the kernel log
and compute RTT/2 or Gbps results.

Use these at your own risk.


END-OF-FILE