- Xilinx Vivado 2022.2
- CMake 3.0 or higher
Supported boards (out of the box)
- Xilinx VC709
- Xilinx VCU118
- Alpha Data ADM-PCIE-7V3
This repository uses git submodules, so do one of the following:
```sh
# When cloning:
git clone --recurse-submodules git@url.to/this/repo.git

# Later, if you forgot or when submodules have been updated:
git submodule update --init --recursive
```
- Create a build directory
```sh
mkdir build
cd build
```
- Configure build
```sh
cmake .. -DFNS_PLATFORM=xilinx_u55c_gen3x16_xdma_3_202210_1 -DFNS_DATA_WIDTH=64
```
All cmake options:
Name | Values | Description |
---|---|---|
FNS_PLATFORM | xilinx_u55c_gen3x16_xdma_3_202210_1 | Target platform to build |
FNS_DATA_WIDTH | <8,16,32,64> | Data width of the network stack in bytes |
FNS_ROCE_STACK_MAX_QPS | 500 | Maximum number of queue pairs the RoCE stack can support |
FNS_TCP_STACK_MSS | #value | Maximum segment size of the TCP/IP stack |
FNS_TCP_STACK_FAST_RETRANSMIT_EN | <0,1> | Enables TCP fast retransmit |
FNS_TCP_STACK_NODELAY_EN | <0,1> | Toggles Nagle's algorithm on/off |
FNS_TCP_STACK_MAX_SESSIONS | #value | Maximum number of sessions the TCP/IP stack can support |
FNS_TCP_STACK_RX_DDR_BYPASS_EN | <0,1> | Enables DDR bypass on the RX path |
FNS_TCP_STACK_WINDOW_SCALING_EN | <0,1> | Enables the TCP window scaling option |
- Build HLS IP cores and install them into IP repository
```sh
make ip
```
For an example project including the TCP/IP stack or the RoCEv2 stack with DMA to host memory, check out our Distributed Accelerator OS, DavOS.
- Setup build directory, e.g. for the TCP module
```sh
cd hls/toe
mkdir build
cd build
cmake .. -DFNS_PLATFORM=xilinx_u55c_gen3x16_xdma_3_202210_1 -DFNS_DATA_WIDTH=64
```
- Run
```sh
make csim   # C-Simulation (csim_design)
make synth  # Synthesis (csynth_design)
make cosim  # Co-Simulation (cosim_design)
make ip     # Export IP (export_design)
```
All interfaces use the AXI4-Stream protocol. For AXI4-Streams carrying network/data packets, we use the following definition in HLS:

```cpp
template <int D>
struct net_axis {
    ap_uint<D>   data;
    ap_uint<D/8> keep;
    ap_uint<1>   last;
};
To open a connection, the destination IP address and TCP port have to be provided through the s_axis_open_conn_req interface. The TCP stack answers this request through the m_axis_open_conn_rsp interface, which provides the sessionID and a boolean indicating whether the connection was opened successfully.
Interface definition in HLS:
```cpp
struct ipTuple {
    ap_uint<32> ip_address;
    ap_uint<16> ip_port;
};

struct openStatus {
    ap_uint<16> sessionID;
    bool success;
};

void toe(...
         hls::stream<ipTuple>&    openConnReq,
         hls::stream<openStatus>& openConnRsp,
         ...);
```
To close a connection, the sessionID has to be provided to the s_axis_close_conn_req interface. The TCP/IP stack does not provide a notification upon completion of this request; however, it is guaranteed that the connection will eventually be closed.
Interface definition in HLS:
```cpp
hls::stream<ap_uint<16> >& closeConnReq,
```
To open a port to listen on (e.g. as a server), the port number has to be provided to the s_axis_listen_port_req interface. The port number has to be in the range of listening ports: 0 - 32767. The TCP stack responds through the m_axis_listen_port_rsp interface, indicating whether the port was successfully set to the listen state.
Interface definition in HLS:
```cpp
hls::stream<ap_uint<16> >& listenPortReq,
hls::stream<bool>&         listenPortRsp,
```
The application using the TCP stack can receive notifications through the m_axis_notification
interface. The notifications either indicate that new data is available or that a connection was closed.
Interface definition in HLS:
```cpp
struct appNotification {
    ap_uint<16> sessionID;
    ap_uint<16> length;
    ap_uint<32> ipAddress;
    ap_uint<16> dstPort;
    bool closed;
};

hls::stream<appNotification>& notification,
```
If data is available on a TCP/IP session, i.e. a notification was received, the data can be requested through the s_axis_rx_data_req interface. The sessionID and the data itself are then received through the m_axis_rx_data_rsp_metadata and m_axis_rx_data_rsp interfaces.
Interface definition in HLS:
```cpp
struct appReadRequest {
    ap_uint<16> sessionID;
    ap_uint<16> length;
};

hls::stream<appReadRequest>&   rxDataReq,
hls::stream<ap_uint<16> >&     rxDataRspMeta,
hls::stream<net_axis<WIDTH> >& rxDataRsp,
```
Waveform of receiving a (data) notification, requesting data, and receiving the data:
When an application wants to transmit data on a TCP connection, it first has to check whether enough buffer space is available. This check is done through the s_axis_tx_data_req_metadata interface. If the response from the TCP stack through the m_axis_tx_data_rsp interface is positive, the application can send the data through the s_axis_tx_data_req interface. If the response is negative, the application can retry by sending another request on the s_axis_tx_data_req_metadata interface.
Interface definition in HLS:
```cpp
struct appTxMeta {
    ap_uint<16> sessionID;
    ap_uint<16> length;
};

struct appTxRsp {
    ap_uint<16> sessionID;
    ap_uint<16> length;
    ap_uint<30> remaining_space;
    ap_uint<2>  error;
};

hls::stream<appTxMeta>&        txDataReqMeta,
hls::stream<appTxRsp>&         txDataRsp,
hls::stream<net_axis<WIDTH> >& txDataReq,
```
Waveform of requesting a data transmit and transmitting the data.
Before any RDMA operations can be executed, the queue pairs (QPs) have to be established out-of-band (e.g. over TCP/IP) by the hosts. The host can then load the QPs into the RoCE stack through the s_axis_qp_interface and s_axis_qp_conn_interface interfaces.
Interface definition in HLS:
```cpp
typedef enum {RESET, INIT, READY_RECV, READY_SEND, SQ_ERROR, ERROR} qpState;

struct qpContext {
    qpState     newState;
    ap_uint<24> qp_num;
    ap_uint<24> remote_psn;
    ap_uint<24> local_psn;
    ap_uint<16> r_key;
    ap_uint<48> virtual_address;
};

struct ifConnReq {
    ap_uint<16>  qpn;
    ap_uint<24>  remote_qpn;
    ap_uint<128> remote_ip_address;
    ap_uint<16>  remote_udp_port;
};

hls::stream<qpContext>& s_axis_qp_interface,
hls::stream<ifConnReq>& s_axis_qp_conn_interface,
```
RDMA commands can be issued to the RoCE stack through the s_axis_tx_meta interface. In case the command transmits data, the data can either originate from host memory, as specified by the local_vaddr, or from the application on the FPGA. In the latter case, the local_vaddr is set to 0 and the data is provided through the s_axis_tx_data interface.
Interface definition in HLS:
```cpp
typedef enum {APP_READ, APP_WRITE, APP_PART, APP_POINTER, APP_READ_CONSISTENT} appOpCode;

struct txMeta {
    appOpCode   op_code;
    ap_uint<24> qpn;
    ap_uint<48> local_vaddr;
    ap_uint<48> remote_vaddr;
    ap_uint<32> length;
};

hls::stream<txMeta>&           s_axis_tx_meta,
hls::stream<net_axis<WIDTH> >& s_axis_tx_data,
```
Waveform of issuing an RDMA read request:
Waveform of issuing an RDMA write request where data on the FPGA is transmitted:
- D. Sidler, G. Alonso, M. Blott, K. Karras et al., Scalable 10Gbps TCP/IP Stack Architecture for Reconfigurable Hardware, in FCCM'15, Paper, Slides
- D. Sidler, Z. Istvan, G. Alonso, Low-Latency TCP/IP Stack for Data Center Applications, in FPL'16, Paper
- D. Sidler, Z. Wang, M. Chiosa, A. Kulkarni, G. Alonso, StRoM: Smart Remote Memory, in EuroSys'20, Paper
If you use the TCP/IP or RDMA stack in your project, please cite one of the following papers and/or link to the GitHub project:
```bibtex
@inproceedings{DBLP:conf/fccm/SidlerABKVC15,
  author    = {David Sidler and
               Gustavo Alonso and
               Michaela Blott and
               Kimon Karras and
               Kees A. Vissers and
               Raymond Carley},
  title     = {Scalable 10Gbps {TCP/IP} Stack Architecture for Reconfigurable Hardware},
  booktitle = {23rd {IEEE} Annual International Symposium on Field-Programmable Custom
               Computing Machines, {FCCM} 2015, Vancouver, BC, Canada, May 2-6, 2015},
  pages     = {36--43},
  publisher = {{IEEE} Computer Society},
  year      = {2015},
  doi       = {10.1109/FCCM.2015.12}
}

@inproceedings{DBLP:conf/fpl/SidlerIA16,
  author    = {David Sidler and
               Zsolt Istv{\'{a}}n and
               Gustavo Alonso},
  title     = {Low-latency {TCP/IP} stack for data center applications},
  booktitle = {26th International Conference on Field Programmable Logic and Applications,
               {FPL} 2016, Lausanne, Switzerland, August 29 - September 2, 2016},
  pages     = {1--4},
  publisher = {{IEEE}},
  year      = {2016},
  doi       = {10.1109/FPL.2016.7577319}
}

@inproceedings{DBLP:conf/eurosys/SidlerWCKA20,
  author    = {David Sidler and
               Zeke Wang and
               Monica Chiosa and
               Amit Kulkarni and
               Gustavo Alonso},
  title     = {StRoM: smart remote memory},
  booktitle = {EuroSys '20: Fifteenth EuroSys Conference 2020, Heraklion, Greece,
               April 27-30, 2020},
  pages     = {29:1--29:16},
  publisher = {{ACM}},
  year      = {2020},
  doi       = {10.1145/3342195.3387519}
}

@phdthesis{sidler2019innetworkdataprocessing,
  author    = {Sidler, David},
  title     = {In-Network Data Processing using FPGAs},
  school    = {ETH Zurich},
  year      = {2019},
  copyright = {In Copyright - Non-Commercial Use Permitted}
}
```
- David Sidler, Systems Group, ETH Zurich
- Monica Chiosa, Systems Group, ETH Zurich
- Fabio Maschi, Systems Group, ETH Zurich
- Zhenhao He, Systems Group, ETH Zurich
- Mario Ruiz, HPCN Group of UAM, Spain
- Kimon Karras, former Researcher at Xilinx Research, Dublin
- Lisa Liu, Xilinx Research, Dublin