Virtual Private Networks (VPNs) are the central and indispensable component of Internet security. They comprise a set of technologies that connect geographically dispersed, heterogeneous networks through encrypted tunnels, creating the impression of a homogenous private network on the public shared physical medium.
With traditional solutions (such as OpenVPN / IPSec) starting to run out of steam, Wireguard is increasingly coming to the forefront as a modern, secure data tunneling and encryption method, one that's also easier to manage than the incumbents. Both software and hardware implementations of Wireguard already exist. However, the software performance is far below the speed of wire. Existing hardware approaches are both prohibitively expensive and based on proprietary, closed-source IP blocks and tools.The intent of this project is to bridge these gaps with an FPGA open-source implementation of Wireguard, written in SystemVerilog HDL.
We have contributed to Blackwire project, which is a 100Gbps hardware implementation of Wireguard switch based on AMD/Xilinx-proprietary AlveoU50 PC-accelerator card (SmartNIC form-factor), and implementable only with proprietary Vivado toolchain.
While working on the Blackwire, we have touched multiple sections, and focused on the novel algorithm for Balanced Binary Tree Search of IP tables. However, the Blackwire hardware platform is expensive and priced out of reach of most educational institutions. Its gateware is written in SpinalHDL, which is a niche HDL, hardly popular in either academia or industry. While Blackwire is now released to open-source, that decision came from financial hardship -- It was originaly meant for sale. Moreover, the company behind it has potential lawsuit issues that bring into question the legality of ownership over the codebase they donated to the open source community.
To make the hardware Wireguard truly accessible in the genuine spirit of open-source movement, this project implements it:
- for an inexpensive hardware platform with four 1000Base-T ports
- in a self-sufficient way, i.e. w/o requiring PC host
- using a commodity Artix7 FPGA
- which is supported by open-source tools
- and with all gateware written in the ubiquitous Verilog / System Verilog
[Ref1] Wireguard implementations in software:
[Ref2] 100Gbps Blackwire Wireguard
[Ref3] Corundum, open-source FPGA-NIC platform
[Ref4] ChaCha20-Poly1305 open-source Crypto RTL
[Ref5] Cookie Cutter SOC
[Ref6] RISC-V ISS
[Ref7] 10Gbps Ethernet Switch
[Ref8] OpenXC7 open-source tools for Xilinx Series7
The Phase1 (This!) is primarily Proof of Concept, i.e. not full-featured, and definitely not a deployable product. It is envisoned as a mere on-ramp, a springboard for future build-up and optimizations.
The Phase2 continuation project is therefore also in the plans, to maximize efficiency and overall useability, such as by increasing the number of channels, facilitating management with GUI apps, or something else as identified by the community feedback.
- HW/SW partitioning, interface, interactions and workload distribution
- While, contrary to Blackwire, we don’t rely on an external PC connected via PCIE, we will still have an on-chip RISC-V CPU with intricate hardware interface and significant Embedded Software component that controls the backbone of wire-speed datapath
- HW/SW co-development, integration and debugging
- Standard simulation is impractical for the project of this size and complexity. We therefore intend to put to test and good use the very promissing new VProc ISS [Ref6]
- It’s also impractical and expensive to provide full test systems with real traffic generators and checkers to all developers. We therefore plan to rent some space for a central lab that will host two test systems, then provide remote access to all developers
- Real-life, at-speed testing
- Extent of open-source tools support for SystemVerilog and all needed FPGA primitives and IP functions
- QOR of the (still maturing) open-source tools
- Blackwire used commercial, AMD/Xilinx-proprietary Vivado toolchain, as well as high-end Alveo U50 FPGA silicon. Even then, they ran into multiple timing closure, utilization and routing congestion challenges.
- Financial resources
- Given that this is a complex, multi-disciplinary dev effort, the available funding may not be sufficient to bring it to completion. Blackwire, despite a larger allocated budget, ended up with funding crisis and abrupt cessation of dev activities.
This project is WIP at the moment. The checkmarks below indicate our status. Until all checkmarks are in place, anything you get from here is w/o guaranty -- Use at own risk, as you see fit, and don't blame us if it is not working 🌤️
Board bring up. In-depth review of Wireguard ecosystem and prior art. Design Blueprint
While the board we're using is low cost, it is also not particularly known in the open-source community. We certainly don’t have prior experience with it. In this opening take we will build a solid foundation for efficient project execution. Good preparation is crucial for a smooth run. We thus seek to first understand and document what we will be designing: SOC Architecture, Datapath Microarchitecture, Hardware/Software Partitioning, DV and Validation Strategy
.
Getting a good feel for our Fmax is also a goal of this take. Artix-7 does not support High-Performance (HP) I/O. Consequently, we cannot push its I/O beyond 600MHz, nor its core logic beyond 100 MHz.
-
Familiarization with HW platform
-
Create our first FPGA program that blinks LEDs
-
Verify pinouts and connectivity using simple test routines
-
Generate a few Ethernet test patterns
-
Familiarization with SW platform
-
Initial bring up of embedded CPU within a cookie-cutter SOC, such as [Ref5]
-
Design and test a simple SW interface to rudimentary HW Ethernet datapath
-
Detailed analysis and comparisons of:
-
Wireguard White Papers
-
existing implementations in software [Ref1]
-
vs. Blackwire hardware implementation [Ref2]
-
cryptographic algorithms used for Wireguard, esp. ChaCha20 for encryption, Poly1305 for authentication [Ref4] and, to a lesser extent, Curve25519 for key exchange and blake2 for hashing
-
Identification and assimilation of prior art and building IP blocks, in particular Corundum [Ref3] and, to a lesser extent, 10GE Switch [Ref7]
-
Architecture/uArch Design. HW/SW Partitioning. Verification Plan
-
Creation of sufficient initial documentation for project divide-and-conquer across a multi-disciplinary team of half a dozen developers
Implementation of a basic, statically pre-configured Wireguard link
It it in this take that we start creating hardware Datapath and hardening Wireguard encryption protocols, all using Vivado and Xilinx primitives.
- Integration of collected RTL blocks into a coherent HW system that implements the basic Wireguard datapath for a handful of manually pre-configured channels.
- Corundum FPGA-based NIC and platform for opensource Ethernet development [Ref3]
- IP Core for ChaCha20-Poly1305 [Ref4] -- Definitely in hardware from the get-go
- Curve25519 module for key exchange -- Likely in software at this point
- blake2 module for hashing -- Likely in software at this point
- Timing closure. Resolution of FPGA device utilization and routing congestion issues
- Creation of cocoTB DV in the CI/CD environmenT, and representative test cases for datapath simulation
Development and integration of embedded management software (Control Plane)
This work package is about hardware/software codesign and integration. The firmware will run on a soft RISC V processor, inside the FPGA. Our vanilla SOC is at this point starting to be customized to Wireguard needs. This work can to some extent go on in parallel with hardware activities of Take2.
-
SW design for the embedded on-chip processor
-
Code is to be written in the bare-metal C with, as necessary, few sections in Assembly
-
SW is responsible for configuration and management of hardware blocks
-
SW must not participate in the bulk datapath transfers
-
SW may however intercept the low-frequency management packets
-
For quick bring up, the IP Address Search (using novel balanced binary tree algorithm) may initially be hosted in software, then ported to hardware at a later point
-
SW will also cover KMM function -- Key Management Module
-
HW/SW Integration
VPN Tunnel: Session initialization, maintenance, and secure closure
This is about managing the bring-up, maintenance and tear-down of VPN tunnels between two devices.
- Session Initialization: Starting the handshake process to establish secure communication with another device
- Session Maintenance: Keeping the session active through the regular exchange of control messages, which allows detection and recovery from problems such as connection interruptions
- Session Closure: Securely close the VPN tunnel when communication is no longer needed, ensuring that all temporary keys and sensitive data are deleted
Testing, Profiling and Porting to OpenXC7
-
Functional testing on the real system. Does it work as intended? Bug fixes
-
Performance testing. HW/SW profiling, updates and enhancements to ensure the design indeed operates at close to the wire speed on all preconfigured channels
-
Porting to openXC7 [Ref8] using SV2V, in the GoCD CI/CD setting
This is challenging, as openXC7 has thus far been crashing for NES SV
- Timing closure with openXC7
This is definitely challenging, given that openXC7 is currently without accurate timing-driven STA
-
Filing bug tickets with open source developers for issues found in their tools, supporting them all the way to the resolution
-
Creation and maintenance of an attractive and well-documented Github repo, to entice community interest
-
Ongoing documentation updates and CI/CD script maintenance to keep it valid in the light of inevitable design mutations compared to the original Design Blueprint.
Flow control module for efficient and stable VPN tunnel data management
The objective of this optional deliverable is to ensure stable and efficient links, thus taking this project one step closer to a deployable product.
- Develop software components for management of data flow within VPN tunnels
- PHY and MAC IP blocks
- Parsing and Extraction of Ethernet Packet Header
- Ethernet inbound Denial of Service (DoS) filter (with blake2)
- WireGuard offload engines for cryptography and key authentication, see next section for more
- UDP packet disassembler and assembler
- blake2 hashing for MAC authentication and keyed hashing, per RFC7693
- ChaCha20 for symmetric encryption, authenticated with Poly1305, using RFC7539's AEAD (Authenticated Encryption Authenticated Data) construction
- Curve25519 for ECDH key updates (for now just hooks for future)
- SipHash24 for hashtable keys
- HKDF for key derivation
WIP
WIP
WIP
WIP
WIP
WIP
TODO
TODO
We are grateful to NLnet Foundation for their sponsorship of this development activity.