Bus bridges and other odds and ends

WB2AXIP: Bus interconnects, bridges, and other components

The bus components and bridges within this repository are unique in that they are all designed for 100% throughput with no overhead. They are also unique in that the vast majority of the cores within have been formally verified.

Where the protocol allows it, such as with AXI4, AXI-lite, and Wishbone B4 pipelined, multiple transactions may be in flight at a time so that protocol handling doesn't stall the bus.

This is uncommon among AXI4 implementations and almost unheard of in the AXI-lite implementations I have examined.

Most AXI4 implementations will process a single burst transaction packet at a time and require some overhead to make this happen. Xilinx's AXI-lite implementations, both interconnect and slave implementations, only handle one request at a time. Other buses, such as Wishbone Classic, AHB, or APB, will only ever process one transaction word at a time.
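
To make the difference concrete, here is a minimal back-of-the-envelope model in Python. It is not code from this repository. It assumes single-beat transactions with a fixed round-trip latency measured in clocks, and the function names and the numbers in the example are illustrative assumptions.

```python
def blocking_cycles(n_requests: int, latency: int) -> int:
    """Clocks needed when each request must complete before the next is issued."""
    return n_requests * latency


def pipelined_cycles(n_requests: int, latency: int) -> int:
    """Clocks needed when a new request can be issued on every clock.

    The first response arrives after `latency` clocks; each later response
    follows one clock behind the previous one.
    """
    if n_requests == 0:
        return 0
    return latency + (n_requests - 1)


if __name__ == "__main__":
    # Hypothetical workload: 256 single-beat reads with a 4-clock round trip.
    n, latency = 256, 4
    slow = blocking_cycles(n, latency)
    fast = pipelined_cycles(n, latency)
    print(f"blocking:  {slow} clocks")
    print(f"pipelined: {fast} clocks")
    print(f"speedup:   {slow / fast:.2f}x")
```

Under these assumptions the speedup approaches the round-trip latency as the run of requests gets longer, which is why a master that can keep several requests in flight matters so much.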

If you are coming from AXI4, AXI-lite, or any one of these other bus implementations to the AXI4 or even AXI-lite components supported here, then you should expect to see a throughput increase by using one (or more) of these cores--given of course that you have a bus master capable of issuing multiple requests at a time.

This performance improvement may be as significant as a 16x speedup when toggling an I/O, a 4x speedup when comparing this slave against Xilinx's block RAM memory controller (when processing single beat transactions), or as insignificant as a 2% improvement from using the AXI4 MM to Slave converters (according to Xilinx's data sheets; I haven't yet run the test myself). This increased performance extends to the crossbar implementations contained within this repository as well, and so you may notice the improvement only increases when using these crossbars.

A Pipelined Wishbone B4 to AXI4 bridge

Born of necessity, this repository was originally built around a Wishbone (WB) to AXI4 bridge, which is designed to convert a pipelined Wishbone bus to an AXI4 bus for the purpose of driving memory transactions through Xilinx SDRAM controllers. The WB->AXI bridge is designed to connect a Wishbone bus to an AXI bus which may be wider--such as from a 32-bit WB bus to a 128-bit AXI bus. Hence, if the Memory Interface Generator DDR3 controller is running at a 4:1 clock rate, memory clocks to AXI system clocks, then this bus translator should be able to accomplish one transaction per clock at a sustained or pipelined rate.
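
As a rough sketch of what the width conversion involves, the Python below maps one 32-bit Wishbone write onto a single 128-bit AXI write beat. It assumes Wishbone word addressing, little-endian byte lanes, and an AXI address aligned to the full bus width. The helper name and these conventions are illustrative assumptions, not taken from the bridge's source.

```python
WB_DW = 32    # Wishbone data width in bits (from the example above)
AXI_DW = 128  # AXI data width in bits (from the example above)
LANES = AXI_DW // WB_DW  # number of Wishbone words per AXI beat


def wb_write_to_axi_beat(wb_addr: int, wb_data: int, wb_sel: int):
    """Return (awaddr, wdata, wstrb) for one Wishbone write.

    wb_addr is a word address; wb_sel has one bit per byte of wb_data.
    Assumes little-endian lanes (an assumption made for this sketch).
    """
    assert 0 <= wb_data < (1 << WB_DW)
    assert 0 <= wb_sel < (1 << (WB_DW // 8))
    lane = wb_addr % LANES                       # which 32-bit lane is used
    awaddr = (wb_addr // LANES) * (AXI_DW // 8)  # byte address of the beat
    wdata = wb_data << (lane * WB_DW)            # data shifted into its lane
    wstrb = wb_sel << (lane * (WB_DW // 8))      # strobes for that lane only
    return awaddr, wdata, wstrb


if __name__ == "__main__":
    awaddr, wdata, wstrb = wb_write_to_axi_beat(6, 0xDEADBEEF, 0xF)
    print(f"AWADDR=0x{awaddr:x} WSTRB=0x{wstrb:04x} WDATA=0x{wdata:032x}")
```

Only the strobes for the selected lane are set, so the other twelve bytes of the wide beat are left untouched in memory.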

Since the initial build of the core, I've added the WB to AXI lite bridge. This is also a pipelined bridge, and like the original one it has also been formally verified.

AXI to Wishbone conversion

While not the original purpose of the project, it now has both AXI-lite to WB and AXI to WB bridges. Each of these bridges comes in two parts, a read and write half. These halves can be used either independently, generating separate inputs to a WB crossbar, or combined through a WB arbiter.

The AXI-lite to WB bridge has been both formally verified and FPGA proven. This includes both the write half as well as the read half. Given the reluctance of the major vendors to support high speed AXI-lite interfaces, you aren't likely to find this kind of performance elsewhere.

The AXI to WB bridge write and read components have only been formally verified through about a dozen steps or so. This proof is deep enough to verify most of the bus interactions, but not nearly deep enough to verify any issues associated with internal FIFO overflows.
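
Why proof depth matters for FIFOs can be seen with a toy model. The Python sketch below is not the formal tooling used for these cores. It simply explores every push/pop sequence from reset for a bounded number of clocks, assuming a FIFO whose fill level changes by at most one per clock. The depth of 16 used in the example is a hypothetical value.

```python
def max_fill_within(steps: int) -> int:
    """Largest fill level reachable from an empty FIFO in `steps` clocks.

    On each clock the FIFO may push, pop, both, or neither, so the fill
    level moves by -1, 0, or +1 and never drops below zero.
    """
    reachable = {0}
    for _ in range(steps):
        nxt = set()
        for fill in reachable:
            nxt.update({max(fill - 1, 0), fill, fill + 1})
        reachable = nxt
    return max(reachable)


def overflow_visible(depth: int, steps: int) -> bool:
    """True if a bounded search of `steps` clocks can reach fill > depth."""
    return max_fill_within(steps) > depth


if __name__ == "__main__":
    depth = 16  # hypothetical FIFO depth
    for steps in (12, 16, 17):
        print(steps, "steps ->", overflow_visible(depth, steps))
```

In this model a bounded check of a dozen steps cannot even fill a 16-entry FIFO, let alone overflow it, so any overflow bug would stay out of reach of such a proof.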

Wishbone pipeline to WB Classic

There's also a Wishbone (pipelined, master) to Wishbone (classic, slave) bridge, as well as the reverse Wishbone (classic, master) to Wishbone (pipelined, slave) bridge. Both of these have passed their formal tests. They are accompanied by a set of formal properties for Wishbone classic, both for slaves as well as masters.

Formal Verification

Currently, the project contains formal specifications for Avalon, Wishbone (classic), Wishbone (pipelined), and AXI-lite buses. There's also a formal property specification for an AXI (full) bus, but the one in the master branch is incomplete. The complete set of AXI properties is maintained elsewhere.

Xilinx Cores

The formal properties were first tested on a pair of Xilinx AXI demonstration cores. These cores failed formal verification. You can read about them on my blog, at zipcpu.com, here for AXI-lite and here for AXI. You can find the Xilinx cores referenced in those articles here and here for reference, for those who wish to repeat or examine my proofs.

Cross-bars and AXI demonstrators

This repository has since become a repository for all kinds of bus-based odds and ends in addition to the bus translators mentioned above. Some of these odds and ends include crossbar switches and AXI demonstrator cores. As mentioned above, these cores are unique in their 100% throughput capabilities.

  • WBXBAR is a fully functional N master to M slave Wishbone crossbar. Unlike my typical WB interconnects, this one guarantees that the ACK responses won't get crossed, and that misbehaving slave accesses will be timed out. The core also has options for checking for starvation (where a master's request is not granted in a particular period of time), double-buffering all outputs (i.e. with skid buffers), and forcing idle channel values to zero in order to reduce power.

    This core has been formally verified.

  • AXILXBAR is a fully functional, formally verified, N master to M slave AXI-lite crossbar interconnect. As such, it permits min(N,M) active channel connections between masters and slaves all at once. This core also has options for low power, whereby unused outputs are forced to zero, and lingering. Since the AXI protocol doesn't specify exactly when to close a channel, there's an OPT_LINGER allowing you to specify how many cycles the channel should be idle for in order for the channel to be closed. If the channel is not closed, a clock can be spared when reusing it. Otherwise, two clocks will be required to access a given channel.

    This core has been formally verified.

    While I haven't tested Xilinx's interconnect to know, if the quality of their demonstration AXI-lite slave core is any indication, then this cross-bar should easily outperform anything they have. The key unusual feature? The ability to maintain one transaction per clock over an extended period of time across any channel pair.

  • AXIL2AXIS converts from AXI-lite to AXI stream and back again. Its primary purpose is for testing AXI stream components at low speed, to make certain that they work before increasing the speed of the stream to the system clock rate. As such, writes to the core will generate writes to the AXI stream on the master side, and reads from the core will accept AXI stream reads on the slave side.

    While this isn't really intended to be a high performance core, it can still handle 100% throughput like most of my IP here. Therefore, anything less than 100% throughput through this core will be a test of and reflection of how the rest of your system works.

    This core has been formally verified.

  • AXILSINGLE is designed to be a companion to AutoFPGA's AXI-lite support. Its purpose is to simplify connectivity logic when supporting multiple AXI-lite registers. This core takes a generic AXI-lite interface, and simplifies the interface so that multiple single-register cores can be connected to it. The single-register cores can either be full AXI-lite cores in their own right, subject to simplification rules, or even simplified from that. They must never stall the bus, and must always return responses within one clock cycle. The AXILSINGLE handles all backpressure issues. If done right, the backpressure logic from the slave core will be removed by the synthesis tool, allowing all backpressure logic to be condensed into a few shared wires.

    This core has been formally verified.

  • AXILDOUBLE is the second AXI-lite companion to AutoFPGA's AXI-lite support. Its purpose is to simplify connectivity logic when supporting multiple AXI-lite slaves. This core takes a generic AXI-lite interface, and simplifies the interface so that peripherals can be connected to it. These peripheral cores can either be full AXI-lite cores in their own right, subject to simplification rules discussed within, or even simplified from that. They must never stall the bus, and must always return responses within one clock cycle. The AXILDOUBLE core handles all backpressure issues, address selection, and invalid address returns.

    This core has been formally verified.

  • AXIXBAR is a fun project to develop a full NxM configurable cross bar using the full AXI protocol.

    Unique to this (full) AXI core is the ability to have multiple ongoing transactions on each of the master-to-slave channels. Were Xilinx's crossbar to do this, it would've broken their demonstration AXI-full slave core.

    This core has been formally verified.

  • DEMOAXI is a demonstration AXI-lite slave core with more power and capability than Xilinx's demonstration AXI-lite slave core. Particular differences include 1) this one passes a formal verification check (Xilinx's core has bugs), and 2) this one can handle a maximum throughput of one transaction per clock. (Theirs did at best one transaction every other clock period.) You can read more about this demonstration AXI-lite slave core on ZipCPU.com in this article.

    This core has been formally verified.

  • EASYAXIL is a second demonstration AXI-lite slave core, only this time re-engineered to look and feel simpler than the DEMOAXI core above. It's also designed to use internal registers, rather than a memory, so that it can be more easily extended. The core can either use skid buffers, in which case its performance matches the DEMOAXI core above, or not, in which case it has only half the throughput. The real key difference is that the skid buffers have been moved into an external module.

    This core has been formally verified.

  • DEMOFULL is a fully capable AXI4 demonstration slave core, rather than just the AXI-lite protocol. Well, okay, it doesn't do anything with the PROT, QOS, CACHE, and LOCK flags, so perhaps it isn't truly the full AXI protocol. Still, it's sufficient for most needs.

    Unlike Xilinx's demonstration AXI4 slave core, this one can handle 100% loading on both read and write channels simultaneously. That is, it can handle one read and one write beat per channel per clock with no stalls between bursts if the environment will allow it.

    This core has been formally verified.

  • AXILSAFETY is a bus fault isolator AXI-lite translator, sometimes called a firewall, designed to support a connection to a trusted AXI-lite master, and an untrusted AXI-lite slave. Should the slave attempt to return an illegal response, or perhaps a response beyond the user parameterized timeouts, then the untrusted slave will be "disconnected" from the bus, and a bus error will be returned for both the errant transaction and any following.

    AXILSAFETY also has a mode where, once a fault has been detected, the slave is reset and allowed to return to the bus infrastructure until its next fault.

    This core has been formally verified.

  • AXISAFETY is a bus fault isolator/firewall very similar to the AXILSAFETY bus fault isolator above with many of the same options.

    AXISAFETY also has a mode where, once a fault has been detected, the slave is reset and allowed to return to the bus infrastructure until its next fault.

    This core has been formally verified.

  • AXI2AXILITE converts incoming AXI4 (full) requests for an AXI-lite slave. This conversion is fully pipelined, and capable of sending back to back AXI-lite requests on both channels.

    This core has been formally verified.

  • AXIS2MM converts an incoming stream signal into outgoing AXI (full) requests. Supports bursting and aborted transactions. Also supports writes to a constant address, and continuous writes to concurrent addresses. This core depends upon all stream addresses being aligned.

    This core has been formally verified.

  • AXIMM2S reads from a given address, and writes it to a FIFO buffer and then to an eventual AXI stream. Read requests are not issued unless room already exists in the FIFO, yet for a sufficiently fast stream the read requests may maintain 100% bus utilization--but only if the rest of the bus does as well. Supports continuous, fixed address or incrementing, and aborted transactions.

    Both this core and the one above it depend upon all stream addresses being aligned.

    This core has been formally verified.

  • AXIDMA is a hardware assisted memory copy. Given a source address, destination address, and length, this core reads from the source address into a FIFO, and then writes the data from the FIFO to memory. As an optimization, memory address requests are not made unless the core is able to transfer at a 100% throughput rate.

    This particular version can only handle bus aligned transfers. A separate version that can handle unaligned transfers is available for purchase.

    This core still has problems, particularly with misaligned addresses and lengths.

  • AXISINGLE is a (to be written) core that will also be an AutoFPGA companion core. Slaves of type "SINGLE" (one register, one clock to generate a response) can be ganged together using it. This core will then essentially turn an AXI core into an AXI-lite core, with the same interface as AXILSINGLE above. When implemented, it will look very similar to the AXIDOUBLE core mentioned below.

  • AXIDOUBLE is the second AXI4 (full) companion to AutoFPGA's AXI4 (full) support. Its purpose is to simplify connectivity logic when supporting multiple AXI4 (full) slaves. This core takes a generic AXI4 (full) interface, and simplifies the interface so that peripherals can be connected to it with a minimal amount of logic. These peripheral cores can either be AXI4 (full) cores in their own right, subject to simplification rules discussed within, simplified AXI-lite slaves as one might use with AXILDOUBLE, or even simpler than that. Key to this simplification is the assumption that the simplified slaves must never stall the bus, and that they must always return responses within one clock cycle. The AXIDOUBLE core handles all backpressure issues, ID logic, burst logic, address selection, invalid address return and exclusive access logic.

    This core has been formally verified.

  • AXIXCLK can be used to cross clock domains in an AXI context. As implemented, it is little more than a set of asynchronous FIFOs applied to each of the AXI channels. The asynchronous FIFOs have been formally verified.

  • WBSAFETY is a bus fault isolator/firewall, very similar to the AXILSAFETY firewall above, only for the Wishbone bus. Unlike many vendor firewall implementations, this one is able to reset the downstream core following any error without impacting its ability to respond to the bus in a protocol compliant fashion.
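
Several of the cores above mention skid buffers (WBXBAR's double-buffered outputs, EASYAXIL's optional skid buffers). For readers unfamiliar with the idea, the following is a minimal behavioral sketch in Python of a generic one-entry skid buffer. It is not a transcription of the skid buffer module in this repository, and the class, method, and helper names are illustrative assumptions. Beats are assumed to be non-`None` values.

```python
class SkidBuffer:
    """One-entry skid buffer with a registered upstream READY.

    The upstream READY depends on stored state only, never directly on the
    downstream READY, so a beat arriving while the downstream stalls is
    caught in the skid register instead of being lost.
    """

    def __init__(self):
        self.skid = None  # beat captured while the downstream was stalled

    @property
    def ready(self) -> bool:
        return self.skid is None

    def clock(self, in_valid: bool, in_data, out_ready: bool):
        """Advance one clock; return the beat delivered downstream, or None."""
        accepted_in = in_valid and self.ready
        # Output side: a stored beat goes first, otherwise pass the input through.
        if self.skid is not None:
            out_valid, out_data = True, self.skid
        else:
            out_valid, out_data = accepted_in, in_data
        delivered = out_data if (out_valid and out_ready) else None
        # State update at the clock edge.
        if self.skid is not None:
            if out_ready:
                self.skid = None
        elif accepted_in and not out_ready:
            self.skid = in_data
        return delivered


def run(beats, out_ready_pattern):
    """Push `beats` through the buffer; return (received, clocks_used).

    `out_ready_pattern` repeats forever and must contain at least one True.
    """
    buf, pending, received, clocks = SkidBuffer(), list(beats), [], 0
    while len(received) < len(beats):
        out_ready = out_ready_pattern[clocks % len(out_ready_pattern)]
        in_valid = bool(pending)
        in_data = pending[0] if pending else None
        took = in_valid and buf.ready  # handshake sampled before the edge
        out = buf.clock(in_valid, in_data, out_ready)
        if took:
            pending.pop(0)
        if out is not None:
            received.append(out)
        clocks += 1
    return received, clocks


if __name__ == "__main__":
    print(run(range(1, 9), [True]))            # downstream always ready
    print(run([1, 2, 3, 4], [True, False]))    # downstream ready every other clock
```

With a downstream that is always ready, the model passes one beat per clock. When the downstream stalls, beats are delayed but never dropped or reordered.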

Licensing

This repository is licensed under the Apache 2 license.

Thanks

I'd like to thank @wallento for his initial work on a Wishbone to AXI converter, and his encouragement to improve upon it. While this isn't a fork of his work, the initial pipelined Wishbone to AXI bridge which formed the core seed for this project took its initial motivation from his work.

Many of the rest of these projects have been motivated by the desire to learn and develop my formal verification skills. For that, I would thank the staff of Symbiotic EDA for their tools and their encouragement.