ipbus/ipbus-firmware

IPBus read-out

aranzaba opened this issue · 3 comments

Dear IPBus developers,

I am working on a project to read out a photon-counting detector at the European Synchrotron (ESRF), and I recently integrated IPbus into an Enclustra XU8 SoM (Zynq UltraScale+, XCZU4CG).

Now I am working on a mechanism to read out the data from the detector. One possibility I have in mind is to use a dual-port RAM (ported) slave. The idea would be to first fill this RAM with data from the detector, with a flag to signal when it is full, and then read the full RAM using the uHAL readBlock function. After the RAM has been read, another flag would be asserted, and the same process would repeat until all the data (8 MB) has been read. Alternatively, something similar could be done with two ping-pong RAMs.
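To make the ping-pong handshake concrete, here is a minimal Python simulation of the protocol logic only (not uHAL and not VHDL). It assumes a hypothetical register layout with one "full" flag per buffer: the firmware sets the flag when a buffer is full and switches to the other buffer, and software clears the flag after its readBlock. All class and function names are illustrative.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PingPongBuffers:
    """Two RAM buffers plus one 'full' flag each (hypothetical register layout)."""
    depth: int                                           # words per buffer
    bufs: List[List[int]] = field(default_factory=lambda: [[], []])
    full: List[bool] = field(default_factory=lambda: [False, False])
    fill_idx: int = 0                                    # buffer the firmware is filling

    def firmware_push(self, word: int) -> bool:
        """Store one detector word; return False when no buffer is free (back-pressure)."""
        if self.full[self.fill_idx]:
            return False
        buf = self.bufs[self.fill_idx]
        buf.append(word)
        if len(buf) == self.depth:
            self.full[self.fill_idx] = True              # raise 'full' flag for software
            self.fill_idx ^= 1                           # continue in the other buffer
        return True

    def software_drain(self, idx: int) -> List[int]:
        """Stand-in for one readBlock of a full buffer, followed by clearing its flag."""
        if not self.full[idx]:
            raise RuntimeError("buffer not ready")
        data = list(self.bufs[idx])
        self.bufs[idx].clear()
        self.full[idx] = False                           # hand the buffer back to firmware
        return data


def run_readout(total_words: int, depth: int) -> List[int]:
    """Interleave firmware writes and software reads until total_words are read."""
    if total_words % depth != 0:
        raise ValueError("this sketch assumes total_words is a multiple of depth")
    pp = PingPongBuffers(depth)
    received: List[int] = []
    next_word, read_idx = 0, 0
    while len(received) < total_words:
        while next_word < total_words and pp.firmware_push(next_word):
            next_word += 1
        if pp.full[read_idx]:                            # software polls the expected flag
            received.extend(pp.software_drain(read_idx))
            read_idx ^= 1                                # buffers are read in fill order
    return received
```

Because software always reads the buffers in the order they were filled, the data comes out in sequence, and the firmware stalls (rather than overwriting) if software falls behind.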

Maybe a nicer option would be to use a FIFO, but it is not so clear to me how to read it using uHAL. From the uHAL tutorial, I see that the mode attribute can be used for FIFOs and ports (mode="non-incremental") and that it should be possible to read memory blocks from a FIFO. However, what is not clear to me is how read_enable is asserted during a block transfer. Does the readBlock function take care of toggling the read_enable port? If so, I guess the address table would need a specific format (e.g. nodes for fifo_empty, fifo_full, read_enable, write_enable). Is there any example of this? Also, are there any other IPbus FIFO slaves apart from big_fifo_36.vhd and big_fifo_72.vhd?
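One way to picture where read_enable would come from is a toy Python model of a hypothetical FIFO slave (this does not describe any existing IPbus slave). It assumes two registers: a status register reporting the FIFO occupancy, and a data register where every bus read pops one word. A non-incrementing block read is then just N reads of the same address, so the slave logic can derive read_enable from each read strobe; software never toggles it explicitly. The register layout, names and the empty-read error are all assumptions.

```python
from collections import deque
from typing import Deque, List


class FifoSlave:
    """Toy model of a hypothetical IPbus FIFO slave with two registers.

    Address 0 (STATUS): number of words currently stored in the FIFO.
    Address 1 (DATA):   every read access pops one word, i.e. the slave logic
                        pulses the FIFO read_enable once per bus read strobe.
    """
    STATUS, DATA = 0, 1

    def __init__(self) -> None:
        self.fifo: Deque[int] = deque()

    def write_from_detector(self, words: List[int]) -> None:
        self.fifo.extend(words)                  # firmware side: one write_enable per word

    def bus_read(self, addr: int) -> int:
        if addr == self.STATUS:
            return len(self.fifo)
        if addr == self.DATA:
            if not self.fifo:
                raise RuntimeError("bus error: FIFO empty")  # a real slave might flag an error
            return self.fifo.popleft()
        raise ValueError(f"unmapped address {addr}")


def read_block_non_incrementing(slave: FifoSlave, addr: int, n: int) -> List[int]:
    """A non-incrementing block read is just n reads of the same address."""
    return [slave.bus_read(addr) for _ in range(n)]


slave = FifoSlave()
slave.write_from_detector(list(range(10, 20)))
count = slave.bus_read(FifoSlave.STATUS)                          # 1) read occupancy
data = read_block_non_incrementing(slave, FifoSlave.DATA, count)  # 2) drain that many words
print(count, data)
```

Reading the occupancy first and then draining exactly that many words avoids reading from an empty FIFO without needing separate fifo_empty or read_enable nodes in the address table.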

These are my first thoughts, and I would deeply appreciate any suggestions for improving the read-out performance.

Many thanks.

Hi,

The uHAL readBlock function simply issues a set of block read transactions whose total size matches the number of words requested. Each readBlock call is split into multiple read transactions so that every transaction is shorter than the maximum transaction length in the spec and also fits within the packet size. These reads are either address-incrementing or non-incrementing, depending on the mode specified, as you mentioned. The IPbus transactor converts these transactions into the IPbus bus signals described here.
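As a rough illustration of this splitting, the Python sketch below chops one logical block read into transactions and packets. It assumes that a transaction carries at most 255 words (the 8-bit Words field of the IPbus 2.0 transaction header) and that each response packet has a budget of `max_packet_words` 32-bit words covering the packet header, one header per transaction, and the returned data. It is not uHAL's actual algorithm, which also has to budget the request packets.

```python
from typing import List, Tuple

# IPbus 2.0 transaction headers have an 8-bit 'Words' field
MAX_WORDS_PER_TRANSACTION = 255


def split_block_read(base_addr: int, n_words: int, incrementing: bool,
                     max_packet_words: int) -> List[List[Tuple[int, int]]]:
    """Split one logical block read into packets of (address, n_words) read transactions.

    max_packet_words is an assumed budget, in 32-bit words, for each *response*
    packet: one packet header, one header per transaction, plus the returned data.
    """
    if max_packet_words < 3:
        raise ValueError("packet must fit a packet header, a transaction header and 1 word")
    packets: List[List[Tuple[int, int]]] = []
    current: List[Tuple[int, int]] = []
    used = 1                                        # packet header
    addr, remaining = base_addr, n_words
    while remaining > 0:
        room = max_packet_words - used - 1          # space left after this transaction's header
        if room <= 0:                               # packet full: start a new one
            packets.append(current)
            current, used = [], 1
            continue
        size = min(remaining, MAX_WORDS_PER_TRANSACTION, room)
        current.append((addr, size))
        used += 1 + size
        remaining -= size
        if incrementing:
            addr += size                            # next chunk continues where this one ended
    if current:
        packets.append(current)
    return packets


packets = split_block_read(0x1000, 1000, incrementing=True, max_packet_words=368)
for i, pkt in enumerate(packets):
    print(i, [(hex(a), n) for a, n in pkt])
```

The value 368 is only an example: it is what a 1500-byte Ethernet MTU leaves for a UDP payload (1472 bytes) expressed in 32-bit words. With `incrementing=False`, every transaction targets the same address, which is what a FIFO or port node needs.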

It is the logic in the IPbus slave itself that then connects those IPbus bus signals to the ports of the BRAM/FIFO entities. The big_fifo entities that you mentioned aren't quite IPbus slaves (i.e. they don't have IPbus bus I/O ports); instead, they just wrap the Xilinx 7-series FIFO primitives (which we've found useful in some projects in the past). We don't have any examples of IPbus slave entities that expose fifo_empty, fifo_full, read_enable and write_enable. The IPbus slaves we typically use for reading out data in our other, project-specific designs are the block memory slaves, listed here, which come in two variants. The 'ported' slaves occupy two addresses, one for the address and the other for the data, with the address incrementing automatically each time data is read. The other IPbus RAM slaves simply expose the full RAM directly in the address space (i.e. they occupy N addresses for a block memory of depth N with 32-bit data).
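The difference between the two variants can be sketched with a small Python model. It is a behavioural illustration only, not the firmware: the register offsets, the class names and the pointer wrap-around are assumptions. The ported RAM is read by setting the pointer once and then issuing non-incrementing reads of the data register; the directly mapped RAM is read with ordinary incrementing reads.

```python
from typing import List


class PortedRam:
    """Ported RAM model: two bus addresses (pointer register + data register)."""
    PTR, DATA = 0, 1

    def __init__(self, contents: List[int]) -> None:
        self.mem = list(contents)
        self.ptr = 0

    def bus_write(self, addr: int, value: int) -> None:
        if addr != self.PTR:
            raise ValueError("this sketch only models writes to the pointer register")
        self.ptr = value % len(self.mem)

    def bus_read(self, addr: int) -> int:
        if addr == self.PTR:
            return self.ptr
        if addr == self.DATA:
            value = self.mem[self.ptr]
            self.ptr = (self.ptr + 1) % len(self.mem)   # auto-increment (wrap-around assumed)
            return value
        raise ValueError(f"unmapped address {addr}")


class DirectRam:
    """Directly mapped RAM model: word i of the RAM lives at bus address base + i."""

    def __init__(self, base: int, contents: List[int]) -> None:
        self.base, self.mem = base, list(contents)

    def bus_read(self, addr: int) -> int:
        return self.mem[addr - self.base]


contents = [i * i for i in range(16)]

ported = PortedRam(contents)
ported.bus_write(PortedRam.PTR, 0)                                # set the start address once...
via_port = [ported.bus_read(PortedRam.DATA) for _ in range(16)]   # ...then non-incrementing reads

direct = DirectRam(0x100, contents)
via_direct = [direct.bus_read(0x100 + i) for i in range(16)]      # incrementing reads

print(via_port == via_direct)
```

Both return the same data; the trade-off is address-space usage (2 addresses versus N) against the need to write the pointer before each read-out.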

In terms of simplicity and performance, the ping-pong RAM setup that you described could be reasonable, at least for an initial implementation. But the optimal solution will of course depend on your exact requirements, which I don't know; most notably, whether you need to use the vast majority of the bandwidth of your (1 Gbps?) network links, or whether your readout bandwidth is much lower than that.

Hi Tom,

Thanks a lot for your response.

We have two read-out paths, slow and fast. The slow data path has a bandwidth of ~100 Mbps and we are planning to use a 1 Gbps link, so it looks like a ping-pong RAM or a FIFO should be enough to cover our requirements. However, the fast read-out has a bandwidth of ~6.2 Gbps, and we would need to enable a 10 Gbps link. Do you have a more optimal solution in mind for this case?
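A quick back-of-the-envelope check of these figures, in Python, shows how much of each link the two paths would occupy and the minimum wire time for one read-out. It assumes that the "8MB" mentioned in the first post means 8 MiB and ignores all protocol overhead (IPbus, UDP/IP and Ethernet headers), so real utilisation will be somewhat higher.

```python
FRAME_BYTES = 8 * 2**20            # assumption: the "8MB" read-out is 8 MiB

# (required data rate, planned link rate) in bit/s, from the figures above
PATHS = {
    "slow": (100e6, 1e9),
    "fast": (6.2e9, 10e9),
}


def utilisation(required_bps: float, link_bps: float) -> float:
    """Fraction of the raw link rate the data stream needs (protocol overhead ignored)."""
    return required_bps / link_bps


def wire_time_s(n_bytes: int, link_bps: float) -> float:
    """Minimum time to move n_bytes over a link running flat out."""
    return n_bytes * 8 / link_bps


for name, (required, link) in PATHS.items():
    print(f"{name}: needs {utilisation(required, link):.0%} of a {link / 1e9:g} Gbps link; "
          f"8 MiB takes at least {wire_time_s(FRAME_BYTES, link) * 1e3:.1f} ms")

# The fast path cannot fit on a 1 Gbps link at all:
print(f"fast on 1 Gbps: {utilisation(6.2e9, 1e9):.0%}")
```

The slow path uses about a tenth of its link, whereas the fast path needs well over half of a 10 Gbps link before any overhead, which is why it is the harder case.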

For multi-gigabit readout I don't have an ideal solution to suggest that will work in all scenarios, since the throughput will depend on various factors, such as latency, that can vary from one setup to the next. My only suggestion would be to try a similar, simple ping-pong RAM approach, test it in practice in your setup, identify any bottlenecks, and iterate from there.
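Why latency matters can be seen from a generic window-limited model of a request/response protocol: if at most W requests are outstanding, each response carries P bytes, and each costs one round trip T, throughput cannot exceed W·P·8/T. The Python sketch below is only this simplified model, not a description of how uHAL or the control hub actually schedule packets; the RTT, payload size and in-flight counts are hypothetical placeholders to be replaced with measured values.

```python
import math


def window_limited_throughput_bps(link_bps: float, payload_bytes: int,
                                  packets_in_flight: int, rtt_s: float) -> float:
    """Upper bound on throughput for a request/response protocol.

    At most `packets_in_flight` requests are outstanding, each response carries
    `payload_bytes` of data, and each one costs one round trip of `rtt_s`.
    """
    window_bps = packets_in_flight * payload_bytes * 8 / rtt_s
    return min(link_bps, window_bps)


def packets_in_flight_needed(target_bps: float, payload_bytes: int, rtt_s: float) -> int:
    """Smallest number of outstanding packets that could sustain target_bps."""
    return math.ceil(target_bps * rtt_s / (payload_bytes * 8))


# Illustrative numbers only: 10 Gbps link, 1440 B of data per response, 100 us RTT.
print(window_limited_throughput_bps(10e9, 1440, 16, 100e-6) / 1e9, "Gbps with 16 in flight")
print(packets_in_flight_needed(6.2e9, 1440, 100e-6), "packets in flight needed for 6.2 Gbps")
```

With these made-up numbers, 16 outstanding packets cap the rate at under 2 Gbps on a 10 Gbps link, so measuring the real round-trip time in the target setup is a sensible first step when looking for bottlenecks.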