Timestamp feature

Question


acostina opened this issue 2 years ago · 6 comments

This is the HDL side of the discussion here: analogdevicesinc/libiio#860

One 64-bit counter in axi-dmac, running on the sample clock. It would need to be readable from a register too.
A register area of NB_BLOCKS_MAX 64-bit entries containing the timestamp value for each block, indexed by the DMA hardware descriptor index.
During a TX or RX transfer, the IP would read the corresponding timestamp from the register area, wait until its value matches the internal counter, and then perform the transfer. If the free-running counter is already past the timestamp value, this is an underrun scenario, and the IP can raise an IRQ to the processor. In this case, the current counter value is written back to the timestamp register (if there is no underrun, the write would not change the content of the register anyway).
The HDL would have to crawl the DMA descriptor list to find the next transfer, because transfers could be submitted out of order, unless we make it a requirement that transfers be submitted in order.
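A minimal behavioral sketch of the per-descriptor check described above, written in Python rather than HDL and assuming descriptors are handled in order. The register area is modeled as a plain list indexed by descriptor index; `TimestampGate`, `tick` and `check` are hypothetical names chosen for illustration, not part of axi_dmac.

```python
class TimestampGate:
    """Behavioral model of the proposed per-descriptor timestamp check.

    Hypothetical names; a Python model, not HDL. A 64-bit sample counter
    takes centuries to wrap at realistic sample rates, so wrap-around is
    ignored here.
    """

    def __init__(self, nb_blocks_max: int) -> None:
        self.counter = 0                  # free-running sample counter
        self.regs = [0] * nb_blocks_max   # one timestamp per descriptor index
        self.irq_pending = False

    def tick(self, n: int = 1) -> None:
        """Advance the counter by n sample-clock cycles."""
        self.counter += n

    def check(self, desc_index: int) -> str:
        """Return 'wait', 'start' or 'underrun' for one descriptor."""
        ts = self.regs[desc_index]
        if self.counter < ts:
            return "wait"  # timestamp not reached yet: hold the transfer
        late = self.counter > ts
        # Write the current counter value back unconditionally; when there is
        # no underrun (counter == ts) this leaves the register unchanged.
        self.regs[desc_index] = self.counter
        if late:
            self.irq_pending = True  # underrun: notify the processor
            return "underrun"
        return "start"
```

For example, with `regs[1] = 5` and the counter already at 10, `check(1)` reports an underrun, sets the IRQ flag and leaves 10 in `regs[1]`, so software can see how late the transfer was.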

Instead of extending AXI_DMAC to support all the new features, there is also the possibility of adding just a timestamp input, which can be read at the same time as the sync start, and creating a separate IP responsible for all the timestamping features. This would reduce the complexity of maintaining the AXI_DMAC IP.

In my opinion, the above discussion doesn't cover cases where the number of samples per beat is greater than 1 (typical for sample rates above 200-350 MSPS). In some cases, when the DDR bandwidth cannot keep up with the converter data rate, we don't even have the sample clock as an input of the DMA. Extending the data offload IP may be needed for these use cases.
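To illustrate the multi-sample-per-beat problem: if the counter counts samples while the datapath moves several samples per beat, a sample-accurate timestamp usually falls in the middle of a beat. The sketch below assumes samples are packed in order within each beat (lane 0 first); `locate_sample` and `is_beat_aligned` are hypothetical helpers.

```python
from typing import Tuple


def locate_sample(timestamp: int, samples_per_beat: int) -> Tuple[int, int]:
    """Map a sample-count timestamp to (beat index, lane within the beat).

    Assumes the counter counts samples and that samples are packed in
    order within each beat, lane 0 first.
    """
    if samples_per_beat < 1:
        raise ValueError("samples_per_beat must be >= 1")
    beat, lane = divmod(timestamp, samples_per_beat)
    return beat, lane


def is_beat_aligned(timestamp: int, samples_per_beat: int) -> bool:
    """True if the timestamp falls on the first sample of a beat."""
    return locate_sample(timestamp, samples_per_beat)[1] == 0
```

With one sample per beat every timestamp is aligned. With 4 samples per beat, timestamp 10 maps to lane 2 of beat 2, so a trigger that can only act on whole beats would be two samples off.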

On the TX side, we could extend the AXI_TDD (#975) to trigger a transfer at a specific timestamp.

The PACK IPs would also need to be improved for a clean implementation of this feature. Right now, when we want to do synchronized transfers between FPGAs, and also in the phaser example [https://github.com/analogdevicesinc/plutosdr-fw/tree/pluto_phaser], we keep them in reset before the transfer, which is not ideal.

Answer 1 · 2022-10-05T08:53:17.000Z

Hi, some comments from my side

I would propagate the timestamp together with the rest of the DMA descriptor inside the axi_dmac core. That keeps code changes minimal, especially since the DMA descriptor is implicitly pipelined in several places. I don't think the crawling feature is needed, because other existing SDR stacks like UHD don't support it either (I only looked at the UHD firmware quickly and might have missed something, but it seems the command queue is blocked until the timestamp of the top command is reached). In that case, the additional complexity inside axi_dmac would be manageable, I think.
Another thing to consider is synchronizing the timestamp counters inside the different axi_dmac cores, especially if different sample rates are possible on the same device. Currently I only have a PlutoSDR, which can only run TX and RX at the same sample rate. In that case, one could just reset the timestamp counters inside the axi_dmac cores with the reset line. (setHardwareTime and getHardwareTime, which some applications want, could be realized at a higher level inside the driver.)
The axi_dmac response would also have to be extended with an additional flag, so that it can signal that a timestamp had already passed and a transfer could not be executed.
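A sketch of the in-order, UHD-like behavior described above, in Python, assuming the timestamp travels as an extra descriptor field and that a late transfer is dropped and reported rather than executed. `Descriptor`, `Response`, `late` and `InOrderTimedQueue` are hypothetical names; the real descriptor and response formats of axi_dmac are not modeled.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List, Optional


@dataclass
class Descriptor:
    dest_addr: int
    length: int
    timestamp: int  # hypothetical extra descriptor field


@dataclass
class Response:
    dest_addr: int
    late: bool  # hypothetical flag: timestamp had already passed


class InOrderTimedQueue:
    """The head descriptor blocks the queue until its timestamp is reached."""

    def __init__(self) -> None:
        self.queue: Deque[Descriptor] = deque()
        self.responses: List[Response] = []

    def submit(self, desc: Descriptor) -> None:
        self.queue.append(desc)

    def step(self, counter: int) -> Optional[Response]:
        """Call once per sample-clock cycle with the current counter value."""
        if not self.queue:
            return None
        head = self.queue[0]
        if counter < head.timestamp:
            return None  # blocked: later descriptors wait behind the head
        self.queue.popleft()
        # A late transfer is not executed; only the response reports it.
        resp = Response(head.dest_addr, late=counter > head.timestamp)
        self.responses.append(resp)
        return resp
```

Because the queue is never crawled, a descriptor with timestamp 3 submitted after one with timestamp 5 waits behind it and is reported as late. That is the trade-off accepted by not supporting out-of-order submission.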

There is an issue with cpack when multiple samples are packed inside one beat (as you mentioned above). I have done some experiments in my private repo, where I extended cpack and upack with two additional inputs:

burst_end: goes high when the last beat of a transfer is read/written
valid_bits: indicates the number of valid bits in the last beat
The burst_end and valid_bits signals are generated by the axi_dmac core. The additional complexity for that is quite low, I would say (see the code in my private repo). Most of the additional complexity is inside the cpack/upack cores. It's a bit tricky, because the cores must not lose a sample when a partial beat happens. For example, if there are 2 samples per beat and the user wants to read 3 samples, the 4th sample, which is inside the last beat, should not be discarded but should become the first sample of the next beat.
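A behavioral sketch of the partial-beat requirement in the example above (2 samples per beat, a 3-sample read). It models only the sample ordering, not the AXI handshake or the channel packing; `CarryOverReader` and `counting_beats` are hypothetical names.

```python
from typing import Iterator, List


def counting_beats(samples_per_beat: int) -> Iterator[List[int]]:
    """Endless stream of beats carrying consecutive sample numbers."""
    n = 0
    while True:
        yield list(range(n, n + samples_per_beat))
        n += samples_per_beat


class CarryOverReader:
    """Reads fixed-size beats but hands out arbitrary-length transfers."""

    def __init__(self, beats: Iterator[List[int]]) -> None:
        self.beats = beats
        self.leftover: List[int] = []  # unused samples from the last beat

    def read(self, n_samples: int) -> List[int]:
        # Serve samples kept from the previous partial beat first.
        out = self.leftover[:n_samples]
        self.leftover = self.leftover[n_samples:]
        while len(out) < n_samples:
            beat = next(self.beats)
            need = n_samples - len(out)
            out.extend(beat[:need])
            # Keep the unused tail of this beat instead of discarding it.
            self.leftover = beat[need:]
        return out
```

Reading 3 samples from 2-sample beats returns samples 0 to 2 and keeps sample 3 as the first sample of the next read.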
I have implemented the partial_burst feature in the cpack/upack and axi_dmac cores. I have also created testbenches that test the cpack/upack cores together with the axi_dmac core for the critical cases. The DMA descriptor has a new field (a new register) holding the number of valid bytes (valid_bytes) for each transfer. The transfer length is unchanged and still respects the memory alignment requirements. The transfer_length field could probably be eliminated from the DMA descriptor and deduced from valid_bytes by rounding up to the next valid transfer length.
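A sketch of deriving the transfer length from the number of valid bytes, assuming the memory alignment requirement is that a transfer covers a whole number of beats of `beat_bytes` bytes. Both that assumption and the function names are hypothetical; the actual axi_dmac alignment rules and register encoding may differ.

```python
def transfer_length_from_valid_bytes(valid_bytes: int, beat_bytes: int) -> int:
    """Round the number of valid bytes up to a whole number of beats."""
    if valid_bytes < 1 or beat_bytes < 1:
        raise ValueError("byte counts must be positive")
    beats = -(-valid_bytes // beat_bytes)  # integer ceiling division
    return beats * beat_bytes


def valid_bytes_in_last_beat(valid_bytes: int, beat_bytes: int) -> int:
    """Valid bytes in the final beat (beat_bytes if that beat is full)."""
    rem = valid_bytes % beat_bytes
    return rem if rem else beat_bytes
```

With 8-byte beats, 20 valid bytes give a 24-byte transfer whose last beat carries only 4 valid bytes, i.e. a partial burst.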

I (temporarily) removed the abort feature from axi_dmac so that I could implement the new features. They could later be merged back into an axi_dmac that keeps the abort feature. The main obstacle for me was that the abort feature adds a lot of complexity but has no testbenches.

I have not looked at the new AXI_TDD or the high-speed JESD204 work yet. I will do that soon.

Answer 2 · 2022-10-05T19:50:23.000Z

I looked at the AD9361 and ADRV9002 data sheets. Both ICs have only one baseband PLL. In that case, the clocks for all axi_dmac cores should have the same frequency, so they can be synchronized with a common reset signal. I don't know whether there are any 2RX/2TX ICs that allow different sample rates at the same time. It was not so clear in the ADRV9008, ADRV9026, and AD9988 data sheets (but I only looked briefly).

Answer 3 · 2022-10-06T00:31:48.000Z

Then you need to look closer. The ADRV9002 can have unique(-ish) rates for its DACs and ADCs. The data sheet says:

Each DAC has an adjustable sample rate and is linear up to full scale.
The sample rate of each digital filter block automatically adjusts with each change of the decimation factors to produce the desired output data rate.

While they share the same baseband PLL, the dividers can be set independently.

The same is true of the AD9081: most of the time, the ADC and DAC sample rates do not match.

However, when they don't match, they each have their own sample clock and, typically, their own DMA. Timestamping is sort of the only way to keep things synced.

Answer 4 · 2022-10-06T09:07:50.000Z

In that case, it might be necessary to have a sync mechanism for the timestamp counters inside the TX and RX axi_dmac cores.

If, after reset, the baseband PLL dividers are not set up at the same time, the RX and TX timestamp counters will have different values even when the final TX and RX sample rates are the same.
The RX/TX timestamp counters would have identical values if the clk_out of the AD9361 (or any other SDR IC) could be disabled from reset until all PLL dividers are set. Is that possible?
If not, one could just add an extra reset_timestamp output to the AD9361 HDL core, controlled via a register write. The axi_dmac cores would then have an additional reset_counter input.
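A toy simulation of the counter-offset problem and the proposed shared reset, assuming RX and TX run at the same rate and are observed on one common reference clock. In real hardware the reset would also need a clock-domain-crossing-safe path. `run_counters` and its parameters are hypothetical.

```python
from typing import Optional, Tuple


def run_counters(rx_start: int, tx_start: int, cycles: int,
                 reset_at: Optional[int] = None) -> Tuple[int, int]:
    """Simulate RX/TX sample counters on a common reference clock.

    rx_start/tx_start: cycle at which each converter clock starts running
    (e.g. once its PLL divider has been configured).
    reset_at: cycle at which a shared reset_timestamp pulse is asserted.
    Returns the final (rx, tx) counter values.
    """
    rx = tx = 0
    for cycle in range(cycles):
        if reset_at is not None and cycle == reset_at:
            rx = tx = 0  # the shared pulse clears both counters together
            continue
        if cycle >= rx_start:
            rx += 1
        if cycle >= tx_start:
            tx += 1
    return rx, tx
```

Without the reset, a 7-cycle difference in when the clocks come up persists as a permanent 7-sample offset; a shared reset pulse asserted after both clocks are running removes it.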

For my purpose (digital communication), it is not a problem if the RX and TX counters have different values, because I only need relative timing.
For MIMO and beamforming, it is important that all counters are synced, even across multiple FPGAs. The axi_dmac core probably needs a reset timestamp input for this feature.

Anyway, I think it will be pretty easy to synchronize the timestamp counters, so there is not much to worry about now.

Answer 5 · 2022-10-06T09:25:49.000Z

These are outputs from my testbenches for cpack/axi_dmac combinations.
The configuration is: 2 samples per channel, 2 channels, 16 bits per sample; only the first channel is enabled.
This screenshot shows an overview of a TX run.

transfer: 2 samples (1 for each channel) -> req_valid_bytes = 3 (which means 4)
transfer: 8 samples -> req_valid_bytes = 15 (means 16)
transfer: 10 samples -> req_valid_bytes = 19 (means 20)
The pattern I set in this testbench is that every second transfer ends with a burst in which not all samples are valid (I call this a partial burst).
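The numbers above can be reproduced with a little arithmetic, assuming that "2 samples per channel" means per beat (so a beat is 2 x 2 x 16 bits = 8 bytes) and that req_valid_bytes holds the number of valid bytes minus one, as the "3 (which means 4)" example suggests. `transfer_info` and `TransferInfo` are hypothetical names.

```python
from typing import NamedTuple


class TransferInfo(NamedTuple):
    req_valid_bytes: int  # register value: number of valid bytes minus one
    beats: int            # beats needed to carry the valid bytes
    partial_burst: bool   # True if the last beat is only partly valid


def transfer_info(n_samples: int, bits_per_sample: int,
                  beat_bytes: int) -> TransferInfo:
    valid_bytes = n_samples * bits_per_sample // 8
    beats = -(-valid_bytes // beat_bytes)  # integer ceiling division
    return TransferInfo(valid_bytes - 1, beats, valid_bytes % beat_bytes != 0)


# Assumed beat width: 2 samples/channel x 2 channels x 16 bits = 8 bytes.
BEAT_BYTES = 2 * 2 * 16 // 8
```

This gives 3, 15 and 19 for the three transfers, with the 1st and 3rd ending in a partial burst.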

zoomed in on the 1st transfer:

Samples for the enabled channel are highlighted in yellow. Here it is visible that the first transfer generates only one DAC write, as it should, because each channel has only one valid sample.

zoomed in on the 3rd transfer:

Answer 6 · 2022-10-06T18:16:05.000Z

@acostina I would also put the timestamp trigger logic for TX inside axi_dmac, because axi_dmac should report the transfer status in the DMA response. It would be difficult to pass, for example, a "time passed" status from a TDD core back to the axi_dmac core. Moreover, the timestamp fits well architecturally with the rest of the DMA descriptor data.
On the Linux driver side, the timestamp would arrive in the axi_dmac driver together with the rest of the DMA descriptor, so it would be very natural for the axi_dmac core to receive this data as well.