Recommended approach for DMA peripheral to/from memory?
Closed this issue · 3 comments
Hi, I've been doing some thinking about the best approaches for using DMA to transfer data to and from peripheral registers, and wanted to see whether my thinking is on the right track or if others have tricks to recommend.
A lot of microcontrollers provide a way to configure peripherals to synchronize with the DMA controller so that, for example, the DMA doesn't overrun a full TX FIFO or read garbage from an empty RX FIFO, allowing true fire-and-forget transfers without CPU intervention.
Looking at the datasheet, it does not seem that neorv32 supports this kind of synchronization (which is understandable, given that it would add additional complexity). With that constraint, here is how I imagine the best approach to using DMA would look, using SPI as an example:
- If I have N bytes of data I want to transfer, break it into chunks of size SPI FIFO depth
- For each chunk:
  - Set up DMA to transfer the chunk to the TX FIFO
  - Wait for the transfer to complete and the SPI to go idle via interrupt
  - Set up DMA to transfer from the RX FIFO to my read buffer
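To make the loop structure concrete, here is a rough sketch of the chunked approach in C. The DMA transfers and the "wait for SPI idle" step are mocked (a simple loopback), and `SPI_FIFO_DEPTH` plus all helper names are made up for illustration; on real hardware these would program the DMA controller and block on the SPI/DMA interrupt instead.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical FIFO depth; the real value is a synthesis-time
 * configuration option of the SPI module. */
#define SPI_FIFO_DEPTH 16

/* Mock FIFOs standing in for the SPI hardware. */
static uint8_t tx_fifo[SPI_FIFO_DEPTH];
static uint8_t rx_fifo[SPI_FIFO_DEPTH];

static void dma_to_tx_fifo(const uint8_t *src, size_t n) {
    memcpy(tx_fifo, src, n);            /* DMA: memory -> SPI TX FIFO */
}

static void wait_spi_idle(void) {
    /* Pretend the SPI is wired in loopback: every byte clocked out
     * also clocks one byte in. A real driver would sleep until the
     * "SPI idle" interrupt fires. */
    memcpy(rx_fifo, tx_fifo, sizeof rx_fifo);
}

static void dma_from_rx_fifo(uint8_t *dst, size_t n) {
    memcpy(dst, rx_fifo, n);            /* DMA: SPI RX FIFO -> memory */
}

/* Full-duplex transfer of n bytes, one FIFO-sized chunk at a time. */
void spi_transfer_chunked(const uint8_t *tx, uint8_t *rx, size_t n) {
    size_t done = 0;
    while (done < n) {
        size_t chunk = n - done;
        if (chunk > SPI_FIFO_DEPTH)
            chunk = SPI_FIFO_DEPTH;
        dma_to_tx_fifo(tx + done, chunk);   /* step 1: fill TX FIFO  */
        wait_spi_idle();                    /* step 2: wait for idle */
        dma_from_rx_fifo(rx + done, chunk); /* step 3: drain RX FIFO */
        done += chunk;
    }
}
```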
Of course, this approach relies on the SPI FIFO being sufficiently large to make this worthwhile. If it's only configured with a depth of 1, for example, we would see no benefit from DMA. Does this seem like correct reasoning?
Additionally, the above approach might only work for synchronous protocols. With UART, for example, if we expect to receive, say, 100 bytes but the UART RX FIFO is larger than that, there does not appear to be a setting for the UART to trigger an interrupt after N bytes are received (only when the RX FIFO is full or not empty), so we don't seem to have a good way of knowing when to trigger the DMA transfer from the RX FIFO to a buffer.
Is there some other approach that could be used here, or is the best option just to forego DMA and simply read a byte every time the UART RX FIFO not-empty IRQ is triggered?
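For reference, the per-byte fallback I have in mind would look roughly like the following. This is a simulated sketch: the FIFO data register and the IRQ dispatch are mocked, and all names (`uart_rx_irq_handler`, `uart_receive_start`, etc.) are made up, not a real HAL API.

```c
#include <stddef.h>
#include <stdint.h>

/* Mock stand-in for the UART RX data register (one pending byte). */
static uint8_t uart_rx_data;

/* Receive state shared with the (simulated) IRQ handler. */
static uint8_t *rx_buf;
static size_t   rx_expected;
static size_t   rx_count;
static volatile int rx_done;

/* Called once per "RX FIFO not empty" interrupt: read a single byte
 * from the data register and flag completion once the expected
 * number of bytes has arrived. */
void uart_rx_irq_handler(void) {
    if (rx_count < rx_expected) {
        rx_buf[rx_count++] = uart_rx_data;
        if (rx_count == rx_expected)
            rx_done = 1;
    }
}

/* Arm a byte-by-byte receive of n bytes into buf; the main loop can
 * then poll rx_done (or sleep until the completion flag is set). */
void uart_receive_start(uint8_t *buf, size_t n) {
    rx_buf = buf;
    rx_expected = n;
    rx_count = 0;
    rx_done = 0;
}
```

The obvious downside is one interrupt per byte, which is exactly the overhead DMA is supposed to avoid.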
Appreciate any thoughts, thank you!
Hey @kurtjd!
Please excuse my late reply.
> Of course, this approach relies on the SPI FIFO being sufficiently large to make this worthwhile. If it's only configured with a depth of 1, for example, we would see no benefit from DMA. Does this seem like correct reasoning?
If you only want to (or can) move a single byte from the SPI somewhere, then it makes no sense to use the DMA. Setting up the DMA transfer would take longer than copying directly via the CPU.
> Additionally, the above approach might only work for synchronous protocols. With UART, for example, if we expect to receive, say, 100 bytes but the UART RX FIFO is larger than that, there does not appear to be a setting for the UART to trigger an interrupt after N bytes are received (only when the RX FIFO is full or not empty), so we don't seem to have a good way of knowing when to trigger the DMA transfer from the RX FIFO to a buffer.
That's right. I have considered implementing some kind of programmable fill level for the IO FIFOs that would trigger an interrupt when exceeded. That would actually be cool, but it would probably cost quite a bit in terms of hardware... I think I need to take another look at it.
> Is there some other approach that could be used here, or is the best option just to forego DMA and simply read a byte every time the UART RX FIFO not-empty IRQ is triggered?
Unfortunately, I can't think of a good idea either. That would actually be a good reason to implement FIFO level interrupts...
The DMA controller is deliberately kept very simple. You could add individual channels that are triggered by specific IO interrupts (e.g., "UART RX FIFO level exceeded"), but then the complexity would eventually explode. I mean, the DMA already takes up almost 30% of the size of a second CPU core... If you add even more features, you could eventually just use a second core (in a minimal ISA configuration) for IO tasks. 😅
Thanks for the response, I really appreciate it! And yep, I totally get that addressing some of these concerns would add complexity that's outside the scope of this project, so I'm definitely not expecting that. I just wanted to see if there was anything else I hadn't considered.
The main motivation for asking is that async embedded HALs in Rust often make heavy use of DMA. The neorv32 DMA implementation is definitely suitable, but I might have to get a little creative around some of the limitations :)
But yeah thanks again for clarifying!
In fact, I hardly ever use the DMA controller myself - I always use the second CPU core for interrupt management and data transfers.
> [...] but I just might have to get a little creative around some of the limitations :)
I am open to suggestions on how to improve the DMA or make it more user-friendly.