RP2350 can't capture OV5640 data at full speed
I've had issues getting a clean image capture from the OV5640 with the RP2350. With the current default configuration, the images captured by the RP2350 look like this:
So, I wrote the following debug code. In short, it makes the camera output a single frame of the built-in test pattern, it manually controls the DMA and PIO state machine to ensure they're configured correctly before each frame, and ensures nothing else is happening while the image is being captured.
```python
import time

# Force camera into a test mode
camera.writeRegister(0x503D, 1 << 7) # Test pattern
camera.writeRegister(0x3008, 0x42) # Software power down
camera.release() # Disable VSYNC handler
camera.sm.active(True) # Keep PIO state machine active
camera.dma.config(count = 320*240*2//4) # Ensure DMA count is correct

# Makeshift VSYNC handler, run between frames
camera.dma.active(False) # Stop DMA so it can be reconfigured
camera.sm.restart() # Clear the PIO state machine ISR
camera.buffer[:] = 0 # Wipe the camera buffer
camera.dma.write = camera.buffer # Reset the DMA write address
camera.dma.active(True) # Re-activate DMA to be ready for next frame

# Capture 1 frame and display it
camera.writeRegister(0x3008, 0x02) # Enable camera
time.sleep(0.026) # Wait for 1 frame (may need to adjust)
camera.writeRegister(0x3008, 0x42) # Disable camera
print("dma.count:", camera.dma.count) # Print number of DMA transfers remaining
display._write(None, camera.buffer) # Display captured image
```
It prints dma.count: 24254, and the following image is captured:
If I drop the XCLK frequency from 25MHz to 7.5MHz (camera.xclk.freq(7_500_000)) and sleep for 87ms (long enough for 1 frame at these settings), it prints dma.count: 3590 and the following image is captured:
The first ~5 rows look fine, but there are clearly multiple desyncs happening during the frame. If I drop the XCLK frequency to 5MHz (camera.xclk.freq(5_000_000)) and sleep for 131ms (long enough for 1 frame at these settings), it prints dma.count: 0 and the test image is captured correctly:
So, the problem appears to be that the camera is just outputting data too fast for the RP2350 to handle. But this is odd to me, because the HM01B0 outputs 79056 bytes in just under 30ms, or about 2.7MBps (with default settings), and the RP2350 handles it just fine. Meanwhile, the OV5640 at 7.5MHz XCLK outputs 153600 bytes in just over 80ms, or about 1.9MBps, and bytes are missed. Something doesn't add up here: why does the slower data transfer miss bytes?
I connected an oscilloscope to the DVP pins. Signals from top to bottom are VSYNC, HSYNC, PCLK, D0. This is at 25MHz XCLK with the test pattern image:
My probing setup isn't super clean, so the signals are a bit noisy.
Zooming in, we can see the HSYNC pin has a relatively low duty cycle. This means each row is being output in a short burst, followed by a relatively long period of nothing.
Looking closer at adjacent rows, they're all clearly outputting the same signal, which is expected for the test pattern.
So electrically, the signals seem fine. Thus, I don't think there's any problem with the camera itself, it's entirely just the RP2350's ability to capture the data stream.
For comparison, I connected an HM01B0 and made it output its color bar test pattern (camera.writeRegister(0x0601, 0x01)). The trace of the whole frame looks similar:
Zooming in, the HSYNC duty cycle is clearly much higher, most of the time is spent transmitting data:
Because of the Bayer data, every other row repeats:
--
So the biggest difference I see is that the OV5640 transmits each row in very rapid bursts, whereas the HM01B0 transmits at a much slower and consistent rate. Each row of the OV5640 is 640 bytes and transmitted in only ~16us (~53us at 7.5MHz XCLK, and ~80us at 5MHz XCLK), whereas the HM01B0 transmits 324 bytes per row in ~104us.
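To put numbers on the difference between average and per-row rates, here is a small sketch that only redoes the arithmetic from the figures quoted in this thread. The frame times are rounded to 30 ms and 80 ms, so the averages come out slightly different from the ~2.7 and ~1.9 MBps above; the dictionary names are just illustrative.

```python
# (bytes, seconds) per frame and per row, taken from the figures in this thread.
# Frame times are rounded ("just under 30ms", "just over 80ms").
frames = {
    "HM01B0 (default)": (79056, 30e-3),
    "OV5640 @ 7.5MHz XCLK": (153600, 80e-3),
}
rows = {
    "OV5640 @ 25MHz XCLK": (640, 16e-6),
    "OV5640 @ 7.5MHz XCLK": (640, 53e-6),
    "OV5640 @ 5MHz XCLK": (640, 80e-6),
    "HM01B0 (default)": (324, 104e-6),
}

def rate_mbps(num_bytes, seconds):
    """Return throughput in MB/s (1 MB = 1e6 bytes)."""
    return num_bytes / seconds / 1e6

# Average rate over a whole frame vs. rate while a row is actually being sent
frame_rates = {name: rate_mbps(*v) for name, v in frames.items()}
burst_rates = {name: rate_mbps(*v) for name, v in rows.items()}

for name, rate in frame_rates.items():
    print("average", name, round(rate, 2), "MB/s")
for name, rate in burst_rates.items():
    print("burst  ", name, round(rate, 2), "MB/s")
```

With these approximate numbers, the OV5640 at 7.5MHz XCLK has a lower average rate than the HM01B0, but its rate within a row is roughly four times higher.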
I wonder if the short bursts of data are just too fast for the PSRAM to handle. But the weird thing is that at 7.5MHz XCLK, the first few rows come in just fine. So that makes me wonder if the problem is actually with the XIP cache; perhaps it buffers the first few rows (5 rows is ~3kB, XIP cache is 16kB), then pushes to the PSRAM, and during that time misses the data from the DMA? For some reason? I can only speculate.
I think it may be possible to make the OV5640 data stream less "bursty" by fiddling with some registers. Page 4-3 of the OV5640 datasheet documents the DVP timing, which comes from how the image window configuration is set.
With the default configuration, I polled registers 0x3800 through 0x3813 (camera.readRegister(0x3800, 20)):
| Address | Name | Value |
|---|---|---|
| 0x3800-0x3801 | X_ADDR_ST | 0 |
| 0x3802-0x3803 | Y_ADDR_ST | 0 |
| 0x3804-0x3805 | X_ADDR_END | 2623 |
| 0x3806-0x3807 | Y_ADDR_END | 1951 |
| 0x3808-0x3809 | X_OUTPUT_SIZE | 320 |
| 0x380A-0x380B | Y_OUTPUT_SIZE | 240 |
| 0x380C-0x380D | HTS | 2060 |
| 0x380E-0x380F | VTS | 984 |
| 0x3810-0x3811 | X_OFFSET | 16 |
| 0x3812-0x3813 | Y_OFFSET | 8 |
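For reference, a rough sketch of how a 20-byte dump like this could be decoded into the table values. It assumes camera.readRegister(0x3800, 20) returns a bytes-like object and that each register pair is stored high byte first (which matches how HTS is written at 0x380C later in this thread). The helper name decode_window_registers is hypothetical, and the raw bytes here are rebuilt from the table instead of being read from a camera.

```python
import struct

# Register names in address order, 0x3800 through 0x3813, two bytes each
NAMES = (
    "X_ADDR_ST", "Y_ADDR_ST", "X_ADDR_END", "Y_ADDR_END",
    "X_OUTPUT_SIZE", "Y_OUTPUT_SIZE", "HTS", "VTS",
    "X_OFFSET", "Y_OFFSET",
)

def decode_window_registers(raw):
    """Decode 20 bytes read from 0x3800 into a name -> value dict."""
    if len(raw) != 2 * len(NAMES):
        raise ValueError("expected %d bytes" % (2 * len(NAMES)))
    # ">10H" = ten big-endian unsigned 16-bit values
    values = struct.unpack(">%dH" % len(NAMES), raw)
    return dict(zip(NAMES, values))

# Stand-in for camera.readRegister(0x3800, 20), rebuilt from the table above
raw = struct.pack(">10H", 0, 0, 2623, 1951, 320, 240, 2060, 984, 16, 8)
regs = decode_window_registers(raw)
print(regs["HTS"], regs["VTS"])
```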
HTS and VTS are the "Total horizontal/vertical size" values, but I can't find much documentation in the datasheet about what they are or what values are acceptable. But with my oscilloscope, I count 4020 PCLK cycles between each HSYNC pulse, which corresponds perfectly with 2060 pixels per row (RGB565, so 2 bytes per pixel).
So presumably, I can reduce HTS to shorten the pauses between HSYNC pulses, right? I did the following:
```python
def setHTS(value):
    # HTS is 16 bits: high byte at 0x380C, low byte at 0x380D
    data = bytes([(value >> 8) & 0xFF, value & 0xFF])
    camera.writeRegister(0x380C, data)

setHTS(1500)
```
At 2060, there's about 105us between each HSYNC pulse (at 25MHz XCLK):
At 1500, there's about 75us between each HSYNC pulse, which does actually increase the frame rate from 39.5 FPS to 54.2 FPS:
So that's promising! However if I drop down to 1300 for example, it stops outputting data completely:
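As a sanity check on the 2060 to 1500 change, here is a minimal sketch assuming the row period (and so the frame period) scales linearly with HTS while VTS and the pixel clock stay the same. That linear relationship is an assumption on my part, not something confirmed by the datasheet; the function name is just illustrative.

```python
def scale_with_hts(base_value, base_hts, new_hts, inverse=False):
    """Scale a per-row time (or, with inverse=True, a frame rate) by the HTS ratio."""
    ratio = new_hts / base_hts
    return base_value / ratio if inverse else base_value * ratio

# Measured at HTS = 2060: ~105us between HSYNC pulses, 39.5 FPS
row_us_1500 = scale_with_hts(105, 2060, 1500)              # predicted row period
fps_1500 = scale_with_hts(39.5, 2060, 1500, inverse=True)  # predicted frame rate

print(round(row_us_1500, 1), "us per row,", round(fps_1500, 1), "FPS")
```

The prediction lines up with the measured ~75us and 54.2 FPS, so the model seems reasonable for values the sensor accepts. It says nothing about where the sensor stops working, as the 1300 case shows.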
I'm unable to find anything conclusive about the HTS/VTS values, what exactly they are, what they do, what values are acceptable, etc. According to this post, most OV5640 drivers just experimentally determine acceptable values. OpenMV's driver calls this out, claiming that too small of a value causes the OV5640 to "crash", possibly implying it needs that extra time for... something...
So, it sounds like we may be able to slightly improve the "burstiness" of the OV5640's data stream, but not by the ~5x that we'd actually need. So we may instead need to see if there's a way to improve the PSRAM transfers, assuming that's actually the problem and can be improved.
I set up a test in Arduino. If the buffer is in SRAM, it works fine at full speed. If the buffer is in PSRAM, it misses pixels unless I drop the pixel clock speed. The display update rate also drops from ~35FPS to ~15FPS, implying it's waiting for the data from PSRAM. So it's almost certainly an issue with the PSRAM not being able to transfer fast enough.
Will take a look at the XIP stream example to see if it's possible to improve this.
Looks like the XIP stream is only for streaming from flash (and probably PSRAM too) to DMA, doesn't seem like it can go the other way.
The other option I've found is the SSI flash example (SSI on RP2040 got replaced with QMI on RP2350), which looks like it sets up direct access to the QSPI bus with support for RX and TX. The example only demonstrates RX, but should be relatively simple to change it for TX. The downside is that it can't be done while code is executing from flash, and I've got no idea what consequences that would have in MicroPython.
To determine what transfer rates the DMA is capable of with the PSRAM, I wrote the following test code. This transfers data from the PIO state machine to PSRAM as fast as possible (not waiting for the PIO DREQ signal).
```python
import time
import rp2

# Create a DMA controller
dma = rp2.DMA()

# Set number of bytes per transfer, can be 1, 2, or 4
bytes_per_transfer = 4

# Configure DMA controller
dma_ctrl = dma.pack_ctrl(
    size = {1:0, 2:1, 4:2}[bytes_per_transfer], # 0 = 8-bit, 1 = 16-bit, 2 = 32-bit
    inc_write = True,
    inc_read = True
)
dma.config(
    read = camera.sm,
    write = camera.buffer,
    count = 320*240*2 // bytes_per_transfer,
    ctrl = dma_ctrl
)

# Start timer, start DMA, wait for completion, stop timer, print elapsed time
t0 = time.ticks_us()
dma.active(True)
while dma.active():
    pass
t1 = time.ticks_us()
print(time.ticks_diff(t1, t0), "us elapsed for", bytes_per_transfer, "bytes per transfer")
```
Test results:
```
17174 us elapsed for 1 bytes per transfer
16224 us elapsed for 2 bytes per transfer
15694 us elapsed for 4 bytes per transfer
```
So regardless of how many bytes per transfer, it should totally be able to transfer the whole frame to PSRAM at 25MHz XCLK (~24ms frame time). Seems like something is causing the PSRAM to be artificially throttled?
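Converting those timings into throughput makes the comparison easier. This sketch only redoes the arithmetic on the measured values above; the 24 ms frame time is the approximate figure just mentioned, and the variable names are illustrative.

```python
FRAME_BYTES = 320 * 240 * 2  # one RGB565 frame, 153600 bytes

# Elapsed microseconds from the test results, keyed by bytes per transfer
results_us = {1: 17174, 2: 16224, 4: 15694}

# bytes per microsecond is numerically the same as MB/s (1 MB = 1e6 bytes)
throughput = {n: FRAME_BYTES / us for n, us in results_us.items()}

# Average rate needed to move one frame in ~24 ms at 25MHz XCLK
required_avg = FRAME_BYTES / 24e-3 / 1e6

for n, mbps in throughput.items():
    print(n, "bytes per transfer:", round(mbps, 2), "MB/s")
print("required average:", round(required_avg, 2), "MB/s")
```

All three results are above the ~6.4 MB/s average a frame needs. They are still well below the ~40 MB/s rate within a single row at 25MHz XCLK (640 bytes in ~16us, from the scope measurements earlier), which may be relevant if nothing buffers the bursts.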
> If I drop the XCLK frequency to 5MHz... the test image is captured correctly
Note that this only works for that specific test procedure. If I run example 3, I still get funky images (even after 81f7cb1, which drops the XCLK frequency to 5MHz), because other things are using the PSRAM at the same time. If I drop the XCLK even further (eg. 2.5MHz), it's better, but still missing bytes. XCLK can only be dropped so much before the OV5640 stops working entirely (6MHz minimum, according to the datasheet).
I think it might be feasible to fix this by changing dvp_rp2_pio.py to buffer each row of pixels in SRAM. Have one DMA transfer from PIO to the SRAM buffer paced by DREQ_PIOn_RXm, then a second DMA transfer from the SRAM buffer to PSRAM. These transfers would need to be synchronized, and the read/write addresses reset back to the start of the SRAM buffer after each row. Possibly with interrupts, or with another DMA if interrupts prove to be infeasible for some reason.
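A back-of-the-envelope check of whether a single row buffer could work, using numbers from earlier in this thread. The big assumption is that the second DMA can sustain the ~9.8 MB/s implied by the unpaced 4-byte test (153600 bytes in 15694 us), which was measured with nothing else using the PSRAM. The model is deliberately conservative: the second DMA only starts once the whole row has been captured. The function name is hypothetical.

```python
def single_row_buffer_fits(row_bytes, burst_us, row_period_us, drain_bytes_per_us):
    """Check whether one SRAM row buffer can be emptied before the next row starts.

    Conservative model: the SRAM -> PSRAM transfer only begins after the
    full row has arrived, so it must finish within the gap between rows.
    """
    gap_us = row_period_us - burst_us            # idle time after each row burst
    drain_us = row_bytes / drain_bytes_per_us    # time to push one row to PSRAM
    return drain_us, gap_us, drain_us <= gap_us

# 25MHz XCLK: 640-byte rows sent in ~16us, one row every ~105us
# 9.79 bytes/us is the assumed PSRAM write rate (see lead-in)
drain_us, gap_us, fits = single_row_buffer_fits(640, 16, 105, 9.79)
print(round(drain_us, 1), "us to drain,", gap_us, "us available, fits:", fits)
```

Under those assumptions a 640-byte buffer would drain with some margin, but the margin shrinks quickly if other PSRAM traffic lowers the effective write rate.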
For the second DMA, it might be necessary to use the QMI direct mode paced by DREQ_XIP_QMITX. Section 4.4.3 of the RP2350 datasheet states "QMI serial transfers force lengthy bus stalls on the DMA" and "stalling the DMA prevents any other active DMA channels from making progress during this time". Sounds like if the second DMA were to use the normal XIP memory mapped interface, it will cause stalls on the first DMA channel too. The QMI direct mode should prevent that problem.
I do have a couple concerns with the QMI direct mode. First is that if the processor attempts to read from flash (eg. to execute code) while direct mode is in use, a bus error can occur, causing the CPU to freeze. Second is that the qmi_hw->direct_csr needs to be configured correctly for each transfer, which could not be done from an interrupt because of the first concern. I think these can be resolved by simply using another DMA instead of interrupts (actually, 2 DMAs with a control block sequence).
If QMI direct mode is needed, it will probably be helpful to reference the PSRAM datasheet during development. I believe the 8-bit write command (0x02 or 0x38) will need to be issued, followed by the 24-bit address, followed by the data.
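As an illustration of that byte sequence only (it does not touch the QMI hardware), here is a sketch that assembles the command, address, and data into one frame. Sending the address most significant byte first is my assumption and should be checked against the PSRAM datasheet; the function name psram_write_frame is hypothetical.

```python
def psram_write_frame(address, data, command=0x38):
    """Build command byte + 24-bit address + data for one PSRAM write.

    Assumes the address is sent most significant byte first.
    """
    if command not in (0x02, 0x38):
        raise ValueError("expected write command 0x02 or 0x38")
    if not 0 <= address < (1 << 24):
        raise ValueError("address must fit in 24 bits")
    header = bytes([
        command,
        (address >> 16) & 0xFF,
        (address >> 8) & 0xFF,
        address & 0xFF,
    ])
    return header + bytes(data)

frame = psram_write_frame(0x012345, b"\xAA\x55")
print(frame.hex())
```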
Also, if QMI direct mode is used, then we'd probably need to change the camera driver to only actually read from the camera when read() is called, to ensure the processor doesn't try to access flash. Although this will make read() take longer, other processes will probably run faster since the camera driver won't always be slamming the QSPI bus, which is almost certainly a bottleneck.












