Hardware Considerations for FPGA/ESP32-P4

Question

Opened this issue 9 months ago · 1 comments

Saw your recent comment on FuryGPU on Hacker News; figured I'd pop in and let you know that the ESP32-P4 is on the way soon (should start hearing a bunch of buzz about it come June or July) with a PPA (Pixel Processing Accelerator) and better support for high resolution outputs. The flipside is that it will lack a radio, so it will need a companion from the ESP32 family running another firmware package like ESP-Hosted to do the radio work. But a GPU shouldn't need that.

Initial documentation's already partially available here:
https://docs.espressif.com/projects/esp-idf/en/latest/esp32p4/index.html

At the moment, a preliminary datasheet (v0.1) is available from here:
https://www.erlendervik.no/ESP32-C5%20Beta_ESP32-P4_ESP8686_ESP32-C3FH4X/ESP32_P4_Chip_Datasheet_V0.1_PRELIMINARY_EN.pdf

Here's the highlights I've pulled from it:
104 pin package, with two memory options: 16MiB PSRAM or 32MiB PSRAM
~768K of 200MHz on-die SRAM + 32K of 40MHz SRAM for LP
8K of 400MHz on-die TCM memory
4KiB of eFUSE with 1792 bits reserved for user data "such as encryption key and device ID"

External flash or PSRAM can be mapped into 64MB of CPU instruction space in blocks of 64KB
External PSRAM can be mapped into 64MB of CPU data space as blocks of 64KB (separate ranges)
It has 16 PMPs so it may be able to run a minimal linux kernel or some of the more advanced RTOSs (but probably won't because of Espressif's love of FreeRTOS in ESP-IDF)
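
To put the 64KB mapping granularity above in perspective, here is a rough sketch of how many blocks a 1080p RGB565 frame buffer would occupy inside one 64MB window. It assumes the datasheet figures are binary units (64 MiB window, 64 KiB blocks); the names `PAGE_SIZE`, `WINDOW_SIZE`, and `pages_needed` are made up for illustration and are not ESP-IDF APIs.

```python
# Assumption: 64MB / 64KB from the datasheet are binary units (MiB / KiB).
PAGE_SIZE = 64 * 1024            # one mappable block
WINDOW_SIZE = 64 * 1024 * 1024   # size of one mapped address window


def pages_needed(num_bytes, page_size=PAGE_SIZE):
    """Number of whole blocks needed to cover num_bytes (rounds up)."""
    return -(-num_bytes // page_size)


total_pages = WINDOW_SIZE // PAGE_SIZE
framebuffer_bytes = 1920 * 1080 * 2          # RGB565 is 2 bytes per pixel
framebuffer_pages = pages_needed(framebuffer_bytes)

print(f"{total_pages} blocks per window")
print(f"{framebuffer_bytes} byte frame buffer needs {framebuffer_pages} blocks")
```

Under those assumptions a single 1080p RGB565 frame buffer takes 64 of the 1024 blocks in a window, so address space is unlikely to be the limiting factor; bandwidth is the more interesting question.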

It has a dedicated audio PLL for the I2S, up to 125MHz. This can be abused for many other tasks, including filtering off higher harmonics to transmit on the upper bands. (See CNLohr's recent explorations)

Same ol' crappy nonlinear Espressif audio-quality ADCs, good enough to capture up to 14 near and far field mic channels, battery voltage dividers and other tasks, but if you need good analog capture characteristics, like for an oscilloscope, look elsewhere.

55 GPIO pins via pin multiplexing, with the low power core getting up to 16. Two high speed SPI buses you can use.
Two more are used internally for the flash and PSRAM channels; their pins should not be counted on as GPIOs.
Five UARTs are available, with one being exclusive to the low power core.

I3C, I2C, and I2S interfaces, and the thing of most interest: two USB 2.0 "High-Speed" interfaces. One of them supports general OTG, while the other is limited to USB serial/JTAG/Boot. These are GPIO24-27.

Ethernet has what appears to be three sets of GPIOs it can be set to (28-36, 40-48, and 49-54) to dodge use of other pins.

GPIOs 35-42 can be assigned to a MIPI 2 lane DSI display interface, which is going to be very interesting for higher resolution panels, or for going directly out to a DSI-to-DisplayPort bridge chip. Very possible this little dude might be able to raster out 4K at terrible framerates, but will definitely be able to handle 1080p.

One of the more interesting peripherals (which we may or may not get reasonable access to) will be the H.264 encoder, which lists a maximum encoding performance of 1920x1088@24FPS. Pretty close to something I'd expect out of a 1080p security camera. And indeed, it can deal with MIPI 2 lane CSI on GPIOs 42-49.

So, a 1080P display on GPIOs 35-42, a 1080P camera on 42-49, and ethernet on the few remainders, with enough room for USB2.0, SD card, some captouch buttons, encoders, or charlieplexed LEDs.

Throw a DSI to DisplayPort chip on, skip the HDMI license fees, and this appears to be quite a nice little HMI controller.
Considering that ESP-Hosted can be used to throw 40Mbit of radio bandwidth at it, as well as free the Ethernet GPIOs from carrying the whole TCP/IP stack, it's a pretty reasonable security camera design too. Plus some genius will remember that the PiKVM uses an HDMI to CSI-2 bridge chip, swapping the camera capture for monitoring a computer display. Taking bets on how fast we'll see P4KVMs show up?

Anyway, since you expressed interest in looking into the FPGA space in that posting, I figured I'd point you at the iCE40 lineup of low cost FPGAs. There's a new flashcart out for the Nintendo Switch called a MiGSwitch which pairs an ESP32-S2 or S3 (can't recall) with the iCE40 to act as high speed glue logic.

However, I don't think it would suit the S2/S3 as a companion for DSP-like work such as fast mul or div repetitions. The PicoDVI with an overclocked RP2040 has been the lowest cost solution I've seen so far for generating 720p output that cheap modern displays can deal with without going unhinged, losing color, or showing out of range errors.

Good luck with the project, and if I get a chance I'll play with microgpu myself.

Answer 1 · 2024-04-01T13:35:15.000Z

Wow, thanks for the info! The P4 sounds incredibly useful for my GPU purposes indeed. I see that the PSRAM is octal SPI like the S3, but I wonder what the actual SPI speed of that lane will be. I know it's 80MHz on the S3, and on an 800x480 display that I have it's definitely the bottleneck right now for frame buffer work. I've almost thought about seeing how well splitting the frame buffer up into tiles would help with cache prefetching (though I haven't done anything to vet the viability of that yet).
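
As a rough illustration of the tiling idea, the sketch below compares how far apart in memory the pixels of a small sprite land with a plain row-major frame buffer versus one stored as 16x16 tiles. The tile size, the sprite position, and the function names are all assumptions for the example; nothing here has been measured on an S3 or P4.

```python
def linear_offset(x, y, width, bytes_per_pixel=2):
    """Byte offset of pixel (x, y) in a plain row-major frame buffer."""
    return (y * width + x) * bytes_per_pixel


def tiled_offset(x, y, width, tile_w=16, tile_h=16, bytes_per_pixel=2):
    """Byte offset when tiles are stored one after another, row-major inside each tile."""
    assert width % tile_w == 0, "width must be a multiple of the tile width"
    tiles_per_row = width // tile_w
    tile_index = (y // tile_h) * tiles_per_row + (x // tile_w)
    within_tile = (y % tile_h) * tile_w + (x % tile_w)
    return (tile_index * tile_w * tile_h + within_tile) * bytes_per_pixel


def span_bytes(offset_fn, x0, y0, w, h, bytes_per_pixel=2):
    """Distance in bytes from the first to the last byte touched by a w x h rectangle."""
    offsets = [offset_fn(x, y) for y in range(y0, y0 + h) for x in range(x0, x0 + w)]
    return max(offsets) - min(offsets) + bytes_per_pixel


WIDTH = 800  # the 800x480 panel mentioned above
linear_span = span_bytes(lambda x, y: linear_offset(x, y, WIDTH), 32, 48, 16, 16)
tiled_span = span_bytes(lambda x, y: tiled_offset(x, y, WIDTH), 32, 48, 16, 16)
print(f"linear: {linear_span} bytes, tiled: {tiled_span} bytes")
```

For a tile-aligned 16x16 sprite the tiled layout touches one contiguous 512 byte run, while the row-major layout spreads the same pixels over roughly 24KB. Whether that actually helps depends on the cache line size and on how often sprites straddle tile boundaries, which this sketch does not model.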

So to manage a 1080p frame buffer we'd need faster than 80MHz SPI RAM, I think? If my math is right (and it might not be, I just woke up) it would take ~52ms to retrieve a 1080p RGB565 frame buffer from PSRAM at 80MHz (and in practice it would probably act like the S3 and only do the DMA transfer at 40MHz to allow for some bandwidth for user code).
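
The ~52ms figure can be checked with a quick back-of-the-envelope calculation. The sketch below assumes an octal bus moving one byte per clock (8 data lines, single data rate) with no command, address, or cache overhead, so it is a best case; the function names are just for illustration.

```python
def frame_bytes(width, height, bytes_per_pixel=2):
    """Size of one frame buffer in bytes (RGB565 is 2 bytes per pixel)."""
    return width * height * bytes_per_pixel


def transfer_ms(num_bytes, clock_hz, data_lines=8, transfers_per_clock=1):
    """Ideal transfer time in milliseconds, ignoring all protocol overhead."""
    bytes_per_second = clock_hz * data_lines * transfers_per_clock / 8
    return num_bytes / bytes_per_second * 1000.0


size_1080p = frame_bytes(1920, 1080)
ms_at_80mhz = transfer_ms(size_1080p, 80_000_000)
ms_at_40mhz = transfer_ms(size_1080p, 40_000_000)
print(f"{size_1080p} bytes: {ms_at_80mhz:.1f} ms at 80 MHz, {ms_at_40mhz:.1f} ms at 40 MHz")
```

Under those assumptions a single read of the frame buffer costs about 51.8ms at 80MHz (roughly 19 frames per second if nothing else touches the bus) and about 103.7ms at 40MHz.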

I'm pretty impressed with ESP's APIs thus far though so I am assuming they have some tricks up their sleeves to manage it. I recently found asynchronous memory transfer capabilities they have, which can potentially speed up some of my sprite drawing code.

> Anyway, since you expressed interest in looking into the FPGA space in that posting, I figured I'd point you at the iCE40 lineup of low cost FPGAs. There's a new flashcart out for the Nintendo Switch called a MiGSwitch which pairs an ESP32-S2 or S3 (can't recall) with the iCE40 to act as high speed glue logic.

I've had a notion in the back of my mind that my FPGA goals would benefit from a CPU + FPGA duo instead of trying to jam everything on the FPGA. I currently have possession of a Nandland Go board (which I think is iCE40), a DE0-NANO and DE10-Lite (the latter two were donated by colleagues who bought them then realized they had no use case).

I've installed tools and started learning Verilog, and spent some time with Nandgame to get familiar with digital logic. Unfortunately, my time has been a bit consumed recently with these ESP32-S3 all-in-one boards. In theory these make amazing Microgpu boards because they have larger displays than SPI can feasibly drive, and no custom PCBs are required. Just flash the firmware, connect your Logic MCU to it, and go.

Unfortunately, I keep getting corrupted SPI data into it even though SPI works perfectly well into my custom PCB and the dev board. My lack of expertise in electronics and lack of free time due to kids has made it hard to make progress. I'm going to give UART a shot though, since in theory the data going between the logic MCU and the MicroGPU is in the hundreds of bytes range, so the slower speed shouldn't be an issue assuming I can get it to be reliable.
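
To sanity check the "slower speed shouldn't be an issue" reasoning, here is a small estimate of how long a batch of commands takes over UART. It assumes 8N1 framing (10 bits on the wire per byte) and a 500 byte payload as a stand-in for "hundreds of bytes"; both the payload size and the baud rates are illustrative assumptions, not measurements.

```python
def uart_transfer_ms(payload_bytes, baud, data_bits=8, parity_bits=0, stop_bits=1):
    """Time in milliseconds to send payload_bytes, counting start/parity/stop bits."""
    bits_per_byte = 1 + data_bits + parity_bits + stop_bits  # 1 start bit
    return payload_bytes * bits_per_byte / baud * 1000.0


PAYLOAD = 500  # assumed size of one batch of drawing commands
for baud in (115_200, 921_600, 2_000_000):
    print(f"{baud:>9} baud: {uart_transfer_ms(PAYLOAD, baud):.2f} ms")
```

At 115200 baud a 500 byte batch takes about 43ms, which would eat most of a frame at 20 FPS, while at 2Mbaud it drops to 2.5ms. So the baud rate the link can hold reliably matters more than the move away from SPI itself.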

> However, I don't think it would suit the S2/S3 as a companion for DSP-like work such as fast mul or div repetitions. The PicoDVI with an overclocked RP2040 has been the lowest cost solution I've seen so far for generating 720p output that cheap modern displays can deal with without going unhinged, losing color, or showing out of range errors.

Haha, I have 2 Pi Picos, one of which has the DVI connector built into the PCB. I started this project in that direction (and it's one of the reasons I tried to make the core Microgpu firmware agnostic, to support adding a Pico firmware in the future). Unfortunately, the 264K of memory on the Pico makes it hard to do much, especially when talking about a display that needs to be constantly fed (like a normal RGB LCD display).

I have wondered if there's a more efficient way for me to store drawing commands so I can work on demand without a frame buffer more easily, but I haven't had time to thoroughly think that through.
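
One common shape for this idea is a display list: keep the drawing commands around and rasterize one scanline at a time when the display asks for it, so only a single line buffer is needed instead of a full frame. The sketch below is a minimal illustration with a made-up `FillRect` command and `render_scanline` helper; it is not how Microgpu stores operations today.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FillRect:
    """Hypothetical drawing command: a solid rectangle in RGB565."""
    x: int
    y: int
    w: int
    h: int
    color: int


def render_scanline(commands, y, width, background=0x0000):
    """Rasterize row y by replaying every command that overlaps it, in order."""
    line = [background] * width
    for cmd in commands:
        if cmd.y <= y < cmd.y + cmd.h:
            x_start = max(cmd.x, 0)
            x_end = min(cmd.x + cmd.w, width)
            for x in range(x_start, x_end):
                line[x] = cmd.color  # later commands draw over earlier ones
    return line


commands = [
    FillRect(0, 0, 8, 4, 0x001F),  # blue backdrop covering rows 0-3
    FillRect(2, 1, 3, 2, 0xF800),  # red box on top, rows 1-2
]
row1 = render_scanline(commands, 1, 8)
row3 = render_scanline(commands, 3, 8)
row5 = render_scanline(commands, 5, 8)
print([hex(p) for p in row1])
```

The trade-off is that every command overlapping a row is replayed for each scanline, so the cost shifts from memory to per-line CPU time; sorting or bucketing commands by row would be the obvious next step if that became the bottleneck.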

Anyways, appreciate you stopping by!