mrcodetastic/ESP32-HUB75-MatrixPanel-DMA

Q: Slow flipDMABuffer (is there an alternative?). A: Yes, use: https://github.com/mrcodetastic/GFX_Lite

russdx opened this issue · 10 comments

The flipDMABuffer call works very well at keeping the display clean, but it comes at the cost of a big drop in fps. Is this because it has to wait for the display to complete one full DMA output cycle and blocks until that is done?

Another issue thread mentions writing to the library's matrix buffer directly in one hit to achieve a similar effect to the double buffer. But how would we know when to do this write? Is there a callback or flag that says when a full DMA output cycle has completed? Are there any examples of writing to the internal buffer directly?

I am basically streaming an array of simple RGB bytes from SPI and just need to get them into the matrix buffer as fast as possible. To avoid screen tearing I need some form of double buffering, or a way of knowing when to write my buffer to the display buffer. My target framerate is about 60 fps on a 192x32 display.

I also noticed the max I2S clock speed selector in the lib is 20 MHz. I tried 30 MHz, which also appears to work great and increased the fps (from 31 to 45) on an ESP32-S2. Is this safe, or could running above the default 20 MHz damage the hardware?

Any help would be much appreciated.

Regards
Russell

I have been playing with this all day and have managed to find a workaround. I dropped I2S back to 20 MHz but set _min_refresh_rate to 120 (double the default 60). I also removed the double buffering completely and added some logic to set only the pixels that have actually changed each frame, which reduces the number of drawPixel calls I make. The display can now run my input frames at 90 fps with zero tearing or text "jiggle" and is extremely smooth.
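
The "only set pixels that have changed" idea can be sketched as below. This is a minimal illustration and not the poster's actual code: it assumes frames arrive as row-major RGB565 values, keeps a shadow copy of what was last drawn, and calls a caller-supplied draw function only where the two differ. The names pushChangedPixels, kWidth and kHeight are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Panel size taken from the thread (192x32); adjust for other panels.
constexpr int kWidth = 192;
constexpr int kHeight = 32;

// Compares `incoming` (row-major RGB565) with `shadow`, the copy of what was
// last drawn. For every pixel that differs, updates `shadow` and calls
// draw(x, y, color). Returns the number of draw calls issued.
template <typename DrawFn>
std::size_t pushChangedPixels(const std::vector<uint16_t>& incoming,
                              std::vector<uint16_t>& shadow,
                              DrawFn draw) {
    assert(incoming.size() == shadow.size());
    assert(incoming.size() == static_cast<std::size_t>(kWidth) * kHeight);
    std::size_t calls = 0;
    for (int y = 0; y < kHeight; ++y) {
        for (int x = 0; x < kWidth; ++x) {
            const std::size_t i = static_cast<std::size_t>(y) * kWidth + x;
            if (incoming[i] != shadow[i]) {
                shadow[i] = incoming[i];
                draw(x, y, incoming[i]);
                ++calls;
            }
        }
    }
    return calls;
}
```

On the device the draw callback would presumably forward to dma_display->drawPixel(x, y, color). The trade-off is one extra frame of RAM for the shadow copy in exchange for skipping unchanged pixels.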

I have dropped from 24-bit to 16-bit colour though, so I was thinking I could drop PIXEL_COLOR_DEPTH_BITS_DEFAULT from 8 to 6. This really speeds up the updateMatrixDMABuffer routine, but it appears to remove the darker shades?

What do you mean by 'at a cost of really dropping fps'? This lib creates a descriptor for each row, and the descriptors are chained in a loop, or rather two loops, one for each buffer. When you switch buffers, the last descriptor of buffer A is changed to point to the first descriptor of buffer B. So the ESP does not wait, and your drop in fps is just that you see the frame in the new buffer a little later. We are talking about fractions of a frame time (on average half of your milliseconds per frame).
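
The descriptor chaining described above can be modelled off-device with plain pointers. This is only a sketch under stated assumptions: Desc and DescRings are hypothetical stand-ins, not the library's or the ESP-IDF's real descriptor types, and the sketch rewires the tails of both rings so that whichever ring is currently being walked ends up in the requested one.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a DMA descriptor: one per panel row.
struct Desc {
    int buffer;  // 0 = buffer A, 1 = buffer B
    int row;     // row index inside that buffer
    Desc* next;  // descriptor the DMA engine follows afterwards
};

// Two descriptor rings, one per frame buffer. Not copyable in a meaningful
// way because the descriptors point into the vectors themselves.
struct DescRings {
    std::vector<Desc> a, b;

    explicit DescRings(int rows) : a(rows), b(rows) {
        assert(rows > 0);
        for (int r = 0; r < rows; ++r) {
            a[r] = Desc{0, r, &a[(r + 1) % rows]};
            b[r] = Desc{1, r, &b[(r + 1) % rows]};
        }
    }

    // Request that `target` (0 or 1) is shown. Nothing waits here: the walk
    // finishes the current ring and only then follows the rewired tail.
    void showBuffer(int target) {
        Desc* head = (target == 0) ? &a[0] : &b[0];
        a.back().next = head;
        b.back().next = head;
    }
};
```

Walking the next pointers after a call to showBuffer(1) in the middle of ring A shows the remaining rows of A first and then row 0 of B, which is the "a little later" delay mentioned above.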
If you try to play a GIF or a video at, say, 60 fps, you might run into the following issues:
Framebuffer FPS below 60: some images of your video are skipped.
Framebuffer FPS exactly 60: some images might be displayed twice and others skipped, because the two 60 fps rates cannot be synchronised.
Framebuffer FPS above 60: some images are shown multiple times.
Framebuffer FPS above 120: some images are shown 2 times, others 3 times, or even more.
I don't think one would notice that, unless you try to make a pixel-perfect scroller, like moving text by 1 pixel every 5 frames. This is really hard to do perfectly; I'm working on it myself. A callback or flag would be helpful, but I don't know of one yet. Instead I reduced the color depth to increase the fps and minimize the problem.
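
The cases listed above can be checked with a small simulation. This is an idealised sketch: it assumes panel refresh k starts at time k / panelHz, that the panel shows whichever source frame is current at that instant, and that both clocks are exact and start together. The function name refreshesPerFrame is hypothetical.

```cpp
#include <cassert>
#include <vector>

// For each of the first `frames` source frames, counts how many panel
// refreshes display it. A count of 0 means the frame was skipped.
std::vector<int> refreshesPerFrame(int panelHz, int sourceFps, int frames) {
    assert(panelHz > 0 && sourceFps > 0 && frames > 0);
    std::vector<int> shown(frames, 0);
    for (long long k = 0;; ++k) {
        // Index of the source frame that is current when refresh k starts.
        const long long f = k * sourceFps / panelHz;
        if (f >= frames) break;
        ++shown[static_cast<int>(f)];
    }
    return shown;
}
```

For a 60 fps source, refreshesPerFrame(50, 60, 6) gives 1, 1, 1, 1, 1, 0 (the last frame is skipped), refreshesPerFrame(90, 60, 4) gives 2, 1, 2, 1, and refreshesPerFrame(150, 60, 4) gives 3, 2, 3, 2. With exactly matching rates the model gives one refresh per frame; the doubled and skipped frames described for that case come from the two clocks drifting, which this sketch does not model.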

I think if you set the I2S frequency above 20 MHz, the lib will use this value and calculate the resulting FPS from it, but the internal ESP driver cannot use arbitrary values (so it chooses the closest one?). I am not sure about this part; maybe you can film it with your phone to verify my thoughts or yours. And as long as the ESP is not overheating, I don't think you can damage it that easily.

A 16-bit color is rrrrrggg gggbbbbb, so you are missing the lowest bit on red and blue. Next the lib converts the 16-bit rgb565 to rgb888 and then uses the highest 6 bits for the color depth. If you take a look at the formula, there can also be some imprecision in the calculation, and for red and blue the lowest bit cannot be set, resulting in 'missing' dark colors:

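inline void MatrixPanel_I2S_DMA::color565to888(const uint16_t color, uint8_t &r, uint8_t &g, uint8_t &b)
{
    // Move each channel into the top bits of its own byte.
    r = (color >> 8) & 0xf8;  // 5 red bits
    g = (color >> 3) & 0xfc;  // 6 green bits
    b = (color << 3) & 0xf8;  // 5 blue bits
    // Copy the top bits into the empty low bits so full scale maps to 255.
    r |= r >> 5;
    g |= g >> 6;
    b |= b >> 5;
}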

rgb565 (input)    rgb888 (calculated)    rgb666 (cut off)
1, 1, 1        -> 8, 4, 8             -> 2, 1, 2
2, 2, 2        -> 16, 8, 16           -> 4, 2, 4
3, 3, 3        -> 24, 12, 24          -> 6, 3, 6
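
The conversion and the cut-off can be reproduced off-device with the sketch below. It copies the arithmetic of the color565to888 function quoted above into a free function; the Rgb struct and the helpers pack565, expand565to888 and keepTopBits are hypothetical names introduced for illustration, and keepTopBits assumes the reduced depth simply keeps the highest bits of each 8-bit channel, as described above.

```cpp
#include <cassert>
#include <cstdint>

struct Rgb {
    uint8_t r, g, b;
};

// Packs 5-bit red, 6-bit green and 5-bit blue into rrrrrggg gggbbbbb.
constexpr uint16_t pack565(unsigned r5, unsigned g6, unsigned b5) {
    return static_cast<uint16_t>(((r5 & 0x1Fu) << 11) | ((g6 & 0x3Fu) << 5) | (b5 & 0x1Fu));
}

// Same arithmetic as the color565to888 function quoted above.
inline Rgb expand565to888(uint16_t color) {
    uint8_t r = (color >> 8) & 0xF8;
    uint8_t g = (color >> 3) & 0xFC;
    uint8_t b = (color << 3) & 0xF8;
    r |= r >> 5;
    g |= g >> 6;
    b |= b >> 5;
    return Rgb{r, g, b};
}

// Keeps only the highest `depth` bits of each 8-bit channel.
inline Rgb keepTopBits(Rgb c, int depth) {
    assert(depth >= 1 && depth <= 8);
    const int shift = 8 - depth;
    return Rgb{static_cast<uint8_t>(c.r >> shift),
               static_cast<uint8_t>(c.g >> shift),
               static_cast<uint8_t>(c.b >> shift)};
}
```

With these helpers, pack565(1, 1, 1) expands to 8, 4, 8 and is cut to 2, 1, 2 at a depth of 6, so after the cut-off the darkest red and blue step is 2 and a value of 1 can never occur on those channels. A green input of 0 stays 0 throughout.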

Another reason dark colors disappear when you reduce the color bit depth is that you are removing exactly the dark colors.
Let's look at just one color and one LED, and let's assume the shortest time an LED can be lit is 1 unit (U) (we are talking about microseconds).
In 8 bit there are 255U in one color cycle. If the highest bit of the color is set, the LED glows for the first 128U. If the second bit is not set, the LED is off for the next 64U; if it is set, the LED is on. And so on, so the second-lowest bit turns the LED on for 2U and the lowest for 1U.
In 6 bit there are 63U in total. The highest bit controls 32U and the lowest 1U.
In 6 bit the whole process is repeated around 4 times in the time it runs once in 8 bit.
So the lowest colors in 8 bit light up for 1U, 2U, 3U and 4U -> 1/255, 2/255, 3/255, 4/255,
while the lowest color in 6 bit lights up for 1U out of 63U -> 1/63, which is roughly 4/255.
So there are colors that are much dimmer in 8 bit. This pattern continues for the next colors up, so you tend to see missing dark colors in 6 bit.

On the other hand, in 8 bit the brightest color is 128U+64U+...+1U = 255U -> 255/255 = always on.
In 6 bit the brightest color is 32U+...+1U = 63U -> 63/63 = always on = 255/255.
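
The on-time arithmetic above can be written out as a short sketch. It follows the simplified model from this explanation only (bit n of the value lights the LED for 2^n units out of a cycle of 2^bits - 1 units) and ignores any brightness or latch-blanking handling the library may do. The function names are hypothetical.

```cpp
#include <cassert>

// Length of one colour cycle in units (U) at the given bit depth.
constexpr unsigned cycleUnits(unsigned bits) {
    return (1u << bits) - 1u;
}

// Units the LED is lit during one cycle: bit n contributes 2^n units.
inline unsigned litUnits(unsigned value, unsigned bits) {
    assert(bits >= 1 && bits <= 16);
    assert(value <= cycleUnits(bits));
    unsigned units = 0;
    for (unsigned bit = 0; bit < bits; ++bit) {
        if (value & (1u << bit)) units += 1u << bit;
    }
    return units;
}

// Fraction of the cycle the LED is lit, between 0.0 and 1.0.
inline double dutyCycle(unsigned value, unsigned bits) {
    return static_cast<double>(litUnits(value, bits)) / cycleUnits(bits);
}
```

Under this model dutyCycle(1, 8) is 1/255, while dutyCycle(1, 6) is 1/63, which is slightly above dutyCycle(4, 8) at 4/255. Both dutyCycle(255, 8) and dutyCycle(63, 6) are 1.0, so the brightest color is unaffected and only the dimmest steps are lost.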

Maybe I should move the color explanation to the discussions part of this repo. Although I am very sure about it, could someone confirm it?

And one last topic:
You should switch to the ESP32-S3. There the lib uses a different DMA part, is faster, uses less memory and reaches a higher fps. You could also use the external PSRAM to load images or do canvas work.

Hello Luke

Thanks for the detailed explanation about the colours; that is very helpful and explains what I am seeing.

What I am seeing (and maybe I am doing something wrong?) is that adding the 'dma_display->flipDMABuffer()' call slows the main loop down, so I cannot process as many frames as I can without it. By a frame I mean reading in a frame of data from the SPI bus and then copying it to the display buffer via 'drawPixel'. Without 'dma_display->flipDMABuffer()' and with double_buff set to false I can easily get about 90 frames per second displayed. But when I enable double_buff and add back 'dma_display->flipDMABuffer()' it drops to about 60 fps, almost as if this method is blocking or delaying the main loop somehow. I fully understand that all it should be doing is swapping the DMA buffer pointers, which should be instant, but in my tests it really slows the main loop down.

I can see the same thing with very basic tests, i.e. just drawing the text "test" in the main loop with and without double buffering: the double-buffered version is always a lot slower in its loop processing (so not the refresh rate of the display, but the speed at which it can run the main loop() and process one frame of my image). Hope that sort of makes sense?

Ah, I shall indeed grab an S3 dev board and run some more tests, thanks for the suggestion!

You are right, I somehow forgot about the wait.

here

You could try to remove this line of code. Some glitches might be introduced if you write to a part of the framebuffer that will be shown one more time.
If you first print to a canvas and later print the whole canvas to the framebuffer, there shouldn't be an issue, at least if the panel framerate is not a lot slower than your graphics frame rate.

Or, if you are a good programmer and know the ESP RTOS or FreeRTOS quite well, you could try to solve the issue by using multiple tasks and semaphores.

Removing the while has really helped and I am not seeing any weird artifacts yet, thanks :)
I'll also try an S3 to see if I can squeeze a little more speed out of it :D

I don't think the buffer flip is ever going to be perfect without some low level programming and logic output analysis. Not something I have the means or time to investigate.

No problem, it looks very good to me, thank you :) There was just the minor speed issue, which is now fixed by removing the while loop.