bitbank2/JPEGDEC

Mis-aligned columns with 1.4.0

Closed this issue · 36 comments

Before you create this issue
This library takes compressed JPEG data and outputs pixels to your callback functions. If your problem is related to LCD displays, SD cards, or any other issue that has nothing to do with this library, please stop here. If you need help with your private or commercial project, please email me (bitbank@pobox.com) to ask about paid consulting. Github issues are for submitting actual bugs.

Describe the bug
With 1.4.0, some columns on the screen are mis-aligned:
DSC_1319
To me it seems the four-pixel-columns are swapped in place, especially when comparing to a live picture:
live-picture

Expected image
test-screen
(a yellow line separating upper and lower triangle)

Additional context
The board is a MaTouch_ESP32-S3 Parallel TFT with Touch 3.5'' ILI9488 with 480x320 pixel screen.
JPEGDEC 1.3.1 works flawlessly, but as the speedup 1.3.1 -> 1.4.0 is tremendous, I'd like to see 1.4.0 work :)

As a side node:
I use JPEGs of about 5kb packed in a big file (similar to MJPEG) and with 1.3.1, I get around 19-20 FPS. With 1.4.0 I now see 38-39 FPS!
Thanks for that awesome improvement.

Target pixel format is RGB565_BE, set with jpeg.setPixelType(RGB565_BIG_ENDIAN)

The issue is already present in the initial commit of the SIMD-feature: https://github.com/bitbank2/JPEGDEC/tree/e8217b9ca892694afb197c7c6f9d5a77e543aa59

Thanks for the detailed bug report. I'll take a look today.

I just tested this on my MakerFabs 320x480 16-bit parallel with cap touch board, I used the same code as you (1.4.0) with the same options. It displays without the problem you see. I'm using my bb_spi_lcd library to control the display. Perhaps something is going on with your display library.
20240131_105552

P.S. 38-39FPS!!? that sounds amazing. In my limited testing, I only saw a 20-40% speedup, but then again, I'm using images recorded at a higher quality level. When the compressed data is small for a large image, more time is spent in the pixel color conversion step and benefits more from my improvement.

Very interesting. How does your jpegDraw routine look like?
I have it like this:

int jpegDraw(JPEGDRAW* pDraw)
{
    lcd.pushImage(pDraw->x, pDraw->y, pDraw->iWidth, pDraw->iHeight, pDraw->pPixels);
    return 1;
}

As the image dimensions match the screenSize and the decode-Call is made with (x=y=scale = 0), I expect not additional range checks here.

I tried the following:

// from https://lemire.me/blog/2019/07/03/a-fast-16-bit-random-number-generator/
uint16_t wyhash16_x = 0; 

uint32_t hash16(uint32_t input, uint32_t key) {
  uint32_t hash = input * key;
  return ((hash >> 16) ^ hash) & 0xFFFF;
}

uint16_t wyhash16() {
  wyhash16_x += 0xfc15;
  return hash16(wyhash16_x, 0x2ab);
}
int jpegDraw(JPEGDRAW* pDraw)
{
    const size_t size = sizeof(uint16_t) * pDraw->iHeight*pDraw->iWidth; 
    uint16_t *pixels = (uint16_t*)malloc(size);
    memset(pixels, wyhash16(), size);
    lcd.pushImage(pDraw->x, pDraw->y, pDraw->iWidth, pDraw->iHeight, pixels);
    free(pixels);
    return 1;
}

It basically renders every batch from JPEGDEC in a different color. As you can see, nothing distorts here:
DSC_1321

So I expect the lcd library (LovyanGFX) to work properly. As it does with JPEGDEC 1.3.1 or lvgl 9.0.
Any hint from your side where I can look at?

Try my latest bb_spi_lcd library with this sketch for Arduino:

#include <bb_spi_lcd.h>
#include <JPEGDEC.h>
//#include "jpeg_11_128x128.h"
//#include "jpeg_22_128x128.h"
#include "fail.h"

JPEGDEC jpeg;
BB_SPI_LCD lcd;

int drawMCUs(JPEGDRAW pDraw)
{
// int iCount;
// iCount = pDraw->iWidth * pDraw->iHeight; // number of pixels to draw in this call
// Serial.printf("Draw pos = %d,%d. size = %d x %d\n", pDraw->x, pDraw->y, pDraw->iWidth, pDraw->iHeight);
lcd.pushImage(pDraw->x, pDraw->y, pDraw->iWidth, pDraw->iHeight, pDraw->pPixels, DRAW_TO_LCD | DRAW_WITH_DMA);
return 1; // returning true (1) tells JPEGDEC to continue decoding. Returning false (0) would quit decoding immediately.
} /
drawMCUs() */

void setup()
{
Serial.begin(115200);
while (!Serial) {};
lcd.begin(DISPLAY_MAKERFABS_S3);
// lcd.begin(DISPLAY_M5STACK_ATOMS3);
// lcd.begin(DISPLAY_M5STACK_CORE2);
lcd.fillScreen(TFT_BLACK);
// delay(4000);
} /* setup() */

void loop() {
long lTime;
char szTemp[64];

if (jpeg.openFLASH((uint8_t *)fail, sizeof(fail), drawMCUs))
// if (jpeg.openFLASH((uint8_t *)jpeg_11_128x128, sizeof(jpeg_11_128x128), drawMCUs))
{
Serial.println("Successfully opened JPEG image");
// Serial.printf("Image size: %d x %d, orientation: %d, bpp: %d\n", jpeg.getWidth(),
// jpeg.getHeight(), jpeg.getOrientation(), jpeg.getBpp());
jpeg.setPixelType(RGB565_BIG_ENDIAN); // The SPI LCD wants the 16-bit pixels in big-endian order
// jpeg.setMaxOutputSize(1);
lTime = micros();
// Draw the thumbnail image in the middle of the display (upper left corner = 120,100) at 1/4 scale
if (jpeg.decode(0,0,0))
{
lTime = micros() - lTime;
sprintf(szTemp, "Successfully decoded image in %d us", (int)lTime);
Serial.println(szTemp);
}
jpeg.close();
}

delay(10000); // repeat every 10 seconds
}

As I have a parallel display, your library would not get me far (and for the parallel callbacks, I would still need LovyanGFX, so no win).
But I printed the second line. With all LovyganGFX commented out, I see the perfect value in the buffers as expected.

Is there anything I can do to disable the SIMD stuff in JPEGDEC? Or does something special need to be initialized that LovyganGFX overwrites?

My code DOES support parallel displays. I'm using the same hardware as you. I made it extremely easy for popular hardware by creating "named displays" which hide all the details of the GPIO numbers and settings. Install my bb_spi_lcd library and run the demo I created above.

I can confirm this works. Thank you very much!
Bonus points for the fast compile times!
I added touch support from your bb_captouch (FT6236) library and the same issue appeared again!
So somehow the i2c is to be blamed for this. Can you confirm this on your end?

In case you need the Pins:

    constexpr int I2C_PORT_NUM = I2C_NUM_0;
    constexpr int I2C_PIN_SDA = 38;
    constexpr int I2C_PIN_SCL = 39;
    constexpr int I2C_PIN_INT = 40;
    touch.init(I2C_PIN_SDA, I2C_PIN_SCL, 0 /* not used */, I2C_PIN_INT);

Interesting... I'll try to find the issue.

I'm also on it, here is a minimal example with the issue:

#define SPI_DRIVER_SELECT 0

// SD card interface
#include <SPI.h>
#include <SdFat.h>
#define SPI_CLOCK SD_SCK_MHZ(50)
#define SD_CONFIG SdSpiConfig(SD_CS, USER_SPI_BEGIN | DEDICATED_SPI, SPI_CLOCK)
SdFat SD;

// display jpegs
#include <JPEGDEC.h>
static JPEGDEC jpeg;
File globalFile; // loading image

// lcd and touch
#include <bb_captouch.h>
#include <bb_spi_lcd.h>
BBCapTouch touch;
BB_SPI_LCD lcd;

const unsigned int screenWidth = 480;
const unsigned int screenHeight = 320;

// PIN configuration
#define SPI_MOSI 2
#define SPI_MISO 41
#define SPI_SCK 42
#define SD_CS 1
#define LCD_CS 37
#define LCD_BLK 45
#define I2C_SCL 39
#define I2C_SDA 38


// draw JPEG portion to screen
int jpegDraw(JPEGDRAW* pDraw)
{
    lcd.pushImage(pDraw->x, pDraw->y, pDraw->iWidth, pDraw->iHeight, pDraw->pPixels, DRAW_TO_LCD);
    return 1;
}

// JPEG helpers for loading image
void * jpegOpen(const char *filename, int32_t *size) {
  globalFile = SD.open(filename);
  *size = globalFile.size();
  return &globalFile;
}

void jpegClose(void *handle) {
  if (globalFile) globalFile.close();
}

int32_t jpegRead(JPEGFILE *handle, uint8_t *buffer, int32_t length) {
  if (!globalFile) return 0;
  return globalFile.read(buffer, length);
}

int32_t jpegSeek(JPEGFILE *handle, int32_t position) {
  if (!globalFile) return 0;
  return globalFile.seek(position);
}

void displayLoadingScreen()
{
  const char loadingScreenJPEG[] = "/loading.jpg";
  if(SD.exists(loadingScreenJPEG))
  {
    jpeg.open(loadingScreenJPEG, jpegOpen, jpegClose, jpegRead, jpegSeek, jpegDraw);
    jpeg.setPixelType(RGB565_BIG_ENDIAN);
    jpeg.decode(0, 0, 0);
    jpeg.close();
  }
}

// setup
void setup() {
    pinMode(LCD_CS, OUTPUT);
    pinMode(LCD_BLK, OUTPUT);

    digitalWrite(LCD_CS, LOW);
    digitalWrite(LCD_BLK, HIGH);

#ifdef WITH_SERIAL
    Serial.begin(115200);
    delay(200);
    Serial.println("hello from board");
    Serial.flush();
#endif

    lcd.begin(DISPLAY_MAKERFABS_S3);
    lcd.setRotation(3);

    SPI.begin(SPI_SCK, SPI_MISO, SPI_MOSI);
    if (!SD.begin(SD_CONFIG))
        SD.initErrorHalt(&Serial);

    displayLoadingScreen();
    constexpr int I2C_PIN_SDA = 38;
    constexpr int I2C_PIN_SCL = 39;
    constexpr int I2C_PIN_INT = 40;
    int error = 0;
    // comment out next line and the picture is perfect, otherwise it has 4 pixel wide artifacts
    error = touch.init(I2C_PIN_SDA, I2C_PIN_SCL);
    Serial.println(error);
    Serial.flush();
    while(1);
}

void loop() {

  delay(100);

}

Put a file on the SD card named 'loading.jpg' and it will display it.
Comment out line error = touch.init(I2C_PIN_SDA, I2C_PIN_SCL); and it is perfect. With the line (which appears after the displaying!), the image is distorted.
I copied the stuff from your library into the sketch and ran in manually, which sometimes gave me a "Stack smashing protect failure! " error message and reboots - but the image was fine!
I don't get it at the moment, especially the part that image is distorted before the touch.init is called. If touch.init is commented out, the image is fine?!

I added bb_captouch initialization to my example and it had no effect. Have you tried loading the image from SD vs from FLASH? If I had to guess, the distorted image makes me think that the D/C line is unstable for some reason. When pixels get written to the wrong spot it's usually because data is being interpreted as commands.

Yes, it is the same when loading via jpeg.openFLASH. Also when I disable and remove all SPI & SD code.

Do you have any other ESP32-S3 devices to try? CoreS3? AtomS3? FeatherS3+external LCD? I don't know what to tell you.

Good morning!
I do not have any other S3 device.
I thought I give it a shot and try from your end, using the code you posted and adding touch there. Same result with the destroyed columns:

#include <bb_captouch.h>
#include <bb_spi_lcd.h>
#include <JPEGDEC.h>
//#include "jpeg_11_128x128.h"
//#include "jpeg_22_128x128.h"
#include "loading.h"

JPEGDEC jpeg;
BB_SPI_LCD lcd;
BBCapTouch touch;

int drawMCUs(JPEGDRAW *pDraw)
{
// int iCount;
// iCount = pDraw->iWidth * pDraw->iHeight; // number of pixels to draw in this call
// Serial.printf("Draw pos = %d,%d. size = %d x %d\n", pDraw->x, pDraw->y, pDraw->iWidth, pDraw->iHeight);
lcd.pushImage(pDraw->x, pDraw->y, pDraw->iWidth, pDraw->iHeight, pDraw->pPixels, DRAW_TO_LCD | DRAW_WITH_DMA);
return 1; // returning true (1) tells JPEGDEC to continue decoding. Returning false (0) would quit decoding immediately.
} /* drawMCUs() */

void setup()
{
Serial.begin(115200);
while (!Serial) {};
lcd.begin(DISPLAY_MAKERFABS_S3);
// lcd.begin(DISPLAY_M5STACK_ATOMS3);
// lcd.begin(DISPLAY_M5STACK_CORE2);
lcd.fillScreen(TFT_BLACK);


    constexpr int I2C_PIN_SDA = 38;
    constexpr int I2C_PIN_SCL = 39;
    constexpr int I2C_PIN_INT = 40;
    int error = touch.init(I2C_PIN_SDA, I2C_PIN_SCL);
// delay(4000);
} /* setup() */

void loop() {
long lTime;
char szTemp[64];

if (jpeg.openFLASH((uint8_t *)loading, sizeof(loading), drawMCUs))
// if (jpeg.openFLASH((uint8_t *)jpeg_11_128x128, sizeof(jpeg_11_128x128), drawMCUs))
{
Serial.println("Successfully opened JPEG image");
// Serial.printf("Image size: %d x %d, orientation: %d, bpp: %d\n", jpeg.getWidth(),
// jpeg.getHeight(), jpeg.getOrientation(), jpeg.getBpp());
jpeg.setPixelType(RGB565_BIG_ENDIAN); // The SPI LCD wants the 16-bit pixels in big-endian order
// jpeg.setMaxOutputSize(1);
lTime = micros();
// Draw the thumbnail image in the middle of the display (upper left corner = 120,100) at 1/4 scale
if (jpeg.decode(0,0,0))
{
lTime = micros() - lTime;
sprintf(szTemp, "Successfully decoded image in %d us", (int)lTime);
Serial.println(szTemp);
}
jpeg.close();
}
  TOUCHINFO ti;
  if (touch.getSamples(&ti)) 
  {
    if(ti.count > 0)
    {
      Serial.print("touch");
    }
  }

delay(10000); // repeat every 10 seconds
}

What does this example show on your end?
As for the build settings, do they differ?
image
image

If we cannot solve this, I'd like to ask if you could please add a #define to disable the SIMD code. So I can use always the newest version of your library. 19 FPS is close enough for my task.

The same code doesn't fail for me. I have almost the same build settings - flash size is set to 4MB even though it might be 16MB on the module. I'm using Arduino IDE 2.2.1 and ESP32 support v2.0.10 (newer versions are available, but they seemed to be less reliable). Maybe you have a slightly different board. Mine is the MakerFabs ESP32-S3 Parallel TFT with touch v1.0.

Yup, all the same on my end.

I found it!
I cannot explain it at the moment, but the issue seems to be with the 16-byte alignment of JPEGIMAGE.usPixels:
In the code of yours, I need to set uint8_t bogus[4] and for my actual Sketch I need no bogus at all.

I changed my JPEGIMAGE structure to ensure that usPixels is 16-byte aligned. Is that somehow getting disturbed by something in the sketch?

typedef struct alignas(16) jpeg_image_tag and removing bogus fixes it for both cases.
cpp reference for alignas

I do not disturb the setup, but I do not know how the compiler sets up the data structures.

Whatever changes I make to fix this, it has to compile on C99.

typedef struct jpeg_image_tag {...} __attribute__ ((aligned (16))) JPEGIMAGE; also works with GCC. For Visual Studio, there might be other tags to be used.

I've thought about that, but it's specific to GCC. Then I have to #ifdef the compiler (e.g. I use LLVM too). That's why I added the double double hack. Any other ideas? Why doesn't my double-double hack alignment work for you?

I am more astonished that the alignment is different for different Sketches.
If the compiler puts the variable on 4 byte alignment (standard for the ESP32 S3, right?), it has four positions to put the variable on and only one fits the bounding of 16 bytes together with the usPixels located correctly.
And I am not aware of a way to tell the compiler on variable declaration on where to put it.

You can have usPixels[PIXELS+12] with a offset to find the right 16 byte alignment, but this seems very hacky as well.

Or you malloc the usPixels and work on the aligned part only (but store the original pointer for freeing later).

The PIXELS+12 idea is probably the most trouble-free way to do it. I'll have to add a uint16_t *pAlignedPixels member variable which has the correct address.

I'd propose to add:

uint16_t *pPixelsBuffer[PIXELS+12];
uint16_t *usPixels;

and in the constructor align it:

const uint16_t _align = 16;
_jpeg.usPixels = (pPixelsBuffer+_align)-((pPixelsBuffer+_align) % _align); // in Excel, this works

This way, your remaining code stays the same.

good idea - I'll make the change
p.s. use & instead of % when working with powers of 2 - you never know if the target platform has a divide instruction.

Right, just be aware that you do not shoot behind the +12 which is easily done.
Guess the pointer correction only needs to be done if pointer % alignment != 0.

Not sure what you mean by "shoot behind". If the pointer is not aligned to 16 bytes, it will need an offset added of 0 to 12.
int i = (int)thePixels & 15;
if (i == 0) i = 16;
usPixels = &thePixels[(16 - i)>>1];

lgtm.

hmm, now that I coded the fix, I see the failure you've been seeing :(
ooh - now it's the MCU source data that's out of alignment :(

@danie1kr I just pushed the fix; please give it a try.

Both sketches now run perfectly.
Thank you very much for your patience :)
Guess we both learned something on this.
Plus you got a new user for your lcd+touch library!

You're welcome and thank you for patiently working through the issue too. It's hard to know how many people use my libraries. I'm going to keep sharing with the open source community, but it's always good to get feedback, even a bug report :).

And this was quite a tricky one. I even was in doubt if my hardware is fine...