Missing columns in RGB565 but present in RGB8888

Question

Missing columns in RGB565 but present in RGB8888

toneck opened this issue 9 months ago · 18 comments

When trying to convert an image using the user provided buffer it works ok for the RGB8888, but has missing columns for RGB565. After some testing it happens when using the draw callback as well.

To Reproduce

clone repo
modify linux/main.c: ucPixelType = RGB565_BIG_ENDIAN //or little endian
convert a file
image with column mising

also some columns are switched. Notably columns that start at multiple of 128 on the x. Column 640->512 -> 384 -> 256 -> 128 ->0

Expected behavior
A clear image as it appears with RGB8888

Additional context
I was using your library to convert some images on an ESP32-cam since last year. (it was significantly faster than the provided library from Espressif) I needed RGB888 so I modified the code to produce RGB888 instead of RGB565.
I noticed in January that you added RGB888 and tried the linux/main example. I saw you added the option of supplying a buffer from the user. Hoping to skip the drawcallback I tried your example on pc first and it worked. I wanted to test my code with rgb565 to see if I get a higher fps so I tried to change from RGB8888 to RGB565 and it kept giving me empty columns. I tried using the callback function to save the data to a buffer and save that to a file. The resulted file had no missing columns.
I tried looking over the jpeg.inl to locate the problem, but I didn't notice anything wrong. I presumed I was doing something wrong so I stayed with the callback and RGB8888.
I noticed the issue #69 and thought it may be similar, but the latest fix didn't work for me.

Today I decided to try using a callback function again, but now I see the columns when using the callback and when using the buffer. I'm not sure if before I did something different or if this is a new bug.
To test that I didn't do anything wrong, I tried a few previous commits, using the same callback function that just saves everything into a buffer. All tests are done with the tulips header image.
Commit e6293f0 works as intended.
Commit 6dd0375 segfaults after a few rows of decoding the image in RGB565.
Commit f6a46aa outputs columns.

Also, there is no difference between RGB 565 little or big endian for the bugged commits, they both output little endian. This problem doesn't appear in e6293f0.

Answer 1 · 2024-03-13T13:32:12.000Z

After trying some more commits, 2dbb61c already has this bug, which is the next commit after e6293f0 (which doesn't).
Therefore the bug was introduced in 2dbb61c

Answer 2 · 2024-03-13T14:49:50.000Z

Are you running the code on an ESP32 or ESP32-S3? If the S3, then it looks like a memory alignment issue with the output buffer.

Answer 3 · 2024-03-13T15:27:12.000Z

I also think it's an alignment issue.

I didn't have my esp on hand, so I ran it on my phone in termux (aarch64)

That being said, I am quite sure I had this problem on my laptop (64bit intel) about a month ago.

I'll be able to test it on the esp in about 2 hours

Answer 4 · 2024-03-13T19:33:39.000Z

Ok, so, I was able to test it both on my laptop (amd64) and on my esp32.
It appears to work on both of them. The problem seems to be only on aarch64.
I am curious if it would appear on a raspberry pi.

Answer 5 · 2024-03-13T22:06:07.000Z

Thanks for narrowing it down; I'll investigate.

Answer 6 · 2024-03-18T07:16:43.000Z

Fixed the NEON code. Please confirm.

Answer 7 · 2024-03-19T09:17:51.000Z

I didn't have time yesterday, but I managed to test it today.
The columns are not missing anymore.
Thanks for the help

On the other hand, the bug with the RGB565 not outputing big endian is still present.
I did some tests and this bug affects just decoding at full scale.
Decoding at half, quarter or eighth will output little endian for RGB565_LITTLE_ENDIAN and big endian for RGB565_BIG_ENDIAN
Decoding at full size will always output little endian, even for RGB565_BIG_ENDIAN.

Answer 8 · 2024-03-19T09:18:26.000Z

Should I close this bug and open a different bug for that one?

Answer 9 · 2024-03-19T09:57:15.000Z

I can fix that issue here too. I'll get to it today.

Answer 10 · 2024-03-19T18:16:51.000Z

Please give it a try now

Answer 11 · 2024-03-21T09:03:00.000Z

Thank you for your help, unfortunately, it gives a compilation error

In file included from main.c:11:
../src/jpeg.inl:3565:26: error: incompatible type for argument 1 of ‘vrev16q_u8’
 3565 |               vrev16q_u8(u168Temp);
      |                          ^~~~~~~~
      |                          |
      |                          uint16x8_t
In file included from ../src/jpeg.inl:56,

in jpeg.inl:

         u88B = vqrshrun_n_s16(i168B, 4); // shift right, narrow and saturate >
         u168Temp = vshll_n_u8(u88R, 8); // place red in upper part of 16-bit >
         u168Temp2 = vshll_n_u8(u88G, 8); // shift green elements to top of 16>
         u168Temp = vsriq_n_u16(u168Temp, u168Temp2, 5); // shift green elemen>
         u168Temp2 = vshll_n_u8(u88B, 8); // shift blue elements to top of 16->
         u168Temp = vsriq_n_u16(u168Temp, u168Temp2, 11); // shift blue elemen>
         if (ucPixelType == RGB565_BIG_ENDIAN) { // reverse the bytes
             vrev16q_u8(u168Temp);
         }
         vst1q_u16((uint16_t *)pOutput, u168Temp); // top left block
         // top right block
         i168Temp = vqdmulhq_lane_s16(i168Crx2.val[1], i164Constants, 0); // C>
         i168R = vaddq_s16(i168Temp, i168Y); // now we have 8 R values
         i168Temp = vqdmulhq_lane_s16(i168Crx2.val[1], i164Constants, 1); // C>
         u88R = vqshrun_n_s16(i168R, 4); // narrow and saturate to 8-bit unsig>
         i168G = vaddq_s16(i168Y, i168Temp);

Answer 12 · 2024-03-21T09:14:11.000Z

interesting; on MacOS that doesn't give a warning or error. I'll fix it - thanks.

Answer 13 · 2024-03-21T09:19:42.000Z

I think the problem is that the variable is declared as uint16x8_t
uint16x8_t u168Temp, u168Temp2;
it appears that vrev16q_u8 expects a uint8x16_t : link

Answer 14 · 2024-03-21T09:43:59.000Z

yes, of course I see the issue, but some compilers allow you to pass 128-bit registers without making a fuss about the implied type.

Answer 15 · 2024-03-21T10:02:28.000Z

Here is the gcc that I use if it helps:

I use a chroot ubuntu environment in termux, this is the gcc verision that I use

I tried the gcc compiler that is present in termux (not in the chroot) and it compiles, but the bigendian output is still little endian.

Answer 16 · 2024-03-23T10:35:42.000Z

ok, I added the requisite vreinterpretq_xx_xx wrappers. Please let me know if your compiler accepts it now.

Answer 17 · 2024-03-25T23:06:31.000Z

Is the fix working for you?

Answer 18 · 2024-05-10T09:37:12.000Z

Hi again,
Sorry for the very late reply. I got busy and forgot.

I just checked all three formats RGB565 little and BIG endian and RGB888 for all scalings and they work as they should.

Thank you for your help