bitbank2/JPEGDEC

Add SIMD support for ESP32S3

modi12jin opened this issue · 17 comments

This passage was generated by ChatGPT

ESP32-S3 is a high-performance, low-power microcontroller that supports the SSE (SIMD) instruction set, which can complete multiple operations in one instruction, improving code efficiency and running speed. Here is sample code using SIMD instructions on ESP32-S3:

#include "xtensa/hal.h"
#include "esp_simd.h"
#include <stdio.h>

void simd_example(void)
{
    SIMDDATA s1 = {1.0f, 2.0f, 3.0f, 4.0f};
    SIMDDATA s2 = {5.0f, 6.0f, 7.0f, 8.0f};
    SIMDDATA s3 = {0.0f, 0.0f, 0.0f, 0.0f};

    // Add the numbers in each element of s1 and s2, and store the result in s3
    s3 = SIMD_ADD(s1, s2);

    // output each element in s3
    printf("%f, %f, %f, %f\r\n", s3.f32[0], s3.f32[1], s3.f32[2], s3.f32[3]);

    // Multiply the numbers in each element of s1 and s2, and store the result in s3
    s3 = SIMDMUL(s1, s2);

    // output each element in s3
    printf("%f, %f, %f, %f\r\n", s3.f32[0], s3.f32[1], s3.f32[2], s3.f32[3]);
}

In this sample code, SIMD_ADD() and SIMDMUL() are functions that use SIMD instructions to perform addition and multiplication, and SIMDDATA is a vector type holding four floating-point elements. Using these functions can greatly improve code efficiency and execution speed.

Note that to use SIMD instructions on ESP32-S3, you need to include the <xtensa/hal.h> and <esp_simd.h> header files and compile with the -msimd option to enable SIMD instruction set support.
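For comparison, the element-wise operations described above can be written in plain, portable C. This is a scalar sketch, not ESP32-S3 SIMD code: the `vec4f` type and the `vec4f_add`/`vec4f_mul`/`vec4f_print` helpers are hypothetical names chosen for illustration and do not come from any Espressif header.

```c
#include <stdio.h>

/* Hypothetical 4-lane float vector, used only for illustration. */
typedef struct {
    float f32[4];
} vec4f;

/* Element-wise addition: r[i] = a[i] + b[i]. */
vec4f vec4f_add(vec4f a, vec4f b)
{
    vec4f r;
    for (int i = 0; i < 4; i++) {
        r.f32[i] = a.f32[i] + b.f32[i];
    }
    return r;
}

/* Element-wise multiplication: r[i] = a[i] * b[i]. */
vec4f vec4f_mul(vec4f a, vec4f b)
{
    vec4f r;
    for (int i = 0; i < 4; i++) {
        r.f32[i] = a.f32[i] * b.f32[i];
    }
    return r;
}

/* Print all four lanes, matching the output format of the example above. */
void vec4f_print(vec4f v)
{
    printf("%f, %f, %f, %f\r\n", v.f32[0], v.f32[1], v.f32[2], v.f32[3]);
}
```

With inputs {1, 2, 3, 4} and {5, 6, 7, 8}, `vec4f_add` yields {6, 8, 10, 12} and `vec4f_mul` yields {5, 12, 21, 32}. Whether a compiler turns such loops into vector instructions depends on the target and toolchain; this sketch does not rely on it.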

Maybe this passage will help you

espressif/idf-extra-components#106

I'm quite familiar with SIMD coding and would be happy to optimize my code for the S3, but I can't find the include files you referenced above. Do you have a working GitHub link to them?

@bitbank2 Thank you for your reply. Maybe the directory or file name has changed, which would make the address he gave invalid.

https://github.com/espressif/esp-adf-libs/tree/master/esp_codec/include/codec

I can't find the header file esp_simd.h either; maybe this issue helps:

espressif/esp-idf#7745

https://github.com/espressif/esp-dsp

I saw on Twitter that Espressif has described the SIMD instructions in the technical reference manual:

https://mobile.twitter.com/eMbeddedHome/status/1570520252123062274

https://mobile.twitter.com/lovyan03/status/1622846385438720002

I saw these references months ago, but no concrete examples. I thought you had new information. I'll keep searching for this info and when it actually becomes available, I'll implement it. For now, writing in ESP32 assembly language is not going to happen.

Many thanks! Looking forward to your work.

@bitbank2 I contacted Espressif's official staff; they said there seems to be no fully public documentation for the SIMD instructions.
There may be some clues hidden in esp-dsp.

I would honestly like to work on this, but I have very little free time. It will need to be painless and well documented.

@bitbank2 It should be possible to call the DSP library like this from Arduino.

espressif/esp-dsp#11

espressif/arduino-esp32#7710

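#include <Arduino.h>
#include "dsps_biquad_gen.h"

void setup() {
  Serial.begin(115200);

  // The generator writes 5 coefficients: b0, b1, b2, a1, a2 (a0 is normalized to 1).
  float coeffs[5] = {0};
  float f = 0.4f;        // cutoff frequency, normalized to the sample rate (0..0.5)
  float qFactor = 4.0f;  // filter Q factor

  // Generate low-pass biquad coefficients with esp-dsp.
  dsps_biquad_gen_lpf_f32(coeffs, f, qFactor);

  for (int i = 0; i < 5; i++) {
    Serial.printf("%f\n", coeffs[i]);
  }
}

void loop() {
}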

This DSP API library has been around for several years. It MAY be optimized for SIMD, but still doesn't really help any of my work.

@bitbank2 Sorry to bother you again! I've learned that this component supports SIMD.
Espressif staff told me that it only supports whole-frame decoding (the image cannot be decoded in blocks), and that the buffer used for decoding seems to need 16-byte alignment.
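As an illustration of the 16-byte alignment requirement, the sketch below allocates a whole-frame RGB565 output buffer with C11 `aligned_alloc`. The helper names are hypothetical. On ESP-IDF, `heap_caps_aligned_alloc()` gives the same alignment guarantee while also letting you choose the memory region, but this sketch sticks to standard C. Because the component only decodes whole frames, the buffer must hold the entire image: width × height × 2 bytes for RGB565.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Round n up to a multiple of 16 (C11 aligned_alloc expects size % alignment == 0). */
size_t round_up_16(size_t n)
{
    return (n + 15u) & ~(size_t)15u;
}

/* Bytes needed for a whole RGB565 frame (2 bytes per pixel). */
size_t rgb565_frame_bytes(int width, int height)
{
    return (size_t)width * (size_t)height * sizeof(uint16_t);
}

/* Hypothetical helper: allocate a 16-byte-aligned RGB565 frame buffer.
 * Returns NULL on failure; release the buffer with free(). */
uint16_t *alloc_rgb565_frame(int width, int height)
{
    return (uint16_t *)aligned_alloc(16, round_up_16(rgb565_frame_bytes(width, height)));
}

/* Nonzero if p starts on a 16-byte boundary. */
int is_aligned_16(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}
```

For a 320×240 frame this is 320 × 240 × 2 = 153,600 bytes, which is already a multiple of 16.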

https://github.com/espressif/esp-dev-kits/tree/master/esp32-s3-lcd-ev-board%2Fexamples%2Fusb_camera_lcd%2Fcomponents%2Fesp_jpeg

The component is ported from ESP-ADF.

Unfortunately not helpful because they didn't release the source code.

@bitbank2 Sorry to bother you again, this may not be helpful, but I wanted to tell you the test results.

JPEG decoding with SIMD currently works only on the whole frame, not on partial regions; more support is expected in the future.

The only thing that needs attention is that the buffer must be 16-byte aligned. I tested a 320×240 image on a BOX and it took an average of 42 ms to decode to RGB565. Performance under Arduino is really not good; I remember that decoding 800×480 under IDF took less than 50 ms.

JPEGDEC seems to take 68 ms.
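Normalizing the quoted timings by image size makes them easier to compare. The hypothetical helpers below convert a per-frame decode time into frames per second and megapixels per second. The inputs are the figures reported in this thread, assuming the 68 ms JPEGDEC figure refers to the same 320×240 test.

```c
/* Frames per second for a per-frame decode time given in milliseconds. */
double fps_from_ms(double ms)
{
    return 1000.0 / ms;
}

/* Decoded megapixels per second: (width * height) pixels in ms milliseconds.
 * pixels / (ms / 1000) / 1e6 simplifies to pixels / (ms * 1000). */
double mpix_per_s(int width, int height, double ms)
{
    return ((double)width * (double)height) / (ms * 1000.0);
}
```

With these numbers, 42 ms is about 23.8 frames per second (about 1.83 megapixels per second) and 68 ms is about 14.7 frames per second (about 1.13 megapixels per second). Decoding 800×480 in under 50 ms corresponds to more than 7.68 megapixels per second, roughly four times the Arduino figure.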

https://github.com/esp-arduino-libs/ESP32_JPEG/blob/master/examples/DecodeTest/DecodeTest.ino

I worked on this over the weekend and got some good results optimizing my JPEG decoder. I'll publish the code soon.
What I find strange is your first comment on this issue - you show instructions, include files, and things that don't actually exist in the ESP32-S3 instruction set. The SIMD instructions (according to Espressif's own documentation) only support integer operations and are somewhat limited. I'm continuing my search for more info, but so far the SIMD of the S3 is mostly disappointing.
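Since the S3's SIMD instructions are integer-only, the parts of a JPEG decoder that could in principle use them are the stages commonly done in fixed point, such as the IDCT and color conversion. The sketch below is a scalar reference for JFIF YCbCr-to-RGB565 conversion with 16.16 fixed-point coefficients; it is not JPEGDEC's code, uses no S3 instructions, and its function names are hypothetical. A SIMD version would apply the same multiply, add, shift, and clamp steps to several pixels per instruction.

```c
#include <stdint.h>

/* JFIF (full-range BT.601) coefficients scaled by 2^16. */
#define K_CR_R  91881   /* 1.402    */
#define K_CB_G  22554   /* 0.344136 */
#define K_CR_G  46802   /* 0.714136 */
#define K_CB_B 116130   /* 1.772    */

/* Convert a 16.16 fixed-point value to 0..255 with rounding and clamping.
 * Negative values are clamped before shifting to avoid right-shifting a negative int. */
static uint8_t fixed_to_u8(int32_t v)
{
    v += 1 << 15;   /* round to nearest */
    if (v < 0) {
        return 0;
    }
    v >>= 16;
    return (uint8_t)(v > 255 ? 255 : v);
}

/* Hypothetical helper: convert one JFIF YCbCr pixel to RGB565. */
uint16_t ycbcr_to_rgb565(uint8_t y, uint8_t cb, uint8_t cr)
{
    int32_t yf = (int32_t)y << 16;
    int32_t cbs = (int32_t)cb - 128;
    int32_t crs = (int32_t)cr - 128;

    uint8_t r = fixed_to_u8(yf + K_CR_R * crs);
    uint8_t g = fixed_to_u8(yf - K_CB_G * cbs - K_CR_G * crs);
    uint8_t b = fixed_to_u8(yf + K_CB_B * cbs);

    /* Pack 5 bits of red, 6 bits of green, 5 bits of blue. */
    return (uint16_t)(((r & 0xF8) << 8) | ((g & 0xFC) << 3) | (b >> 3));
}
```

For example, Y = 128 with neutral chroma (Cb = Cr = 128) gives mid-gray 0x8410, and Y = 255 gives white 0xFFFF.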

@bitbank2 This is JPEG SIMD decoding, which now supports partial (block) decoding. Sir, you can try it and see how it works.

7_20231124_jpeg_block_decoder_esp32s3.zip

I'm not interested in someone else's closed source code.