Person detection model does not work on ESP32-WROVER-IE
Opened this issue · 9 comments
Hi,
I am using an ESP32-WROVER-IE board with the ESP32-MICROLITE-SPIRAM firmware flashed on it. I am trying to test the person detection model, but it does not work: the predictions are effectively random numbers with no discernible pattern. Is there a solution for this, or can the model not run on MicroPython boards?
P.S. I have already checked this issue: tensorflow/tflite-micro#395. It is marked as resolved, but in practice the problem persists.
The model is confirmed working on two boards with an integrated camera module: the ESP32-CAM-MB and the M5 Timer Camera.
It can also run against image files in the unix port.
Are you using the MICROLITE_SPIRAM_CAM firmware? It creates two RAM regions: 2 MB for the MicroPython heap and 2 MB for the camera frame buffer.
I suspect you are getting memory corruption because the MicroPython heap and the camera frame buffer are overlapping.
Setting the CONFIG_SPIRAM_USE_MALLOC=y flag keeps the heap and the frame buffer separate.
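For reference, this is the relevant line in the board's ESP-IDF sdkconfig fragment (which file the setting lives in depends on the board definition, so treat the placement as an assumption):

```
# Let malloc() allocate from SPIRAM so the MicroPython heap and the
# camera frame buffer are not carved out of the same region.
CONFIG_SPIRAM_USE_MALLOC=y
```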
I tested two approaches. The first was without a camera: I used the test images already in the person detection folder and got some correct results, but every time I changed the order of the pictures, the results changed. For the second, I connected an OV2640 2MP Mini Camera to my board using the appropriate pins (I verified that the camera itself works), but I could not get any correct results there. The problem may be the image size and the non-grayscale format, but as far as I know there is no dedicated image-editing library for MicroPython (like the ones available for Arduino boards).
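For what it's worth, even without an image-editing library a raw frame can be converted and downscaled in plain Python. This is an illustrative, unoptimized sketch assuming a big-endian RGB565 buffer layout (adjust for your driver's actual format):

```python
# Illustrative pure-Python conversion: RGB565 frame -> 96x96 grayscale bytes.
# Big-endian RGB565 (high byte first) is an assumption; check your driver.
def rgb565_to_gray96(buf, src_w, src_h, dst=96):
    out = bytearray(dst * dst)
    for y in range(dst):
        sy = y * src_h // dst          # nearest-neighbor source row
        for x in range(dst):
            sx = x * src_w // dst      # nearest-neighbor source column
            i = 2 * (sy * src_w + sx)  # 2 bytes per RGB565 pixel
            pix = (buf[i] << 8) | buf[i + 1]
            r = (pix >> 11) & 0x1F
            g = (pix >> 5) & 0x3F
            b = pix & 0x1F
            # Expand 5/6-bit channels to 8 bits, then weight for luminance
            r8 = (r << 3) | (r >> 2)
            g8 = (g << 2) | (g >> 4)
            b8 = (b << 3) | (b >> 2)
            out[y * dst + x] = (299 * r8 + 587 * g8 + 114 * b8) // 1000
    return bytes(out)
```

The 96x96 grayscale output is 96 * 96 = 9216 bytes, which is exactly what the person detection model expects.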
I tried the model on an esp32-cam board which uses the ov2640 camera and things worked nicely.
In fact, the ov2640 has a 96x96-pixel grayscale mode. I initialized the camera driver with:
camera.init(0, format=camera.GRAYSCALE, framesize=camera.FRAME_96X96)
and that did the trick.
Is it possible to run the same example on the ESP32-WROVER-IE board using the external OV2640 2MP Mini Plus camera and the camera driver you mentioned above? On the ESP32-CAM board the camera is integrated, so all pins are defined in the example; what about the case where an external module needs to be wired to the board?
I have not tried it because I don't have those modules. However, the pin connections are defined in micropython-modules/micropython-camera-driver/modcamera.h.
If you change those settings you will have to rebuild tflite-micro-micropython. Alternatively, you can use that file as a reference and wire the camera to match the default settings.
The init call can also take specific pins without the need to recompile:
camera.init(0, format=camera.GRAYSCALE, framesize=camera.FRAME_96X96, sioc=23, siod=25, xclk=27, vsync=22, href=26, pclk=21, d0=32, d1=35, d2=34, d3=5, d4=39, d5=18, d6=36, d7=19, reset=15)
The above is what the reference example uses for the M5 Timer Cam, which has the ov3660 sensor (the board I have).
The camera driver appears to auto-detect the sensor type, and ov2640 support is already included in the MICROLITE_SPIRAM_CAM firmware.
So I think it should work in your case. Each number corresponds to a GPIO pin number.
For example, the sioc pin is GPIO 23.
Is it possible to use the ESP32 with the Arduino IDE and run the exact same person detection code that already works on Arduino boards? I am again considering switching to the ESP32, since BLE Mesh, which I also need for my project, is supported on the ESP32 and can be implemented using the Arduino IDE.
I think the underlying problem here is that, when you are getting started, it is not at all clear how sensitive machine learning models are to their inputs being exactly what the model expects, i.e. exactly the same as what it was trained on.
In this case the model expects 96x96 grayscale images (96 * 96 = 9216 bytes) to be fed in, but nothing complains if you send a much larger image.
I think the enhancement to add here is a warning that flags a mismatch between the number of bytes captured from the camera and the size of the input tensor.
It is clear how to do this on the MicroPython side in the person detection example, but it may also be possible to pass the array reference into the C side and apply the check there, so it can flag this scenario for all models.
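A minimal sketch of that check on the MicroPython side (the function name and where the tensor size comes from are assumptions, not part of the current example):

```python
# Hypothetical guard for the person detection example: fail loudly when the
# captured frame does not match the model's input tensor size.
def check_input_size(frame, input_tensor_bytes):
    if len(frame) != input_tensor_bytes:
        raise ValueError(
            "camera frame is %d bytes but the input tensor expects %d bytes; "
            "check the camera format and framesize"
            % (len(frame), input_tensor_bytes)
        )
    return True
```

For a 96x96 grayscale model this would be called as check_input_size(frame, 9216) before invoking the interpreter.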