Fatal error: Indefinite abort() was called when updating the model

Question

Steps to reproduce the issue:

Take latest master (commit df6497e)
Replace the model (model.cc) with the pre-trained one from Google.
Build and run on ESP-EYE

Result:
The console shows this error:
`W (348) i2s(legacy): legacy i2s driver is deprecated, please migrate to use driver/i2s_std.h, driver/i2s_pdm.h or driver/i2s_tdm.h
I (361) cpu_start: Starting scheduler on PRO CPU.
I (0) cpu_start: Starting scheduler on APP CPU.

abort() was called at PC 0x400dd395 on core 1

Backtrace: 0x40081b66:0x3ffc3340 0x40085cf5:0x3ffc3360 0x4008a78a:0x3ffc3380 0x400dd395:0x3ffc33f0 0x400d8f82:0x3ffc3430 0x400dcb65:0x3ffc3470 0x400d7484:0x3ffc34a0 0x400d63a0:0x3ffc34d0 0x400d605b:0x3ffc3520

ELF file SHA256: 1ed82b9f08a48053

Rebooting...
ets Jun 8 2016 00:22:57

rst:0xc (SW_CPU_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)
configsip: 0, SPIWP:0xee
clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
mode:DIO, clock div:1
load:0x3fff0030,len:7312
load:0x40078000,len:16484
ho 0 tail 12 room 4
load:0x40080400,len:4260
entry 0x40080668
I (29) boot: ESP-IDF v5.0.1-464-gef4b1b7704-dirty 2nd stage bootloader
I (29) boot: compile time 15:50:04
I (30) boot: chip revision: v1.0
I (34) qio_mode: Enabling default flash chip QIO
I (39) boot.esp32: SPI Speed : 80MHz
I (44) boot.esp32: SPI Mode : QIO
I (48) boot.esp32: SPI Flash Size : 2MB
I (53) boot: Enabling RNG early entropy source...
I (58) boot: Partition Table:
I (62) boot: ## Label Usage Type ST Offset Length
I (69) boot: 0 nvs WiFi data 01 02 00009000 00006000
I (77) boot: 1 phy_init RF data 01 01 0000f000 00001000
I (84) boot: 2 factory factory app 00 00 00010000 00100000
I (92) boot: End of partition table
I (96) esp_image: segment 0: paddr=00010020 vaddr=3f400020 size=11f94h ( 73620) map
I (125) esp_image: segment 1: paddr=00021fbc vaddr=3ffb0000 size=01dfch ( 7676) load
I (127) esp_image: segment 2: paddr=00023dc0 vaddr=40080000 size=0bb7ch ( 47996) load
I (146) esp_image: segment 3: paddr=0002f944 vaddr=00000000 size=006d4h ( 1748)
I (147) esp_image: segment 4: paddr=00030020 vaddr=400d0020 size=291cch (168396) map
I (203) boot: Loaded app from partition at offset 0x10000
I (204) boot: Disabling RNG early entropy source...
I (215) cpu_start: Pro cpu up.
I (215) cpu_start: Starting app cpu, entry point is 0x400810a4
I (201) cpu_start: App cpu up.
I (232) cpu_start: Pro cpu start user code
I (232) cpu_start: cpu freq: 160000000 Hz
I (232) cpu_start: Application information:
I (237) cpu_start: Project name: micro_speech
I (242) cpu_start: App version: df6497e
I (247) cpu_start: Compile time: Aug 14 2023 15:12:57
I (253) cpu_start: ELF file SHA256: 1ed82b9f08a48053...
I (259) cpu_start: ESP-IDF: v5.0.1-464-gef4b1b7704-dirty
I (266) cpu_start: Min chip rev: v0.0
I (270) cpu_start: Max chip rev: v3.99
I (275) cpu_start: Chip rev: v1.0
I (280) heap_init: Initializing. RAM available for dynamic allocation:
I (287) heap_init: At 3FFAE6E0 len 00001920 (6 KiB): DRAM
I (293) heap_init: At 3FFBE490 len 00021B70 (134 KiB): DRAM
I (300) heap_init: At 3FFE0440 len 00003AE0 (14 KiB): D/IRAM
I (306) heap_init: At 3FFE4350 len 0001BCB0 (111 KiB): D/IRAM
I (312) heap_init: At 4008BB7C len 00014484 (81 KiB): IRAM
I (319) spi_flash: detected chip: generic
I (323) spi_flash: flash io: qio
W (327) spi_flash: Detected size(4096k) larger than the size in the binary image header(2048k). Using the size in the binary image header.
W (340) ADC: legacy driver is deprecated, please migrate to esp_adc/adc_oneshot.h
W (348) i2s(legacy): legacy i2s driver is deprecated, please migrate to use driver/i2s_std.h, driver/i2s_pdm.h or driver/i2s_tdm.h
I (361) cpu_start: Starting scheduler on PRO CPU.
I (0) cpu_start: Starting scheduler on APP CPU.

abort() was called at PC 0x400dd395 on core 1

Backtrace: 0x40081b66:0x3ffc3340 0x40085cf5:0x3ffc3360 0x4008a78a:0x3ffc3380 0x400dd395:0x3ffc33f0 0x400d8f82:0x3ffc3430 0x400dcb65:0x3ffc3470 0x400d7484:0x3ffc34a0 0x400d63a0:0x3ffc34d0 0x400d605b:0x3ffc3520

ELF file SHA256: 1ed82b9f08a48053

Rebooting...
ets Jun 8 2016 00:22:57`

Answer 1 · 2023-08-15T07:56:17.000Z

I traced the PC address at which abort() was triggered, and it appears to come from the TensorFlow library itself.

$ readelf --symbols micro_speech.elf | grep -i 400dd3
1759: 400dd324 49 FUNC GLOBAL DEFAULT 16 _ZN6tflite25Full[...]
2034: 400dd358 96 FUNC GLOBAL DEFAULT 16 _ZN6tflite29Calc[...]
2373: 400dd3b8 58 FUNC GLOBAL DEFAULT 16 _ZN6tflite5micro[...]
2511: 400dd3f4 34 FUNC GLOBAL DEFAULT 16 _ZN6tflite5micro[...]

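From the output above, the function that triggers the abort should be tflite29Calc...() (the name is truncated), because the PC 0x400dd395 falls inside the 96-byte range that starts at 0x400dd358. I tried to look at its code, but it seems to come from a statically linked library: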
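
The lookup above can also be done programmatically. The following is a minimal sketch, not part of any ESP-IDF tool: the helper name `find_symbol` is hypothetical, and the table is copied by hand from the readelf output (addresses, sizes, and the truncated names as shown there).

```python
from bisect import bisect_right

# (start address, size in bytes, name) copied from the readelf output above.
SYMBOLS = [
    (0x400DD324, 49, "_ZN6tflite25Full[...]"),
    (0x400DD358, 96, "_ZN6tflite29Calc[...]"),
    (0x400DD3B8, 58, "_ZN6tflite5micro[...]"),
    (0x400DD3F4, 34, "_ZN6tflite5micro[...]"),
]


def find_symbol(pc, symbols):
    """Return the name of the symbol whose [start, start + size) range holds pc.

    Hypothetical helper for illustration; returns None if no symbol matches.
    """
    ordered = sorted(symbols)
    starts = [start for start, _, _ in ordered]
    # Index of the last symbol that starts at or before pc.
    index = bisect_right(starts, pc) - 1
    if index < 0:
        return None
    start, size, name = ordered[index]
    return name if pc < start + size else None


# PC taken from the "abort() was called at PC 0x400dd395" line.
print(find_symbol(0x400DD395, SYMBOLS))
```

In practice the toolchain's addr2line can resolve the address directly against the ELF file, for example `xtensa-esp32-elf-addr2line -pfiaC -e build/micro_speech.elf 0x400dd395` (the ELF path here assumes the default build directory).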

$ grep -ir --color "tflite29Calc" *
examples/micro_speech/build/micro_speech.map: .literal._ZN6tflite29CalculateOpDataFullyConnectedEP13TfLiteContext21TfLiteFusedActivation10TfLiteTypePK12TfLiteTensorS6_S6_PS4_PNS_20OpDataFullyConnectedE
examples/micro_speech/build/micro_speech.map: .text._ZN6tflite29CalculateOpDataFullyConnectedEP13TfLiteContext21TfLiteFusedActivation10TfLiteTypePK12TfLiteTensorS6_S6_PS4_PNS_20OpDataFullyConnectedE
examples/micro_speech/build/micro_speech.map:                0x00000000400dd358                _ZN6tflite29CalculateOpDataFullyConnectedEP13TfLiteContext21TfLiteFusedActivation10TfLiteTypePK12TfLiteTensorS6_S6_PS4_PNS_20OpDataFullyConnectedE
examples/micro_speech/build/micro_speech.map:_ZN6tflite29CalculateOpDataFullyConnectedEP13TfLiteContext21TfLiteFusedActivation10TfLiteTypePK12TfLiteTensorS6_S6_PS4_PNS_20OpDataFullyConnectedE esp-idf/tflite-lib/libtflite-lib.a(fully_connected_common.cc.obj)
grep: examples/micro_speech/build/esp-idf/tflite-lib/libtflite-lib.a: binary file matches
grep: examples/micro_speech/build/esp-idf/tflite-lib/CMakeFiles/__idf_tflite-lib.dir/tensorflow/lite/micro/kernels/fully_connected_common.cc.obj: binary file matches
grep: examples/micro_speech/build/esp-idf/tflite-lib/CMakeFiles/__idf_tflite-lib.dir/tensorflow/lite/micro/kernels/lstm_eval_common.cc.obj: binary file matches
grep: examples/micro_speech/build/esp-idf/tflite-lib/CMakeFiles/__idf_tflite-lib.dir/tensorflow/lite/micro/kernels/esp_nn/fully_connected.cc.obj: binary file matches

So far I have had no success changing the model so that I can play around with a different wake-up keyword. Are there any steps to follow besides updating model.cc and main_functions.cc (if there are new ops in MicroMutableOpResolver<>)?

Answer 2 · 2023-08-16T05:47:49.000Z

@misterb0407
Thanks for reporting the issue.

I would like to highlight one point: the model you're using is an older one. The newer models are int8_t-quantised, whereas the model you've shared is uint8-quantised, so the esp-nn optimisations will not work for it. The model should, however, still work.

Other than that, the input and output sizes are the same as expected. Hence, there should not be any additional requirements.

Can you please:

Use a stable IDF branch (e.g., release/v5.1)
Enable gdb_stub from menuconfig (a backtrace will then be available when the crash happens)

and share your observations here?

Answer 3 · 2023-08-16T07:23:14.000Z

@vikramdattu, regarding the step to enable gdb_stub: I opened the menuconfig GUI, but nothing happens when I enter it. When I open the sdkconfig file, this is all I see:

# GDB Stub
#
# end of GDB Stub

Answer 4 · 2023-08-16T08:12:18.000Z

@vikramdattu, regarding the step to enable gdb_stub: I opened the menuconfig GUI, but nothing happens when I enter it. When I open the sdkconfig file, this is all I see...

@misterb0407 You can enable gdbstub with the following steps:
idf.py menuconfig > Component config > ESP System Settings > Panic Handler Behaviour > Select GDBStub on panic
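
To confirm that the menu selection was actually saved, the generated sdkconfig file can be checked. This is a minimal sketch under one assumption: that the choice is stored as `CONFIG_ESP_SYSTEM_PANIC_GDBSTUB=y`, which is the option name used by recent ESP-IDF releases (verify it against your own sdkconfig). The helper name and the sample text are hypothetical.

```python
def gdbstub_on_panic_enabled(sdkconfig_text):
    """Return True if the sdkconfig text selects GDBStub as the panic handler.

    Assumes the option is named CONFIG_ESP_SYSTEM_PANIC_GDBSTUB.
    """
    for line in sdkconfig_text.splitlines():
        line = line.strip()
        # Skip comments such as "# CONFIG_... is not set" and blank lines.
        if line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        if key == "CONFIG_ESP_SYSTEM_PANIC_GDBSTUB":
            return value == "y"
    return False


# Hypothetical sdkconfig excerpt, for illustration only.
example = """\
# CONFIG_ESP_SYSTEM_PANIC_PRINT_REBOOT is not set
CONFIG_ESP_SYSTEM_PANIC_GDBSTUB=y
"""
print(gdbstub_on_panic_enabled(example))
```

With a real project, the text would come from reading the sdkconfig file in the project root, e.g. `open("sdkconfig").read()`.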

Answer 5 · 2023-08-16T08:25:28.000Z

@misterb0407 I found the issue.
The filter zero point is expected to be 0, so your model aborts at the following line:

    // Filter weights will always be symmetric quantized since we only support
    // int8 quantization. See
    // https://github.com/tensorflow/tensorflow/issues/44912 for additional
    // context.
    TFLITE_DCHECK(filter->params.zero_point == 0);

Please use int8 quantisation to ensure the above condition holds. It looks like uint8 support has been intentionally dropped by tflite; see tensorflow/tflite-micro#216.
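
To see why a uint8 model trips this check while an int8 model does not, it helps to compare the two quantisation schemes. The sketch below is generic textbook arithmetic, not the TFLite converter's implementation; the function names and the example weight range are hypothetical.

```python
def asymmetric_uint8_params(w_min, w_max):
    """Affine uint8 quantisation: map [w_min, w_max] onto [0, 255]."""
    # The representable range must contain 0.0.
    w_min, w_max = min(w_min, 0.0), max(w_max, 0.0)
    if w_max == w_min:
        return 1.0, 0  # all-zero weights: any scale works
    scale = (w_max - w_min) / 255.0
    zero_point = int(round(-w_min / scale))
    return scale, zero_point


def symmetric_int8_params(w_min, w_max):
    """Symmetric int8 quantisation: map [-m, m] onto [-127, 127]."""
    largest = max(abs(w_min), abs(w_max))
    scale = largest / 127.0 if largest > 0 else 1.0
    # Symmetric quantisation pins the zero point to 0 by construction.
    return scale, 0


def passes_filter_check(zero_point):
    """Mirror of the condition in TFLITE_DCHECK(filter->params.zero_point == 0)."""
    return zero_point == 0


# Hypothetical filter weights spanning [-1.0, 3.0].
_, zp_uint8 = asymmetric_uint8_params(-1.0, 3.0)
_, zp_int8 = symmetric_int8_params(-1.0, 3.0)
print(zp_uint8, passes_filter_check(zp_uint8))
print(zp_int8, passes_filter_check(zp_int8))
```

With asymmetric uint8 weights the zero point is generally non-zero whenever the weights include negative values, which is the situation the check rejects; symmetric int8 weights always satisfy it.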

Answer 6 · 2023-08-16T08:28:42.000Z

Hi @vikramdattu, thanks, got it.
So I tried what you suggested and rebuilt with IDF release/v5.1 and gdbstub enabled, but it seems it can't find the .elf file.
The following is a snippet of the log. I am using WSL2 + Docker.

I (379) main_task: Calling app_main()

abort() was called at PC 0x400dd559 on core 0

Backtrace: 0x40081786:0x3ffbc1f0 0x40085aad:0x3ffbc210 0x4008aa4a:0x3ffbc230 0x400dd559:0x3ffbc2a0 0x400d9344:0x3ffbc2e0 0x400dcd45:0x3ffbc320 0x400d77d0:0x3ffbc350 0x400d66fc:0x3ffbc380 0x400d636b:0x3ffbc3e0 0x4008812d:0x3ffbc400

ELF file SHA256: 4235189bef783505

Entering gdb stub now.
$T0b#e6xtensa-esp32-elf-gdb -ex set serial baud 115200 -ex target remote \.\COM3 \wsl$\Ubuntu-22.04/home/misterb/repo/tflite-micro-esp-examples/examples/micro_speech/build/micro_speech.elf: [WinError 2] The system cannot find the file specified
ets Jun 8 2016 00:22:57

rst:0x1 (POWERON_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)
configsip: 0, SPIWP:0xee
clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
mode:DIO, clock div:1
load:0x3fff0030,len:7448
load:0x40078000,len:16476
ho 0 tail 12 room 4
load:0x40080400,len:4
load:0x40080404,len:4284
entry 0x40080668

Answer 7 · 2023-08-16T08:36:58.000Z

@misterb0407 I'm not sure why this is not working for you. BTW, you don't need to enter anything manually to trigger gdb when gdb_stub is enabled. It will invoke gdb and load the .elf file for you on a crash.

I have attached the output from my terminal for your reference:
micro_speech_uint8_model_crash.txt

Please find the solution to the issue here: #61 (comment)

Answer 8 · 2023-08-17T01:06:17.000Z

Thank you so much @vikramdattu, I think we can close this ticket then.
By the way, do you know the steps to follow to create our own model? I want to add, say, "go" on top of "yes" and "no". I looked around in the tflm repo, but with no success so far.

Also, may I know your development setup? Are you using WSL, or native Windows/Linux?

Answer 9 · 2023-08-17T05:08:02.000Z

@misterb0407 No problem. I am using a MacBook for my development and have not tried WSL. However, here is a link for your reference; I hope it adds some value: https://gist.github.com/abobija/2f11d1b2c7cb079bec4df6e2348d969f

You may refer to this training guide to train the micro_speech model.