JosephHewitt/wardriver_rev3

1.2.1 crashes related to GPS time sense code?

pejacoby opened this issue · 2 comments

I installed 1.2.1 current code on my WiSpy (HackerBoxes) original JHewitt unit using the Arduino IDE. Everything compiled and loaded fine. Since then, I am experiencing a curious crash loop ONLY when the unit powers up away from my main WiFi network.

When I power on at my house and the time syncs via NTP, everything runs perfectly.

When I power back on at my job downtown, and there is no network to connect to, I can see the new GPS time code is working (see image 1), with the unit displaying a date stamp before it shows GPS detection. Then, as I start moving and eventually get a full GPS detection (see image 2), I will at some point minutes later get an interrupt code 6 and the unit will reboot. It'll cycle back through the no-network-found startup, run for a while, and then crash again. The run time varies between crashes. On today's 45 minute drive home, I ended up with 13 files, some very small, some moderate in size. I also notice that at times, in a well populated area, the "WiFi" count gets stuck at "0", while the side B BLE count is actively changing.

I'm leaning to something in the GPS time detection changes. Has anyone else tried 1.2.1 on a base model? I have not yet tried it on my 5G mod unit.
1-TimeAndDate-no-GPS-lock
2-GPS-lock-Wifi-stuck-at-zero

Thanks for the report. I have reproduced this and will investigate the cause.

In my tests, this issue did not occur if the wardriver was not configured with a network. When I configured a network that the wardriver is unable to connect to, this crash happens regularly.

For my future reference:

E (168345) task_wdt: Task watchdog got triggered. The following tasks/users did not reset the watchdog in time:
E (168345) task_wdt:  - IDLE0 (CPU 0)
E (168345) task_wdt: Tasks currently running:
E (168345) task_wdt: CPU 0: primary_scan_lo
E (168345) task_wdt: CPU 1: loopTask
E (168345) task_wdt: Aborting.
E (168345) task_wdt: Print CPU 0 (current core) backtrace




Backtrace: 0x4012d26d:0x3ffddd80 0x4012ea44:0x3ffdddb0 0x400fbe25:0x3ffdddd0 0x400df47d:0x3ffdddf0 0x400df09d:0x3ffdde10 0x400d403b:0x3ffdde30 0x400915fa:0x3ffddee0




ELF file SHA256: 4a07dba99cf431c8   

Rebooting...
ets Jun  8 2016 00:22:57

rst:0xc (SW_CPU_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)
configsip: 0, SPIWP:0xee
clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
mode:DIO, clock div:1
load:0x3fff0030,len:4832
load:0x40078000,len:16460
load:0x40080400,len:4
load:0x40080404,len:3504
entry 0x400805cc

decoded:



    0x4012d26d: st_word at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/fatfs/src/ff.c:659
    0x4012d26d: f_sync at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/fatfs/src/ff.c:4197
    0x4012ea44: vfs_fat_unlink at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/fatfs/vfs/vfs_fat.c:669
    0x400fbe25: esp_vfs_stat at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/vfs/vfs.c:594
    0x400df47d: VFSFileImpl::position() const at /home/runner/builder/packages/esp32/hardware/esp32/3.0.4/libraries/FS/src/vfs_api.cpp:404
    0x400df09d: fs::File::peek() at /home/runner/builder/packages/esp32/hardware/esp32/3.0.4/libraries/FS/src/FS.cpp:85
    0x400d403b: primary_scan_loop(void*) at /home/runner/work/wardriver_rev3/wardriver_rev3/A/A.ino:2506
    0x400915fa: vPortTaskWrapper at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/freertos/FreeRTOS-Kernel/portable/xtensa/port.c:162

and

E (96609) task_wdt: Task watchdog got triggered. The following tasks/users did not reset the watchdog in time:
E (96609) task_wdt:  - IDLE0 (CPU 0)
E (96609) task_wdt: Tasks currently running:
E (96609) task_wdt: CPU 0: primary_scan_lo
E (96609) task_wdt: CPU 1: loopTask
E (96609) task_wdt: Aborting. 
E (96609) task_wdt: Print CPU 0 (current core) backtrace




Backtrace: 0x400eb841:0x3ffdcbd0 0x400e0f91:0x3ffdcbf0 0x400e01a5:0x3ffdcc10 0x400e0318:0x3ffdcc50 0x400e0349:0x3ffdcc70 0x4012b22d:0x3ffdcca0 0x4012bb0e:0x3ffdccc0 0x4012d23d:0x3ffdcce0 0x4012ea44:0x3ffdcd10 0x400fbe25:0x3ffdcd30 0x400df47d:0x3ffdcd50 0x400d
f09d:0x3ffdcd70 0x400d403b:0x3ffdcd90 0x400915fa:0x3ffdce40




ELF file SHA256: 4a07dba99cf431c8

Rebooting...
ets Jun  8 2016 00:22:57

rst:0xc (SW_CPU_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)
configsip: 0, SPIWP:0xee
clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
mode:DIO, clock div:1
load:0x3fff0030,len:4832
load:0x40078000,len:16460
load:0x40080400,len:4
load:0x40080404,len:3504
entry 0x400805cc

decoded:

    0x400eb841: spiTransferBytesNL at /home/runner/builder/packages/esp32/hardware/esp32/3.0.4/cores/esp32/esp32-hal-spi.c:1337
    0x400e0f91: __static_initialization_and_destruction_0 at /home/runner/builder/packages/esp32/hardware/esp32/3.0.4/libraries/SPI/src/SPI.cpp:334
    0x400e0f91: _GLOBAL__sub_I__ZN8SPIClassC2Eh at /home/runner/builder/packages/esp32/hardware/esp32/3.0.4/libraries/SPI/src/SPI.cpp:334
    0x400e01a5: sdCommand(unsigned char, char, unsigned int, unsigned int*) at /home/runner/builder/packages/esp32/hardware/esp32/3.0.4/libraries/SD/src/sd_diskio.cpp:191
    0x400e0318: ff_sd_status(unsigned char) at /home/runner/builder/packages/esp32/hardware/esp32/3.0.4/libraries/SD/src/sd_diskio.cpp:599
    0x400e0349: sdReadSector(unsigned char, char*, unsigned long long) at /home/runner/builder/packages/esp32/hardware/esp32/3.0.4/libraries/SD/src/sd_diskio.cpp:256
    0x4012b22d: ff_disk_ioctl at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/fatfs/diskio/diskio.c:102
    0x4012bb0e: mount_volume at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/fatfs/src/ff.c:3415
    0x4012d23d: f_sync at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/fatfs/src/ff.c:4190
    0x4012ea44: vfs_fat_unlink at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/fatfs/vfs/vfs_fat.c:669
    0x400fbe25: esp_vfs_stat at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/vfs/vfs.c:594
    0x400df47d: VFSFileImpl::position() const at /home/runner/builder/packages/esp32/hardware/esp32/3.0.4/libraries/FS/src/vfs_api.cpp:404
    0x400d403b: primary_scan_loop(void*) at /home/runner/work/wardriver_rev3/wardriver_rev3/A/A.ino:2506
    0x400915fa: vPortTaskWrapper at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/freertos/FreeRTOS-Kernel/portable/xtensa/port.c:162

(edited to include decoded backtraces)

I have found the root cause of this issue. This version of the wardriver code now uses ESP32 board library 3.0.x which is a new major release and this has caused a change of behavior which I suspect may be a bug.

Upon failing to connect to WiFi, a flag is set indicating that a retry should be attempted. In version 2.x of the library, calling WiFi.disconnect(); would tell the underlying system to disconnect and not attempt retries. However, it seems this no longer has that affect.

By enabling verbose logging, I could see that the underlying system was constantly attempting to connect to my configured network despite also calling WiFi.disconnect(); in a loop.

In this state, WiFi scans were failing which I believe is normal behavior since a scan shouldn't be performed while in the process of establishing a WiFi connection. This meant that ESP-A was unable to scan at all and was stuck in a loop of failing to scan. I was also seeing watchdog timeouts caused by this which would explain why you also saw unexpected resets of the device.

I did occasionally see it get "unstuck" from that state allowing a few WiFi scans to work, but then generally it would go back to being stuck attempting to reconnect which might explain why you had different amounts of times between crashes.

I changed the setup_wifi function which is used to get the system ready for scanning to fully turn off WiFi and then turn it back on. By doing this, it fully resets everything so it never retries connections anymore. I haven't been able to reproduce the issue since. This is now merged in #180.

Please feel free to reopen this issue or open a new issue if you see this happening again. Thanks again for the report.