FreeRTOS/iot-reference-esp32

[BUG] OTA Timer handle NULL for Timerid=0, can't stop.

Closed this issue · 5 comments

txf- commented

Describe the bug
on an ESP32 When OTA is starting on the log I get the following:

I (56687) ota_over_mqtt_demo: Subscribed to topic $aws/things/mything/jobs/notify-next.
I (57867) ota_over_mqtt_demo: Sent PUBLISH packet to broker $aws/things/mything/jobs/$next/get to broker.
W (57867) AWS_OTA: OTA Timer handle NULL for Timerid=0, can't stop.
I (57867) ota_over_mqtt_demo: Received job message callback, size 59.
W (57887) ota_over_mqtt_demo: Received an unhandled callback event from OTA Agent, event = 8
I (57917) ota_over_mqtt_demo: OTA Event processing completed. Freeing the event buffer to pool.

Why is there a NULL Timer reference, and is it a problem? I can't seem to detect one functionally with the ota task, however I might be missing something important.

System information

  • Hardware board: - Esp32 WROVER (v3) custom board.
  • IDE used: VSCode with IDF extension
  • Operating System: [ Windows 11 WSL2 with Docker image ]
  • Code version: 202212.00 release, IDF 4.4.3
  • Project/Demo: ota_over_mqtt_demo

Expected behavior
I don't know if it is expected or not.

Steps to reproduce bug
Note that I changed the TLS timeout to be lower at 700ms, however this happens regardless of what the settings are.

Code to reproduce bug
Run the demo on an ESP32.

Hi @txf-,
Lowering the TLS timeout does not affect this so don't worry about that :). This warning springs from this line https://github.com/aws/ota-for-aws-iot-embedded-sdk/blob/e42ac54b0fb5128eeba35923da27258b917160fe/source/portable/os/ota_os_freertos.c#L311, and the message indicates that Timer:0 is NULL so nothing can be stopped. You could place an assert or abort in the same block as this message and the device should print out a stack trace of what is calling this function when the timer is NULL - that will make it easier for me to identify a fix, but this seems to just be a warning and non-fatal. Note that this function is passed through a struct to the OTA library so calls to it from the library will look like https://github.com/aws/ota-for-aws-iot-embedded-sdk/blob/e42ac54b0fb5128eeba35923da27258b917160fe/source/ota.c#L1127. If you could give me the stack trace though, that would help :).
Best,
Jason Carroll

Oh should probably mention that you can find ota_os_freertos.c in components/esp-aws-iot/libraries/ota/ota/source/portable/os.

txf- commented

If you could give me the stack trace though, that would help :). Best, Jason Carroll

Here you go:

assert failed: OtaStopTimer_FreeRTOS ota_os_freertos.c:288 (otaTimer[ otaTimerId ])


Backtrace: 0x40081ece:0x3ffd5d50 0x40089dad:0x3ffd5d70 0x4008f2c5:0x3ffd5d90 0x4017d91f:0x3ffd5eb0 0x4017b060:0x3ffd5ed0 0x4017a610:0x3ffd5f00 0x4017a8a4:0x3ffd5f40 0x400da491:0x3ffd5f60
0x40081ece: panic_abort at /opt/esp/idf/components/esp_system/panic.c:408

0x40089dad: esp_system_abort at /opt/esp/idf/components/esp_system/esp_system.c:137

0x4008f2c5: __assert_func at /opt/esp/idf/components/newlib/assert.c:85

0x4017d91f: OtaStopTimer_FreeRTOS at /workspaces/omni-b-2-2/components/esp-aws-iot/libraries/ota-for-aws-iot-embedded-sdk/port/ota_os_freertos.c:288 (discriminator 1)

0x4017b060: requestJobHandler at /workspaces/omni-b-2-2/components/esp-aws-iot/libraries/ota-for-aws-iot-embedded-sdk/ota-for-aws-iot-embedded-sdk/source/ota.c:901

0x4017a610: executeHandler at /workspaces/omni-b-2-2/components/esp-aws-iot/libraries/ota-for-aws-iot-embedded-sdk/ota-for-aws-iot-embedded-sdk/source/ota.c:2928
 (inlined by) receiveAndProcessOtaEvent at /workspaces/omni-b-2-2/components/esp-aws-iot/libraries/ota-for-aws-iot-embedded-sdk/ota-for-aws-iot-embedded-sdk/source/ota.c:3005

0x4017a8a4: OTA_EventProcessingTask at /workspaces/omni-b-2-2/components/esp-aws-iot/libraries/ota-for-aws-iot-embedded-sdk/ota-for-aws-iot-embedded-sdk/source/ota.c:3049

0x400da491: prvOtaAgentTask at /workspaces/omni-b-2-2/main/tasks/ota/ota_over_mqtt.c:506

Note that my ota_over_mqtt is slightly differerent so the line numbers might not match, but there aren't any changes that should affect this. This backtrace doesn't really tell me much about the problem, let me know if you need any other capture.

Hi @txf-,
Thanks for your information.
I think that's a normal behavior when device can subscribe/publish message successfully at the first try. You can refer to https://github.com/aws/ota-for-aws-iot-embedded-sdk/blob/main/source/ota.c#L901, we stop the timer but we never start it before (because we do it successfully at the beginning). It won't impact any flow for OTA.

Thanks.

I'm going to close this issue if there is no further question.
Please feel free to re-create it if any discussion needed.

Thanks.