[BUG] Random LoadProhibited when publishing command callback
Closed this issue · 7 comments
The crash will occur after around 20000000 ticks, and happens when trying to send a new publish to AWS IoT. The program is set to publish to AWS IoT every 30 seconds.
The error occurs when performing xTaskNotify, which is coming from the example function provided in this repo.
System information
- Hardware board: ESP32-PICO-MINI
- Operating System: Linux
Expected behavior
A clear description of the expected behavior.
Screenshots or console output
I (21102850) coreMQTT: Ack packet deserialized with result: MQTTSuccess.
I (21102850) coreMQTT: State record updated. New state=MQTTPublishDone.
Guru Meditation Error: Core 0 panic'ed (LoadProhibited). Exception was unhandled.
Core 0 register dump:
PC : 0x400964de PS : 0x00060f33 A0 : 0x800e1b51 A1 : 0x3ffe7170
0x400964de: xTaskGenericNotify at /Users/test/esp/esp-idf/components/freertos/FreeRTOS-Kernel/tasks.c:5718
A2 : 0x0000cdcd A3 : 0x3ffbdc38 A4 : 0x80096c4c A5 : 0x00000003
A6 : 0x00000000 A7 : 0x00000031 A8 : 0x800964ce A9 : 0x0000cecd
A10 : 0x00000001 A11 : 0x3ffc3638 A12 : 0x00060f20 A13 : 0x00060f23
A14 : 0x0000000f A15 : 0x0000cdcd SAR : 0x00000004 EXCCAUSE: 0x0000001c
EXCVADDR: 0x0000cf21 LBEG : 0x400014fd LEND : 0x4000150d LCOUNT : 0xfffffffc
Backtrace: 0x400964db:0x3ffe7170 0x400e1b4e:0x3ffe7190 0x400e7919:0x3ffe71b0 0x400e7b50:0x3ffe71f0 0x400e9a11:0x3ffe7210 0x400e9b51:0x3ffe7290 0x400e7e69:0x3ffe72b0 0x400dbea8:0x3ffe7300 0x40096c49:0x3ffe7320
0x400964db: xTaskGenericNotify at /Users/test/esp/esp-idf/components/freertos/FreeRTOS-Kernel/tasks.c:5718
0x400e1b4e: prvPublishCommandCallback at /Users/test/esp-git/esp-matter/examples/im/outlet/main/aws_iot/fleet_prov_by_claim_demo.c:380
0x400e7919: concludeCommand at /Users/test/esp-git/esp-matter/libraries/esp-aws-iot/libraries/coreMQTT-Agent/coreMQTT-Agent/source/core_mqtt_agent.c:808
0x400e7b50: handleAcks at /Users/test/esp-git/esp-matter/libraries/esp-aws-iot/libraries/coreMQTT-Agent/coreMQTT-Agent/source/core_mqtt_agent.c:638
(inlined by) mqttEventCallback at /Users/test/esp-git/esp-matter/libraries/esp-aws-iot/libraries/coreMQTT-Agent/coreMQTT-Agent/source/core_mqtt_agent.c:700
0x400e9a11: handlePublishAcks at /Users/test/esp-git/esp-matter/libraries/esp-aws-iot/libraries/coreMQTT/coreMQTT/source/core_mqtt.c:1555
(inlined by) handleIncomingAck at /Users/test/esp-git/esp-matter/libraries/esp-aws-iot/libraries/coreMQTT/coreMQTT/source/core_mqtt.c:1601
(inlined by) receiveSingleIteration at /Users/test/esp-git/esp-matter/libraries/esp-aws-iot/libraries/coreMQTT/coreMQTT/source/core_mqtt.c:1756
0x400e9b51: MQTT_ProcessLoop at /Users/test/esp-git/esp-matter/libraries/esp-aws-iot/libraries/coreMQTT/coreMQTT/source/core_mqtt.c:3065
0x400e7e69: processCommand at /Users/test/esp-git/esp-matter/libraries/esp-aws-iot/libraries/coreMQTT-Agent/coreMQTT-Agent/source/core_mqtt_agent.c:611
(inlined by) MQTTAgent_CommandLoop at /Users/test/esp-git/esp-matter/libraries/esp-aws-iot/libraries/coreMQTT-Agent/coreMQTT-Agent/source/core_mqtt_agent.c:1059
0x400dbea8: prvMQTTAgentTask at /Users/test/esp-git/esp-matter/examples/im/outlet/main/aws_iot/networking/mqtt/core_mqtt_agent_manager.c:521
0x40096c49: vPortTaskWrapper at /Users/test/esp/esp-idf/components/freertos/FreeRTOS-Kernel/portable/xtensa/port.c:154
static void prvPublishCommandCallback( MQTTAgentCommandContext_t * pxCommandContext,
MQTTAgentReturnInfo_t * pxReturnInfo )
{
/* Store the result in the application defined context so the task that
* initiated the publish can check the operation's status. */
pxCommandContext->xReturnStatus = pxReturnInfo->returnCode;
if( pxCommandContext->xTaskToNotify != NULL )
{
/* Send the context's ulNotificationValue as the notification value so
* the receiving task can check the value it set in the context matches
* the value it receives in the notification. */
xTaskNotify( pxCommandContext->xTaskToNotify,
pxCommandContext->ulNotificationValue,
eSetValueWithOverwrite ); # LINE 380
}
}
static void prvPublishToTopic( MQTTQoS_t xQoS,
char * pcTopicName,
char * pcPayload )
{
uint32_t ulPublishMessageId, ulNotifiedValue = 0;
MQTTStatus_t xCommandAdded;
BaseType_t xCommandAcknowledged = pdFALSE;
MQTTPublishInfo_t xPublishInfo = { 0 };
MQTTAgentCommandContext_t xCommandContext = { 0 };
MQTTAgentCommandInfo_t xCommandParams = { 0 };
xTaskNotifyStateClear( NULL );
/* Create a unique number for the publish that is about to be sent.
* This number is used in the command context and is sent back to this task
* as a notification in the callback that's executed upon receipt of the
* publish from coreMQTT-Agent.
* That way this task can match an acknowledgment to the message being sent.
*/
xSemaphoreTake( xMessageIdSemaphore, portMAX_DELAY );
{
++ulMessageId;
ulPublishMessageId = ulMessageId;
}
xSemaphoreGive( xMessageIdSemaphore );
/* Configure the publish operation. The topic name string must persist for
* duration of publish! */
xPublishInfo.qos = xQoS;
xPublishInfo.pTopicName = pcTopicName;
xPublishInfo.topicNameLength = ( uint16_t ) strlen( pcTopicName );
xPublishInfo.pPayload = pcPayload;
xPublishInfo.payloadLength = ( uint16_t ) strlen( pcPayload );
/* Complete an application defined context associated with this publish
* message.
* This gets updated in the callback function so the variable must persist
* until the callback executes. */
xCommandContext.ulNotificationValue = ulPublishMessageId;
xCommandContext.xTaskToNotify = xTaskGetCurrentTaskHandle();
xCommandParams.blockTimeMs = fleetprovbyclaimconfigMAX_COMMAND_SEND_BLOCK_TIME_MS;
xCommandParams.cmdCompleteCallback = prvPublishCommandCallback;
xCommandParams.pCmdCompleteCallbackContext = &xCommandContext;
do
{
/* Wait for coreMQTT-Agent task to have working network connection and
* not be performing an OTA update. */
xEventGroupWaitBits( xNetworkEventGroup,
CORE_MQTT_AGENT_CONNECTED_BIT | CORE_MQTT_AGENT_OTA_NOT_IN_PROGRESS_BIT,
pdFALSE,
pdTRUE,
portMAX_DELAY );
ESP_LOGI( TAG,
"Sending publish request to coreMQTT-Agent with message \"%s\" on topic \"%s\" with ID %ld.",
pcPayload,
pcTopicName,
ulPublishMessageId );
/* To ensure ulNotification doesn't accidentally hold the expected value
* as it is to be checked against the value sent from the callback.. */
ulNotifiedValue = ~ulPublishMessageId;
xCommandAcknowledged = pdFALSE;
xCommandAdded = MQTTAgent_Publish( &xGlobalMqttAgentContext,
&xPublishInfo,
&xCommandParams );
if( xCommandAdded == MQTTSuccess )
{
/* For QoS 1 and 2, wait for the publish acknowledgment. For QoS0,
* wait for the publish to be sent. */
ESP_LOGI( TAG,
"Waiting for publish %ld to complete.",
ulPublishMessageId );
xCommandAcknowledged = prvWaitForNotification( &ulNotifiedValue );
}
else
{
ESP_LOGE( TAG,
"Failed to enqueue publish command. Error code=%s",
MQTT_Status_strerror( xCommandAdded ) );
}
/* Check all ways the status was passed back just for demonstration
* purposes. */
if( ( xCommandAcknowledged != pdTRUE ) ||
( xCommandContext.xReturnStatus != MQTTSuccess ) ||
( ulNotifiedValue != ulPublishMessageId ) )
{
ESP_LOGW( TAG,
"Error or timed out waiting for ack for publish message %ld. Re-attempting publish.",
ulPublishMessageId );
}
else
{
ESP_LOGI( TAG,
"Publish %ld succeeded.",
ulPublishMessageId );
}
// @note newly added
/* Add a little randomness into the delay so the tasks don't remain
* in lockstep. */
vTaskDelay( pdMS_TO_TICKS( 125 ) +
( rand() % 0xff ) );
} while( ( xCommandAcknowledged != pdTRUE ) ||
( xCommandContext.xReturnStatus != MQTTSuccess ) ||
( ulNotifiedValue != ulPublishMessageId ) );
}
Thank you.
Thank you for reporting this problem. We will look into this problem and discuss with you here.
@law-ko
Can you help to provide the following information?
- Your esp-idf version
- Your iot-reference-esp32c3 tag or commit
- The application code which calls
prvPublishToTopic
Your esp-idf version: v5.0.1
Your iot-reference-esp32c3 tag or commit: latest
The application code which calls prvPublishToTopic:
char *jsonString = cJSON_PrintUnformatted(rootJSON);
prvPublishToTopic( xQoS, pcPubTopicBuffer, (char *) jsonString );
We have adjusted the code into our app that work with AWS IoT with desired and reported style. This is uncommon error to see and looking deeper into the code we don't know what was causing this.
The exception you shared indicates a LoadProhibited exception in the tasks.c. From the tasks.c in v5.0.1, the xTaskToNotify is passed by parameters in prvPublishCommandCallback. It looks like xTaskToNotify points to an invalid address.
BaseType_t xTaskGenericNotify( TaskHandle_t xTaskToNotify,
UBaseType_t uxIndexToNotify,
uint32_t ulValue,
eNotifyAction eAction,
uint32_t * pulPreviousNotificationValue )
{
...
ucOriginalNotifyState = pxTCB->ucNotifyState[ uxIndexToNotify ]; /* This line has LoadProhibited exception. */xTaskNotify(xTaskGenericNotify) is called in the prvPublishCommandCallback and is a member variable of input parameter pxCommandContext.
pxCommandContext points to a local variable xCommandContext in prvPublishToTopic. Therefore, it is on the stack of the task which calls prvPublishToTopic.
static void prvPublishToTopic( MQTTQoS_t xQoS,
char * pcTopicName,
char * pcPayload )
{
...
MQTTAgentCommandInfo_t xCommandParams = { 0 };
...
xCommandContext.ulNotificationValue = ulPublishMessageId;
xCommandContext.xTaskToNotify = xTaskGetCurrentTaskHandle();
xCommandParams.blockTimeMs = subpubunsubconfigMAX_COMMAND_SEND_BLOCK_TIME_MS;
xCommandParams.cmdCompleteCallback = prvPublishCommandCallback;
xCommandParams.pCmdCompleteCallbackContext = &xCommandContext;Base on these information, we can troubleshoot the following possible causes:
-
Stack overflow
Local variablexCommandContextinprvPublishToTopicmay be corrupted if there is stack overflow. FreeRTOS supports stack overflow detection mechanism. You can reference this page to enable stack overflow checking. -
Use after free
TheprvPublishCommandCallbackis called from another task. If the task callsprvPublishToTopicexited, the stack memory will be used by other task. ThepxCommandContextmay point to memory modified by other task. We would like to suggest to check the status of the task which callsprvPublishToTopic.
We also suggest you to run the sub_pub_unsub_demo only on your platform to see if this problem can be reproduced and feedback the result. This can help us to narrow down the problem.
@chinglee-iot Would it be possible that this crash is related to the MQTT_ProcessLoop being blocked like xEventGroupWaitBit?
@law-ko
MQTT_ProcessLoop is blocked by transport interface receive function in receiveSingleIteration and is only called in the prvMQTTAgentTask.
static MQTTStatus_t receiveSingleIteration( MQTTContext_t * pContext,
bool manageKeepAlive )
{
....
recvBytes = pContext->transportInterface.recv( pContext->transportInterface.pNetworkContext,
&( pContext->networkBuffer.pBuffer[ pContext->index ] ),
pContext->networkBuffer.size - pContext->index );
Could you help to provide the xEventGroupWaitBits call in question so we can discuss it with you here?
Closing this issue due to inactivity. Feel free to re-open the issue if it persists.