aws/aws-iot-device-sdk-python-v2

Reception of shadow updated messages is sometimes delayed relative to publishing

kubieduber opened this issue · 6 comments

Describe the bug

This may not be related, but I'm experiencing an issue similar to the one described in aws/aws-iot-device-sdk-python#244

I have observed instances where an update to the shadow is not immediately received by a subscriber. In one instance I measured, the update accepted message arrived almost exactly 9 minutes after it was published.

Expected Behavior

Updates to the shadow should always deliver update accepted messages to subscribers with a negligible delay.

Current Behavior

Shadow updates are sometimes delayed by multiple minutes before an update accepted callback is triggered.

I will upload logs the next time I am able to reproduce this issue.

Reproduction Steps

The subscriber code is about as simple as it can be; I'm not sure whether any additional configuration would make this less likely.

I have not been able to identify a specific cause for the delay, though the messages tend to be received without delay if another message was received recently.

import time

from awscrt import mqtt
from awsiot import iotshadow, mqtt_connection_builder

# Build the mTLS connection (endpoint, cert, and key arguments omitted here).
mqtt_connection = mqtt_connection_builder.mtls_from_path(
    ...
)

# Connect and block until the connection is established.
connection_future = mqtt_connection.connect()
connection_future.result()

client = iotshadow.IotShadowClient(mqtt_connection)


def subscribe():
    # Subscribe to the named shadow's update/accepted topic and wait for the SUBACK.
    (update_accepted_future, _) = client.subscribe_to_update_named_shadow_accepted(
        request=iotshadow.UpdateNamedShadowSubscriptionRequest(
            shadow_name="Status", thing_name="ThingName"
        ),
        qos=mqtt.QoS.AT_LEAST_ONCE,
        callback=on_update_accepted,
    )
    update_accepted_future.result()

    print("Connection to AWS IoT established")

    # Keep the process alive; the callback fires on the CRT's event-loop thread.
    while True:
        time.sleep(1)

def on_update_accepted(response):
    print(response)

subscribe()

Possible Solution

No response

Additional Information/Context

I'm updating the shadow through boto3 in a Lambda written in Python, but that shouldn't matter: I can confirm that the shadow is updated in these instances and that the corresponding MQTT messages are published.
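
For reference, the Lambda-side update is roughly the following (a minimal sketch, not the exact Lambda code; the thing name, shadow name, and payload are placeholders):

import json

import boto3

# Data-plane client used for shadow operations.
iot_data = boto3.client("iot-data")

# Updating the named shadow causes AWS IoT to publish the accepted message on
# $aws/things/<thing name>/shadow/name/Status/update/accepted.
iot_data.update_thing_shadow(
    thingName="ThingName",  # placeholder
    shadowName="Status",    # placeholder
    payload=json.dumps({"state": {"reported": {"status": "ok"}}}).encode(),
)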

SDK version used

1.15.3

Environment details (OS name and version, etc.)

Raspberry Pi 4 - Raspbian Bullseye

Reproduced instance:

The Lambda was triggered at 2023-07-02T19:24:40Z and it updated the shadow immediately (also at 2023-07-02T19:24:40Z).

This is the log output from the last message logged before the shadow update was published until the update was actually received:

[DEBUG] [2023-07-02T19:12:19Z] [f580c440] [task-scheduler] - id=0xf4f00bd8: Running socket_written_task task with <Running> status
[DEBUG] [2023-07-02T19:27:47Z] [f580c440] [task-scheduler] - id=0xe6dea0: Running mqtt_ping task with <Running> status
[DEBUG] [2023-07-02T19:27:47Z] [f580c440] [task-scheduler] - id=0xe6dea0: Scheduling mqtt_ping task for future execution at time 77944293623405
[DEBUG] [2023-07-02T19:32:19Z] [f580c440] [task-scheduler] - id=0xe6dea0: Running mqtt_ping task with <Running> status
[DEBUG] [2023-07-02T19:32:19Z] [f580c440] [mqtt-client] - id=0xe6dd50: Starting ping
[DEBUG] [2023-07-02T19:32:19Z] [f580c440] [task-scheduler] - id=0xd9efe8: Scheduling mqtt_outgoing_request_task task for immediate execution
[DEBUG] [2023-07-02T19:32:19Z] [f580c440] [mqtt-client] - id=0xe6dd50: Starting ping with packet id 35
[DEBUG] [2023-07-02T19:32:19Z] [f580c440] [task-scheduler] - id=0xe6dea0: Scheduling mqtt_ping task for future execution at time 79144365992226
[DEBUG] [2023-07-02T19:32:19Z] [f580c440] [task-scheduler] - id=0xd9efe8: Running mqtt_outgoing_request_task task with <Running> status
[DEBUG] [2023-07-02T19:32:19Z] [f580c440] [task-scheduler] - id=0xf4f00bd8: Scheduling socket_written_task task for immediate execution
[DEBUG] [2023-07-02T19:32:19Z] [f580c440] [task-scheduler] - id=0xf4f283a8: Scheduling mqtt_pingresp_timeout task for future execution at time 77947366229226
[DEBUG] [2023-07-02T19:32:19Z] [f580c440] [task-scheduler] - id=0xf4f00bd8: Running socket_written_task task with <Running> status
[DEBUG] [2023-07-02T19:32:22Z] [f580c440] [task-scheduler] - id=0xf4f283a8: Running mqtt_pingresp_timeout task with <Running> status
[ERROR] [2023-07-02T19:32:22Z] [f580c440] [mqtt-client] - id=0xe6dd50: ping timeout detected
[DEBUG] [2023-07-02T19:32:22Z] [f580c440] [task-scheduler] - id=0xf4f01d80: Scheduling channel_shutdown task for immediate execution
[DEBUG] [2023-07-02T19:32:22Z] [f580c440] [task-scheduler] - id=0xf4f01d80: Running channel_shutdown task with <Running> status
[DEBUG] [2023-07-02T19:32:22Z] [f580c440] [channel] - id=0xf4f01c40: beginning shutdown process
[DEBUG] [2023-07-02T19:32:22Z] [f580c440] [channel] - id=0xf4f01c40: handler 0xf4f127f8 shutdown in read dir completed.
[DEBUG] [2023-07-02T19:32:22Z] [f580c440] [tls-handler] - id=0xf4f129b0: Shutting down read direction with error code 5129
[DEBUG] [2023-07-02T19:32:22Z] [f580c440] [channel] - id=0xf4f01c40: handler 0xf4f129b0 shutdown in read dir completed.
[DEBUG] [2023-07-02T19:32:22Z] [f580c440] [channel] - id=0xf4f01c40: handler 0xe6dd6c shutdown in read dir completed.
[DEBUG] [2023-07-02T19:32:22Z] [f580c440] [task-scheduler] - id=0xf4f01c58: Scheduling (null) task for immediate execution
[DEBUG] [2023-07-02T19:32:22Z] [f580c440] [task-scheduler] - id=0xf4f01c58: Running (null) task with <Running> status
[DEBUG] [2023-07-02T19:32:22Z] [f580c440] [channel] - id=0xf4f01c40: handler 0xe6dd6c shutdown in write dir completed.
[DEBUG] [2023-07-02T19:32:22Z] [f580c440] [tls-handler] - id=0xf4f129b0: Scheduling delayed write direction shutdown
[DEBUG] [2023-07-02T19:32:22Z] [f580c440] [task-scheduler] - id=0xf4f12ac8: Scheduling s2n_delayed_shutdown task for future execution at time 77947368497521
[DEBUG] [2023-07-02T19:32:22Z] [f580c440] [task-scheduler] - id=0xf4f12ac8: Running s2n_delayed_shutdown task with <Running> status
[DEBUG] [2023-07-02T19:32:22Z] [f580c440] [tls-handler] - id=0xf4f129b0: Delayed shut down in write direction
[DEBUG] [2023-07-02T19:32:22Z] [f580c440] [task-scheduler] - id=0xf4f00bd8: Scheduling socket_written_task task for immediate execution
[DEBUG] [2023-07-02T19:32:22Z] [f580c440] [channel] - id=0xf4f01c40: handler 0xf4f129b0 shutdown in write dir completed.
[DEBUG] [2023-07-02T19:32:22Z] [f580c440] [socket] - id=0xf4f00a90 fd=9: closing
[DEBUG] [2023-07-02T19:32:22Z] [f580c440] [task-scheduler] - id=0xf4f01080: Scheduling epoll_event_loop_unsubscribe_cleanup task for immediate execution
[DEBUG] [2023-07-02T19:32:22Z] [f580c440] [task-scheduler] - id=0xf4f00bd8: Running socket_written_task task with <Canceled> status
[DEBUG] [2023-07-02T19:32:22Z] [f580c440] [task-scheduler] - id=0xf4f12858: Scheduling socket_handler_close task for immediate execution
[DEBUG] [2023-07-02T19:32:22Z] [f580c440] [task-scheduler] - id=0xf4f01080: Running epoll_event_loop_unsubscribe_cleanup task with <Running> status
[DEBUG] [2023-07-02T19:32:22Z] [f580c440] [task-scheduler] - id=0xf4f12858: Running socket_handler_close task with <Running> status
[DEBUG] [2023-07-02T19:32:22Z] [f580c440] [channel] - id=0xf4f01c40: handler 0xf4f127f8 shutdown in write dir completed.
[DEBUG] [2023-07-02T19:32:22Z] [f580c440] [task-scheduler] - id=0xf4f01c58: Scheduling (null) task for immediate execution
[DEBUG] [2023-07-02T19:32:22Z] [f580c440] [task-scheduler] - id=0xf4f01c58: Running (null) task with <Running> status
[DEBUG] [2023-07-02T19:32:22Z] [f580c440] [channel] - id=0xf4f01c40: during shutdown, canceling task 0xe6dea0
[DEBUG] [2023-07-02T19:32:22Z] [f580c440] [task-scheduler] - id=0xe6dea0: Running mqtt_ping task with <Canceled> status
[DEBUG] [2023-07-02T19:32:22Z] [f580c440] [channel-bootstrap] - id=0xb72a98: channel 0xf4f01c40 shutdown with error 5129.
[DEBUG] [2023-07-02T19:32:22Z] [f580c440] [mqtt-client] - id=0xe6dd50: connection was unexpected interrupted, switch state to RECONNECTING.
[DEBUG] [2023-07-02T19:32:22Z] [f580c440] [mqtt-client] - id=0xe6dd50: Connection interrupted, calling callback and attempting reconnect
[DEBUG] [2023-07-02T19:32:22Z] [f580c440] [task-scheduler] - id=0xf4f26d60: Scheduling mqtt_reconnect task for future execution at time 77952374187169
[DEBUG] [2023-07-02T19:32:22Z] [f580c440] [channel] - id=0xf4f01c40: destroying channel.
[DEBUG] [2023-07-02T19:32:22Z] [f580c440] [channel-bootstrap] - id=0xb72a98: releasing bootstrap reference
[DEBUG] [2023-07-02T19:32:27Z] [f580c440] [task-scheduler] - id=0xf4f26d60: Running mqtt_reconnect task with <Running> status
[DEBUG] [2023-07-02T19:32:27Z] [f580c440] [channel-bootstrap] - id=0xb72a98: acquiring bootstrap reference
[DEBUG] [2023-07-02T19:32:27Z] [f580c440] [dns] - id=0xd08570: Host resolution requested for a3q8lgnw4q8xzy-ats.iot.us-west-2.amazonaws.com
[DEBUG] [2023-07-02T19:32:27Z] [f580c440] [dns] - id=0xd08570: No cached entries found for a3q8lgnw4q8xzy-ats.iot.us-west-2.amazonaws.com starting new resolver thread.
[DEBUG] [2023-07-02T19:32:27Z] [f3aff440] [dns] - static: resolving host a3q8lgnw4q8xzy-ats.iot.us-west-2.amazonaws.com
[DEBUG] [2023-07-02T19:32:27Z] [f3aff440] [dns] - static: resolved record: 35.167.200.40
[DEBUG] [2023-07-02T19:32:27Z] [f3aff440] [dns] - static: resolved record: 52.88.214.137
[DEBUG] [2023-07-02T19:32:27Z] [f3aff440] [dns] - static: resolved record: 52.88.145.67
[DEBUG] [2023-07-02T19:32:27Z] [f3aff440] [dns] - static: resolved record: 54.190.131.51
[DEBUG] [2023-07-02T19:32:27Z] [f3aff440] [dns] - static: resolved record: 52.13.217.193
[DEBUG] [2023-07-02T19:32:27Z] [f3aff440] [dns] - static: resolved record: 44.226.15.160
[DEBUG] [2023-07-02T19:32:27Z] [f3aff440] [dns] - static: resolved record: 44.242.91.213
[DEBUG] [2023-07-02T19:32:27Z] [f3aff440] [dns] - static: resolved record: 35.166.134.116
[DEBUG] [2023-07-02T19:32:27Z] [f3aff440] [dns] - static: resolved record: 2620:108:700f::2cef:1f1b
[DEBUG] [2023-07-02T19:32:27Z] [f3aff440] [dns] - static: resolved record: 2620:108:700f::3425:1713
[DEBUG] [2023-07-02T19:32:27Z] [f3aff440] [dns] - static: resolved record: 2620:108:700f::2cf0:5705
[DEBUG] [2023-07-02T19:32:27Z] [f3aff440] [dns] - static: resolved record: 2620:108:700f::3418:8047
[DEBUG] [2023-07-02T19:32:27Z] [f3aff440] [dns] - static: resolved record: 2620:108:700f::3428:1d8a
[DEBUG] [2023-07-02T19:32:27Z] [f3aff440] [dns] - static: resolved record: 2620:108:700f::23a4:8389
[DEBUG] [2023-07-02T19:32:27Z] [f3aff440] [dns] - static, resolving host a3q8lgnw4q8xzy-ats.iot.us-west-2.amazonaws.com successful, returned 14 addresses
[DEBUG] [2023-07-02T19:32:27Z] [f3aff440] [dns] - static: new address resolved 35.167.200.40 for host a3q8lgnw4q8xzy-ats.iot.us-west-2.amazonaws.com caching
[DEBUG] [2023-07-02T19:32:27Z] [f3aff440] [dns] - static: new address resolved 52.88.214.137 for host a3q8lgnw4q8xzy-ats.iot.us-west-2.amazonaws.com caching
[DEBUG] [2023-07-02T19:32:27Z] [f3aff440] [dns] - static: new address resolved 52.88.145.67 for host a3q8lgnw4q8xzy-ats.iot.us-west-2.amazonaws.com caching
[DEBUG] [2023-07-02T19:32:27Z] [f3aff440] [dns] - static: new address resolved 54.190.131.51 for host a3q8lgnw4q8xzy-ats.iot.us-west-2.amazonaws.com caching
[DEBUG] [2023-07-02T19:32:27Z] [f3aff440] [dns] - static: new address resolved 52.13.217.193 for host a3q8lgnw4q8xzy-ats.iot.us-west-2.amazonaws.com caching
[DEBUG] [2023-07-02T19:32:27Z] [f3aff440] [dns] - static: new address resolved 44.226.15.160 for host a3q8lgnw4q8xzy-ats.iot.us-west-2.amazonaws.com caching
[DEBUG] [2023-07-02T19:32:27Z] [f3aff440] [dns] - static: new address resolved 44.242.91.213 for host a3q8lgnw4q8xzy-ats.iot.us-west-2.amazonaws.com caching
[DEBUG] [2023-07-02T19:32:27Z] [f3aff440] [dns] - static: new address resolved 35.166.134.116 for host a3q8lgnw4q8xzy-ats.iot.us-west-2.amazonaws.com caching
[DEBUG] [2023-07-02T19:32:27Z] [f3aff440] [dns] - static: new address resolved 2620:108:700f::2cef:1f1b for host a3q8lgnw4q8xzy-ats.iot.us-west-2.amazonaws.com caching
[DEBUG] [2023-07-02T19:32:27Z] [f3aff440] [dns] - static: new address resolved 2620:108:700f::3425:1713 for host a3q8lgnw4q8xzy-ats.iot.us-west-2.amazonaws.com caching
[DEBUG] [2023-07-02T19:32:27Z] [f3aff440] [dns] - static: new address resolved 2620:108:700f::2cf0:5705 for host a3q8lgnw4q8xzy-ats.iot.us-west-2.amazonaws.com caching
[DEBUG] [2023-07-02T19:32:27Z] [f3aff440] [dns] - static: new address resolved 2620:108:700f::3418:8047 for host a3q8lgnw4q8xzy-ats.iot.us-west-2.amazonaws.com caching
[DEBUG] [2023-07-02T19:32:27Z] [f3aff440] [dns] - static: new address resolved 2620:108:700f::3428:1d8a for host a3q8lgnw4q8xzy-ats.iot.us-west-2.amazonaws.com caching
[DEBUG] [2023-07-02T19:32:27Z] [f3aff440] [dns] - static: new address resolved 2620:108:700f::23a4:8389 for host a3q8lgnw4q8xzy-ats.iot.us-west-2.amazonaws.com caching
[DEBUG] [2023-07-02T19:32:27Z] [f3aff440] [dns] - static, invoking resolution callback for host a3q8lgnw4q8xzy-ats.iot.us-west-2.amazonaws.com with 2 addresses
[DEBUG] [2023-07-02T19:32:27Z] [f580c440] [task-scheduler] - id=0xf3b066a8: Scheduling attempt_connection task for immediate execution
[DEBUG] [2023-07-02T19:32:27Z] [f580c440] [task-scheduler] - id=0xf3b06790: Scheduling attempt_connection task for immediate execution
[DEBUG] [2023-07-02T19:32:27Z] [f580c440] [task-scheduler] - id=0xf3b066a8: Running attempt_connection task with <Running> status
[DEBUG] [2023-07-02T19:32:27Z] [f580c440] [socket] - id=0xf4f00a90 fd=9: initializing with domain 1 and type 0
[DEBUG] [2023-07-02T19:32:27Z] [f580c440] [socket] - id=0xf4f00a90 fd=9: setting socket options to: keep-alive 0, keep idle 0, keep-alive interval 0, keep-alive probe count 0.
[DEBUG] [2023-07-02T19:32:27Z] [f580c440] [socket] - id=0xf4f00a90 fd=9: beginning connect.
[DEBUG] [2023-07-02T19:32:27Z] [f580c440] [socket] - id=0xf4f00a90 fd=9: connecting to endpoint 2620:108:700f::2cef:1f1b:443.
[ERROR] [2023-07-02T19:32:27Z] [f580c440] [socket] - id=0xf4f00a90 fd=9: connect failed with error code 101.
[INFO] [2023-07-02T19:32:27Z] [f580c440] [dns] - id=0xd08570: recording failure for record 2620:108:700f::2cef:1f1b for a3q8lgnw4q8xzy-ats.iot.us-west-2.amazonaws.com, moving to bad list
[DEBUG] [2023-07-02T19:32:27Z] [f580c440] [dns] - static: purging address 2620:108:700f::2cef:1f1b for host a3q8lgnw4q8xzy-ats.iot.us-west-2.amazonaws.com from the cache due to cache eviction or shutdown
[DEBUG] [2023-07-02T19:32:27Z] [f580c440] [socket] - id=0xf4f00a90 fd=9: is still open, closing...
[DEBUG] [2023-07-02T19:32:27Z] [f580c440] [socket] - id=0xf4f00a90 fd=9: closing
[ERROR] [2023-07-02T19:32:27Z] [f580c440] [channel-bootstrap] - id=0xb72a98: failed to create socket with error 1049
[DEBUG] [2023-07-02T19:32:27Z] [f580c440] [task-scheduler] - id=0xf3b06790: Running attempt_connection task with <Running> status
[DEBUG] [2023-07-02T19:32:27Z] [f580c440] [socket] - id=0xf4f00a90 fd=9: initializing with domain 0 and type 0
[DEBUG] [2023-07-02T19:32:27Z] [f580c440] [socket] - id=0xf4f00a90 fd=9: setting socket options to: keep-alive 0, keep idle 0, keep-alive interval 0, keep-alive probe count 0.
[DEBUG] [2023-07-02T19:32:27Z] [f580c440] [socket] - id=0xf4f00a90 fd=9: beginning connect.
[DEBUG] [2023-07-02T19:32:27Z] [f580c440] [socket] - id=0xf4f00a90 fd=9: connecting to endpoint 35.167.200.40:443.
[DEBUG] [2023-07-02T19:32:27Z] [f580c440] [task-scheduler] - id=0xf3b05840: Scheduling (null) task for future execution at time 77957405392796
[DEBUG] [2023-07-02T19:32:27Z] [f580c440] [task-scheduler] - id=0xf4f21bf0: Scheduling epoll_event_loop_unsubscribe_cleanup task for immediate execution
[INFO] [2023-07-02T19:32:27Z] [f580c440] [socket] - id=0xf4f00a90 fd=9: connection success
[DEBUG] [2023-07-02T19:32:27Z] [f580c440] [socket] - id=0xf4f00a90 fd=9: assigning to event loop 0xcff890
[DEBUG] [2023-07-02T19:32:27Z] [f580c440] [channel-bootstrap] - id=0xb72a98: client connection on socket 0xf4f00a90 completed with error 0.
[DEBUG] [2023-07-02T19:32:27Z] [f580c440] [channel] - id=0xf4f01c40: Beginning creation and setup of new channel.
[DEBUG] [2023-07-02T19:32:27Z] [f580c440] [task-scheduler] - id=0xf3b06318: Scheduling on_channel_setup_complete task for immediate execution
[DEBUG] [2023-07-02T19:32:27Z] [f580c440] [task-scheduler] - id=0xf4f21bf0: Running epoll_event_loop_unsubscribe_cleanup task with <Running> status
[DEBUG] [2023-07-02T19:32:27Z] [f580c440] [task-scheduler] - id=0xf3b06318: Running on_channel_setup_complete task with <Running> status
[DEBUG] [2023-07-02T19:32:27Z] [f580c440] [channel] - id=0xf4f01c40: setup complete, notifying caller.
[DEBUG] [2023-07-02T19:32:27Z] [f580c440] [channel] - id=0xf4f01c40: message pool 0xf4f010b8 found in event-loop local storage: using it.
[DEBUG] [2023-07-02T19:32:27Z] [f580c440] [channel-bootstrap] - id=0xb72a98: channel 0xf4f01c40 setup succeeded: bootstrapping.
[DEBUG] [2023-07-02T19:32:27Z] [f580c440] [socket-handler] - id=0xf4f237e8: Socket handler created with max_read_size of 16384
[DEBUG] [2023-07-02T19:32:27Z] [f580c440] [task-scheduler] - id=0xf4f129c8: Scheduling tls_timeout task for future execution at time 77962446272018
[DEBUG] [2023-07-02T19:32:27Z] [f580c440] [task-scheduler] - id=0xf4f00bd8: Scheduling socket_written_task task for immediate execution
[DEBUG] [2023-07-02T19:32:27Z] [f580c440] [task-scheduler] - id=0xf4f00bd8: Running socket_written_task task with <Running> status
[DEBUG] [2023-07-02T19:32:27Z] [f580c440] [task-scheduler] - id=0xf4f00bd8: Scheduling socket_written_task task for immediate execution
[DEBUG] [2023-07-02T19:32:27Z] [f580c440] [tls-handler] - id=0xf4f129b0: Alpn protocol negotiated as x-amzn-mqtt-ca
[DEBUG] [2023-07-02T19:32:27Z] [f580c440] [tls-handler] - id=0xf4f129b0: Remote server name is a3q8lgnw4q8xzy-ats.iot.us-west-2.amazonaws.com
[DEBUG] [2023-07-02T19:32:27Z] [f580c440] [channel-bootstrap] - id=0xb72a98: tls negotiation result 0 on channel 0xf4f01c40
[DEBUG] [2023-07-02T19:32:27Z] [f580c440] [mqtt-client] - id=0xe6dd50: Connection successfully opened, sending CONNECT packet
[DEBUG] [2023-07-02T19:32:27Z] [f580c440] [task-scheduler] - id=0xf4f13178: Scheduling mqtt_connack_timeout task for future execution at time 77955502259703
[DEBUG] [2023-07-02T19:32:27Z] [f580c440] [mqtt-client] - id=0xe6dd50: MQTT Connection initializing CONNECT packet for client-id '<thing name>'
[DEBUG] [2023-07-02T19:32:27Z] [f580c440] [mqtt-client] - id=0xe6dd50: Adding username ?SDK=PythonV2&Version=1.15.3 to connection
[DEBUG] [2023-07-02T19:32:28Z] [f580c440] [task-scheduler] - id=0xf4f00bd8: Running socket_written_task task with <Running> status
[DEBUG] [2023-07-02T19:32:28Z] [f580c440] [mqtt-client] - id=0xe6dd50: connection was accepted, switch state from 2 to CONNECTED.
[DEBUG] [2023-07-02T19:32:28Z] [f580c440] [task-scheduler] - id=0xe6dea0: Scheduling mqtt_ping task for future execution at time 79152705450184
[DEBUG] [2023-07-02T19:32:31Z] [f580c440] [task-scheduler] - id=0xf4f13178: Running mqtt_connack_timeout task with <Running> status
[DEBUG] [2023-07-02T19:32:32Z] [f580c440] [task-scheduler] - id=0xf3b05840: Running (null) task with <Running> status
[DEBUG] [2023-07-02T19:32:37Z] [f580c440] [task-scheduler] - id=0xf4f129c8: Running tls_timeout task with <Running> status
[DEBUG] [2023-07-02T19:32:57Z] [f3aff440] [dns] - static: Either no requests have been made for an address for a3q8lgnw4q8xzy-ats.iot.us-west-2.amazonaws.com for the duration of the ttl, or this thread is being forcibly shutdown. Killing thread.
[DEBUG] [2023-07-02T19:32:57Z] [f3aff440] [dns] - static: purging address 2620:108:700f::3418:8047 for host a3q8lgnw4q8xzy-ats.iot.us-west-2.amazonaws.com from the cache due to cache eviction or shutdown
[DEBUG] [2023-07-02T19:32:57Z] [f3aff440] [dns] - static: purging address 2620:108:700f::3428:1d8a for host a3q8lgnw4q8xzy-ats.iot.us-west-2.amazonaws.com from the cache due to cache eviction or shutdown
[DEBUG] [2023-07-02T19:32:57Z] [f3aff440] [dns] - static: purging address 2620:108:700f::2cf0:5705 for host a3q8lgnw4q8xzy-ats.iot.us-west-2.amazonaws.com from the cache due to cache eviction or shutdown
[DEBUG] [2023-07-02T19:32:57Z] [f3aff440] [dns] - static: purging address 2620:108:700f::3425:1713 for host a3q8lgnw4q8xzy-ats.iot.us-west-2.amazonaws.com from the cache due to cache eviction or shutdown
[DEBUG] [2023-07-02T19:32:57Z] [f3aff440] [dns] - static: purging address 2620:108:700f::23a4:8389 for host a3q8lgnw4q8xzy-ats.iot.us-west-2.amazonaws.com from the cache due to cache eviction or shutdown
[DEBUG] [2023-07-02T19:32:57Z] [f3aff440] [dns] - static: purging address 52.88.145.67 for host a3q8lgnw4q8xzy-ats.iot.us-west-2.amazonaws.com from the cache due to cache eviction or shutdown
[DEBUG] [2023-07-02T19:32:57Z] [f3aff440] [dns] - static: purging address 44.242.91.213 for host a3q8lgnw4q8xzy-ats.iot.us-west-2.amazonaws.com from the cache due to cache eviction or shutdown
[DEBUG] [2023-07-02T19:32:57Z] [f3aff440] [dns] - static: purging address 44.226.15.160 for host a3q8lgnw4q8xzy-ats.iot.us-west-2.amazonaws.com from the cache due to cache eviction or shutdown
[DEBUG] [2023-07-02T19:32:57Z] [f3aff440] [dns] - static: purging address 52.13.217.193 for host a3q8lgnw4q8xzy-ats.iot.us-west-2.amazonaws.com from the cache due to cache eviction or shutdown
[DEBUG] [2023-07-02T19:32:57Z] [f3aff440] [dns] - static: purging address 35.166.134.116 for host a3q8lgnw4q8xzy-ats.iot.us-west-2.amazonaws.com from the cache due to cache eviction or shutdown
[DEBUG] [2023-07-02T19:32:57Z] [f3aff440] [dns] - static: purging address 54.190.131.51 for host a3q8lgnw4q8xzy-ats.iot.us-west-2.amazonaws.com from the cache due to cache eviction or shutdown
[DEBUG] [2023-07-02T19:32:57Z] [f3aff440] [dns] - static: purging address 52.88.214.137 for host a3q8lgnw4q8xzy-ats.iot.us-west-2.amazonaws.com from the cache due to cache eviction or shutdown
[DEBUG] [2023-07-02T19:32:57Z] [f3aff440] [dns] - static: purging address 35.167.200.40 for host a3q8lgnw4q8xzy-ats.iot.us-west-2.amazonaws.com from the cache due to cache eviction or shutdown
[DEBUG] [2023-07-02T19:32:57Z] [f3aff440] [dns] - static: purging address 2620:108:700f::2cef:1f1b for host a3q8lgnw4q8xzy-ats.iot.us-west-2.amazonaws.com from the cache due to cache eviction or shutdown
[DEBUG] [2023-07-02T19:33:23Z] [f580c440] [task-scheduler] - id=0xd9efe8: Scheduling mqtt_outgoing_request_task task for immediate execution
[DEBUG] [2023-07-02T19:33:23Z] [f580c440] [mqtt-client] - id=0xe6dd50: Starting publish 36 to topic $aws/things/<thing name>/shadow/name/State/update
[DEBUG] [2023-07-02T19:33:23Z] [f580c440] [task-scheduler] - id=0xf4f00bd8: Scheduling socket_written_task task for immediate execution
[DEBUG] [2023-07-02T19:33:23Z] [f580c440] [task-scheduler] - id=0xd9efe8: Running mqtt_outgoing_request_task task with <Running> status
[DEBUG] [2023-07-02T19:33:23Z] [f580c440] [task-scheduler] - id=0xf4f00bd8: Running socket_written_task task with <Running> status
[DEBUG] [2023-07-02T19:33:24Z] [f580c440] [mqtt-client] - id=0xe6dd50: Publish 36 complete
[DEBUG] [2023-07-02T19:33:24Z] [f580c440] [task-scheduler] - id=0xf4f00bd8: Scheduling socket_written_task task for immediate execution
[DEBUG] [2023-07-02T19:33:24Z] [f580c440] [task-scheduler] - id=0xf4f00bd8: Running socket_written_task task with <Running> status

I'm not quite able to reconcile the log timestamps (19:xx) and the event timestamp (12:xx) you supply in the description. Are we missing some kind of time zone difference?

It's also not clear from the log when you received the message. Looking at the log instrumentation, I feel like there are some poor log-level choices in the client's logging (way too much at debug/trace, which is painfully noisy).

Either way, the attached log shows a period of lost connectivity of uncertain length that isn't detected until the PINGREQ times out at 2023-07-02T19:32:22Z.

Updated the comment. 12:xx was PST and 19:xx was UTC. Listed both timestamps as UTC now.

The publish 36 message at 2023-07-02T19:33:23Z happens immediately after the delayed message is received, so that marks when it finally arrived.

Is there any way to diagnose what caused the period of non-connectivity? The system's network connection was up during that time. Is there anything I can try to prevent disconnects, or to have them detected and mitigated faster?

You can lower the keep alive interval setting on the MQTT connection, but that can be a double-edged sword: you detect dead connections more quickly, but you also use more bandwidth, and if the settings are too strict (timeouts set too low) you can cause instability as well.
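
For example, with the connection builder you're already using, that would look roughly like this (the 60-second value is only illustrative, and the endpoint, credential paths, and client id are placeholders):

from awsiot import mqtt_connection_builder

mqtt_connection = mqtt_connection_builder.mtls_from_path(
    endpoint="xxxxxxxxxxxxxx-ats.iot.us-west-2.amazonaws.com",  # placeholder
    cert_filepath="/path/to/certificate.pem.crt",               # placeholder
    pri_key_filepath="/path/to/private.pem.key",                # placeholder
    client_id="ThingName",                                      # placeholder
    keep_alive_secs=60,    # send PINGREQs more often than the default
    ping_timeout_ms=3000,  # how long to wait for a PINGRESP before treating the connection as dead
)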

You can also try TCP keep alive, as controlled by the socket options (https://github.com/awslabs/aws-crt-python/blob/v0.16.21/awscrt/io.py#L209-L242) used when making the connection (https://github.com/awslabs/aws-crt-python/blob/v0.16.21/awscrt/mqtt.py#L343). TCP keep alive is cheaper from a bandwidth perspective because it uses fewer round trips and isn't framed/buffered by TLS.
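
A minimal sketch of what that could look like if you construct the awscrt connection objects directly (the builder does roughly the same setup internally); the interval/timeout/probe values are only illustrative, and the endpoint, credential paths, and client id are placeholders:

from awscrt import io, mqtt

# TCP keep-alive is configured on awscrt.io.SocketOptions (see the io.py link above).
socket_options = io.SocketOptions()
socket_options.keep_alive = True              # enable TCP keep-alive probes
socket_options.keep_alive_interval_secs = 30  # seconds between probes
socket_options.keep_alive_timeout_secs = 5    # seconds to wait for a probe response
socket_options.keep_alive_max_probes = 3      # failed probes before the connection is considered dead

# Boilerplate to build a client.
event_loop_group = io.EventLoopGroup(1)
host_resolver = io.DefaultHostResolver(event_loop_group)
bootstrap = io.ClientBootstrap(event_loop_group, host_resolver)
tls_ctx_options = io.TlsContextOptions.create_client_with_mtls_from_path(
    "/path/to/certificate.pem.crt", "/path/to/private.pem.key"  # placeholders
)
tls_ctx = io.ClientTlsContext(tls_ctx_options)
mqtt_client = mqtt.Client(bootstrap, tls_ctx)

# The socket options are applied when the connection is made (see the mqtt.py link above).
mqtt_connection = mqtt.Connection(
    client=mqtt_client,
    host_name="xxxxxxxxxxxxxx-ats.iot.us-west-2.amazonaws.com",  # placeholder
    port=8883,
    client_id="ThingName",                                       # placeholder
    socket_options=socket_options,
)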

"Invisible" disconnections are often a result of an issue at an intermediate link like a router or gateway between two endpoints. Depending on the function, a hardware reset can completely orphan the two endpoints and they remain oblivious until some ping mechanism determines the connection has been dropped.

Thanks, I guess other examples I've seen have just used a lower keep-alive interval to mitigate issues like this.

I'll also try TCP keep-alive, and hopefully between the two I can find a solution that works for me. Appreciate the advice.

⚠️COMMENT VISIBILITY WARNING⚠️

Comments on closed issues are hard for our team to see.
If you need more assistance, please either tag a team member or open a new issue that references this one.
If you wish to keep having a conversation with other community members under this issue feel free to do so.