Azure/azure-iot-cli-extension

[question/bug] `az iot hub monitor-events` stops directly after startup

Closed this issue · 10 comments

Describe the bug
When running the az iot hub monitor-events -n <my-iothub-name> --device-id <my-devide-id> --content-type application/json --timeout 10 --properties app it will stop directly after execution without any Error. While running the same command with --debug flag, I can see only one error: cli.azext_iot.monitor.telemetry: No Event Hub partitions found to listen on. or. i this the only hint I can give for now.
I already found a workaround for it. Now I will not use anymore the -n flag, instead I use --login 'HostName=XXXX.azure-devices.net;SharedAccessKeyName=XXXX;SharedAccessKey=XXXX'.

So maybe you could help me, why the first command isn't working anymore. Any help is appreciated.

To Reproduce
Steps to reproduce the behavior:

  1. Open Bash
  2. az login (to login into your azure account)
  3. az iot hub monitor-events -n <my-iothub-name> --device-id <my-devide-id> --content-type application/json --timeout 10 --properties app
  4. Monitoring will start but stops directly after

Expected behavior
IotHub monitoring (for a given device-id) is started and will provide all incoming messages. Are there no messages within 10 seconds (after a received message) the monitoring will stop.

Screenshots
/
``
Environment:

  • OS: [Ubuntu 22.04.4 LTS]
  • Shell: [bash zsh]
  • Az CLI version: [2.61.0]
  • IoT extension version: [0.23.1]
  • Python version (if pip installed): [3.10.12]

Additional context
/

Thanks @nilskosek for bringing this to our attention. I did notice a single similar failure this morning from our nightly pipelines, I'm running another few tests to determine if this is just a service interruption or if it's a bug we need to patch.

Thank you also for providing a workaround - in the meantime would you mind trying again with -n to see if you can still reproduce the issue?

Hey @c-ryan-k,
yes I can still reproduce the issue.

I can confirm this bug with that same error message.

@c-ryan-k this is a critical problem, because I have to use this command several times a day for my development work.

For me az iot hub monitor-events broke permanently with that error message for my development Azure IoT Hub, while I was developing a Python script that uses Azure SDK for Python's azure-eventhub package to capture events from that hub. It still works fine for the other two Azure IoT Hubs I have access to.

Of course the problem might actually be the Azure Event Hub compatible endpoint on the Azure IoT Hub side. But the command doesn't print out any details what's actually wrong with the endpoint.

@nilskosek thanks for pointing out the workaround. I can confirm that with --login instead of --subscription the command works again.

Hi,

I will be helping out @c-ryan-k. We unfortunately cannot reproduce your error still and would like some more information:

  1. Can you provide us the output of az iot hub show (please remove hub name, resource group, subscription, and any other fields that would have personal information).

  2. What Shared Access Policies do you have?

Thanks!

Output from az iot hub show:

{
  "etag": "CgAAACAAAAA=",
  "id": "<REDACTED>",
  "identity": {
    "principalId": "<REDACTED>",
    "tenantId": "<REDACTED>",
    "type": "SystemAssigned",
    "userAssignedIdentities": null
  },
  "location": "northeurope",
  "name": "<REDACTED>",
  "properties": {
    "allowedFqdnList": [],
    "authorizationPolicies": null,
    "cloudToDevice": {
      "defaultTtlAsIso8601": "1:00:00",
      "feedback": {
        "lockDurationAsIso8601": "0:01:00",
        "maxDeliveryCount": 10,
        "ttlAsIso8601": "1:00:00"
      },
      "maxDeliveryCount": 10
    },
    "comments": null,
    "deviceStreams": null,
    "disableDeviceSas": null,
    "disableLocalAuth": null,
    "disableModuleSas": null,
    "enableDataResidency": null,
    "enableFileUploadNotifications": false,
    "encryption": null,
    "eventHubEndpoints": {
      "events": {
        "endpoint": "sb://iothub-ns-<REDACTED>-iot-<REDACTED>.servicebus.windows.net/",
        "partitionCount": 4,
        "partitionIds": [],
        "path": "<REDACTED>",
        "retentionTimeInDays": 1
      }
    },
    "features": "RootCertificateV2",
    "hostName": "<REDACTED>.azure-devices.net",
    "ipFilterRules": [],
    "ipVersion": null,
    "locations": [
      {
        "location": "North Europe",
        "role": "primary"
      },
      {
        "location": "West Europe",
        "role": "secondary"
      }
    ],
    "messagingEndpoints": {
      "fileNotifications": {
        "lockDurationAsIso8601": "0:01:00",
        "maxDeliveryCount": 10,
        "ttlAsIso8601": "1:00:00"
      }
    },
    "minTlsVersion": null,
    "networkRuleSets": null,
    "privateEndpointConnections": null,
    "provisioningState": "Succeeded",
    "publicNetworkAccess": null,
    "restrictOutboundNetworkAccess": null,
    "rootCertificate": {
      "enableRootCertificateV2": true,
      "lastUpdatedTimeUtc": "2023-05-30T19:03:54.730159+00:00",
      "microsoftMigratedRootCert": false
    },
    "routing": {
      "endpoints": {
        "cosmosDbSqlContainers": [],
        "eventHubs": [],
        "serviceBusQueues": [
          {
            "authenticationType": null,
            "connectionString": "<REDACTED>",
            "endpointUri": null,
            "entityPath": null,
            "id": "<REDACTED>",
            "identity": null,
            "name": "status-sb-queue",
            "resourceGroup": "<REDACTED>",
            "subscriptionId": "<REDACTED>"
          }
        ],
        "serviceBusTopics": [
          {
            "authenticationType": null,
            "connectionString": "<REDACTED>",
            "endpointUri": null,
            "entityPath": null,
            "id": "<REDACTED>",
            "identity": null,
            "name": "measuremet-topic",
            "resourceGroup": "<REDACTED>",
            "subscriptionId": "<REDACTED>"
          }
        ],
        "storageContainers": []
      },
      "enrichments": null,
      "fallbackRoute": {
        "condition": "true",
        "endpointNames": [
          "events"
        ],
        "isEnabled": true,
        "name": "$fallback",
        "source": "DeviceMessages"
      },
      "routes": [
        {
          "condition": "true\n",
          "endpointNames": [
            "measuremet-topic"
          ],
          "isEnabled": true,
          "name": "measurement-to-sb-topic",
          "source": "DeviceMessages"
        },
        {
          "condition": "true",
          "endpointNames": [
            "events"
          ],
          "isEnabled": true,
          "name": "To_default_endpoint",
          "source": "DeviceMessages"
        }
      ]
    },
    "state": "Active",
    "storageEndpoints": {
      "$default": {
        "authenticationType": "keyBased",
        "connectionString": "",
        "containerName": "",
        "identity": null,
        "sasTtlAsIso8601": "1:00:00"
      }
    }
  },
  "resourcegroup": "<REDACTED>",
  "sku": {
    "capacity": 1,
    "name": "S1",
    "tier": "Standard"
  },
  "subscriptionid": "<REDACTED>",
  "systemData": {
    "createdAt": "2021-01-18T08:35:56.487000+00:00",
    "createdBy": null,
    "createdByType": null,
    "lastModifiedAt": null,
    "lastModifiedBy": null,
    "lastModifiedByType": null
  },
  "tags": {},
  "type": "Microsoft.Devices/IotHubs"
}

Output from az iot hub policy list:

[
  {
    "keyName": "iothubowner",
    "primaryKey": "<REDACTED>",
    "rights": "RegistryWrite, ServiceConnect, DeviceConnect",
    "secondaryKey": "<REDACTED>"
  },
  {
    "keyName": "service",
    "primaryKey": "<REDACTED>",
    "rights": "RegistryWrite, ServiceConnect",
    "secondaryKey": "<REDACTED>"
  },
  {
    "keyName": "device",
    "primaryKey": "<REDACTED>",
    "rights": "DeviceConnect",
    "secondaryKey": "<REDACTED>"
  },
  {
    "keyName": "registryRead",
    "primaryKey": "<REDACTED>",
    "rights": "RegistryRead",
    "secondaryKey": "<REDACTED>"
  },
  {
    "keyName": "registryReadWrite",
    "primaryKey": "<REDACTED>",
    "rights": "RegistryWrite, ServiceConnect",
    "secondaryKey": "<REDACTED>"
  },
  {
    "keyName": "backend_access",
    "primaryKey": "<REDACTED>",
    "rights": "RegistryWrite, ServiceConnect",
    "secondaryKey": "<REDACTED>"
  },
  {
    "keyName": "antenna-tests",
    "primaryKey": "<REDACTED>",
    "rights": "RegistryRead, ServiceConnect",
    "secondaryKey": "<REDACTED>"
  }
]

The command az iot hub monitor-events chooses the policy iothubowner:

cli.azure.cli.core.sdk.policies: Response content:
cli.azure.cli.core.sdk.policies: {"value":[{"keyName":"iothubowner", ... <REDACTED> ...}]}
cli.azext_iot.common.base_discovery: Using policy 'iothubowner' for IoT Hub interaction.
Starting event monitor, use ctrl-c to stop...
cli.azext_iot.monitor.telemetry: No Event Hub partitions found to listen on.

I see that there is a partition count but no partition id's which is weird:
image

If you provide login, we do not fetch the hub information (as you see from az iot hub show) and instead "guess" what partition id's are available (based on the partition count). This fills in this empty list and allows the command to succeed. I will add in some fail-safes to ensure that we are not fetching an empty partition id list (and throw a useful error if there are no partitions) and double check if this is expected behavior on the hub side.

Strangely enough azure-eventhub doesn't seem to have a problem to detect the partitions on that same hub. I.e. the Python code I wrote to capture events from the IoT Hub continues to work fine.

Excerpt from my test code

import logging
from azure.eventhub.aio   import EventHubConsumerClient
...
        try:
            self._client = EventHubConsumerClient.from_connection_string(
                conn_str=event_hub_connection_string,
                consumer_group="$default",
                uamqp_transport=True,
            )
        except Exception as e:
            fatal('Event Hub consumer client creation failed', e)
...
        self._logger    = logging.getLogger(__class__.__name__)
...
        for id in await self._client.get_partition_ids():
            self._logger.info(id)
            part = await self._client.get_partition_properties(id)
            self._logger.info(part)
        await self._client.close()
INFO:xxx:0
INFO:xxx:{'eventhub_name': '<REDACTED>, 'id': '0', 'beginning_sequence_number': 2052919, 'last_enqueued_sequence_number': 2062152, 'last_enqueued_offset': '3234134055352', 'is_empty': False, 'last_enqueued_time_utc': datetime.datetime(2024, 7, 11, 5, 44, 57, 249000, tzinfo=datetime.timezone.utc)}
INFO:xxx:1
INFO:xxx:{'eventhub_name': '<REDACTED>', 'id': '1', 'beginning_sequence_number': 1038378, 'last_enqueued_sequence_number': 1043256, 'last_enqueued_offset': '2147514888752', 'is_empty': False, 'last_enqueued_time_utc': datetime.datetime(2024, 7, 11, 5, 45, 4, 193000, tzinfo=datetime.timezone.utc)}
INFO:xxx:2
INFO:xxx:{'eventhub_name': '<REDACTED>', 'id': '2', 'beginning_sequence_number': 1536098, 'last_enqueued_sequence_number': 1540890, 'last_enqueued_offset': '2757397051176', 'is_empty': False, 'last_enqueued_time_utc': datetime.datetime(2024, 7, 11, 5, 44, 55, 698000, tzinfo=datetime.timezone.utc)}
INFO:xxx:3
INFO:xxx:{'eventhub_name': '<REDACTED>', 'id': '3', 'beginning_sequence_number': 754285, 'last_enqueued_sequence_number': 758972, 'last_enqueued_offset': '1713722268456', 'is_empty': False, 'last_enqueued_time_utc': datetime.datetime(2024, 7, 11, 5, 44, 49, 196000, tzinfo=datetime.timezone.utc)}

Thanks @vilit1 for providing this fix .
I just saw that you're released Version 0.25.0 of azure-iot-cli-extension last week (that included this fix). I've tested it on my machine and now it's working as expected again.
From my perspective we could close this Issue, if nobody else have any further points about it? If so, should I or you close the issue?

I can confirm that monitor-events now works for me again on the "broken" Azure IoT Hub. Unfortunately I forgot to check before upgrading the extension, so I can't be sure if it was fixed by the upgrade or something else.

I will close the issue - if this error shows up again, feel free to open a new issue.