/azure-scale-vmss-on-events

Scale Azure VM Scale Sets in response to events, for example, based on queue length.

Primary Language: Bicep

Effortlessly Scale Your Azure VM Scale Sets in Response to Events

At Mystique Unicorn, the developers have adopted the event-driven architectural pattern for their application, which enables them to process streaming data. Specifically, their physical stores will be sending a continuous stream of sales and inventory related events to a central location. Multiple downstream systems will then consume these events. To ensure scalability, the team intends to utilize Azure VM Scale Sets to expand their event consumers whenever the Azure Storage queue length increases.

Given their interest in leveraging Azure's event-driven capabilities, the developers are seeking guidance on how to begin implementing these features.

🎯 Solutions

To take advantage of Azure's event-driven capabilities and achieve autoscaling, we can configure Azure VM Scale Sets (VMSS) to dynamically increase or decrease the number of VMs based on specific events. In this case, we can scale the VMSS based on the length of the Storage Queue. It's worth noting that, at the moment (as of Q2-2023), Azure only exposes the queue length metric at the Storage Account level, rather than at the individual Storage Queue level. As a result, to make this approach work, we would need to limit the storage account to a single queue. Assuming that this limitation is acceptable, we can proceed with this strategy.
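To make the scaling idea concrete, here is a minimal sketch of a step-based scaling decision driven by queue length. The `ScaleRule` class, the `next_capacity` function, and all threshold values are hypothetical and chosen only for illustration; they are not taken from this repository's Bicep templates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScaleRule:
    """Hypothetical thresholds; tune them for your own workload."""
    scale_out_above: int = 100  # queue length above which one VM is added
    scale_in_below: int = 10    # queue length below which one VM is removed
    min_instances: int = 1
    max_instances: int = 5


def next_capacity(current: int, queue_length: int, rule: ScaleRule) -> int:
    """Return the instance count after applying one scaling step."""
    if queue_length > rule.scale_out_above:
        proposed = current + 1
    elif queue_length < rule.scale_in_below:
        proposed = current - 1
    else:
        proposed = current
    # Clamp so the scale set never leaves the allowed range.
    return max(rule.min_instances, min(rule.max_instances, proposed))


if __name__ == "__main__":
    rule = ScaleRule()
    print(next_capacity(current=2, queue_length=2973, rule=rule))  # scale out by one
    print(next_capacity(current=3, queue_length=5, rule=rule))     # scale in by one
    print(next_capacity(current=3, queue_length=50, rule=rule))    # no change
```

Moving one instance at a time with separate scale-out and scale-in thresholds keeps the instance count from oscillating when the queue length hovers around a single threshold.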

Miztiik Automaton: Auto Scaling Azure VM(ScaleSets) in response to events

While building this solution, I encountered anomalies that made it challenging to implement. I have documented them in the Anomalies section.

๐Ÿ“ Anamolies

Before we proceed to examine the anomalies, it's important to note that there are currently a few messages present in the queue. Additionally, a producer is continuously adding messages to the queue over a period of time. As a result, any query made to determine the queue length should return a non-zero value.

  • Azure Monitor Portal - In my own testing, the metric was not updated even after several hours.

  • Azure Monitor Storage Account Metrics - Storage Queue metrics are reported at the storage account level, not at the level of individual queues [3, 4, 5].

  • Azure Monitor Metric Update Frequency - QueueMessageCount is a platform metric. The update frequency is supposedly 1 minute [6].

    Excerpt from the docs

    This article is a complete list of all platform (that is, automatically collected) metrics currently available with the consolidated metric pipeline in Azure Monitor.

    Excerpt from the docs

    Platform metrics are collected from Azure resources at one-minute frequency unless specified otherwise in the metric's definition.

    However, the best practices guide for monitoring Azure Queue Storage states that this metric is refreshed daily [7].

    Excerpt from the docs

    You can monitor the message count for all queues in a storage account by using the QueueMessageCount metric. This metric is refreshed daily.

  • Azure SDK to Query Queue Metrics from Azure Monitor - Querying the queue metrics through the SDK does not reflect the actual state of the queue either. I used the Python SDK [8, 9] and wrote a script to query the metric. The metric was not updated even after several hours.

    # export PYTHONWARNINGS="ignore:Unverified HTTPS request"
    # pip install azure-identity
    # pip install azure-monitor-query==1.1.1
    from datetime import datetime, timedelta

    from azure.identity import DefaultAzureCredential
    from azure.monitor.query import MetricAggregationType, MetricsQueryClient

    subscription_id = "1axxxx9e3"
    resource_group = "Miztiik_Enterprises_scale_vmss_on_events_011"
    storage_account = "warehouseghv6kv011"
    queue_service = "store-events-q-011"

    # Resource ID of the queue service whose metrics are queried
    metrics_uri = (
        f"subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.Storage"
        f"/storageAccounts/{storage_account}"
        f"/queueServices/{queue_service}"
    )

    credential = DefaultAzureCredential()
    query_client = MetricsQueryClient(credential)

    # Query a two-day window starting on 2023-04-19
    start_time = datetime(2023, 4, 19)
    duration = timedelta(days=2)

    response = query_client.query_resource(
        metrics_uri,
        metric_names=["QueueMessageCount"],
        timespan=(start_time, duration),
        aggregations=[MetricAggregationType.AVERAGE],
    )

    # Read the data points through the public API and report the latest non-empty average
    metric = next(m for m in response.metrics if m.name == "QueueMessageCount")
    values = [p.average for ts in metric.timeseries for p in ts.data if p.average is not None]
    latest = values[-1] if values else None

    print(f"QueueMessageCount: {latest} TimePeriod: {response.timespan} Granularity: {response.granularity} ")

    Output

    QueueMessageCount: 0 TimePeriod: 2023-04-19T00:00:00Z/2023-04-21T00:00:00Z Granularity: 1:00:00
  • Azure SDK to Query Queue Metrics from Queue Properties - I tested the queue length update frequency by querying the queue message count from the queue properties using the Python SDK. This method seems to deliver much better results.

    import os
    import time

    from azure.identity import DefaultAzureCredential
    from azure.storage.queue import QueueServiceClient

    Q_NAME = "store-events-q-011"
    SA_NAME = os.getenv("SA_NAME", "warehouseghv6kv011")
    Q_SVC_ACCOUNT_URL = f"https://{SA_NAME}.queue.core.windows.net"

    credential = DefaultAzureCredential(logging_enable=False)
    q_svc_client = QueueServiceClient(Q_SVC_ACCOUNT_URL, credential=credential)
    q_client = q_svc_client.get_queue_client(Q_NAME)

    # Sample the approximate message count 10 times, 10 seconds apart
    for _ in range(10):
        properties = q_client.get_queue_properties()
        count = properties.approximate_message_count
        print(f"Current Message count: {count}")
        time.sleep(10)

    Output: As we can observe, the number of messages keeps increasing every 10 seconds. That means the queue properties are clearly updated much more frequently. As a future exercise, I will try to verify it against the actual count of messages in the queue.

    Current Message count: 2973
    Current Message count: 2977
    Current Message count: 2981
    Current Message count: 2985
    Current Message count: 2989
    Current Message count: 2993
    Current Message count: 2997
    Current Message count: 3001
    Current Message count: 3005
    Current Message count: 3009
  • Azure Portal - Strange Place to Look for Consistency - If you navigate to the Azure Storage Queue resource page in the portal, you will, surprisingly, find the queue length being updated in the properties field.

  • Azure Portal - Diagnostics for Storage Queue - If you enable diagnostic logging for the queue service of your Storage Account, it does show the transactions for message ingestion.

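The sample output from the queue properties test can also be turned into a rough ingest rate, which is a quick sanity check that the count tracks the producer. The sketch below is illustrative only: `ingest_rate` is a hypothetical helper, and it assumes the samples were taken exactly 10 seconds apart, ignoring the latency of each SDK call. The result is a net rate, so any consumer draining the queue during the test would lower it.

```python
from typing import Sequence, Tuple


def ingest_rate(samples: Sequence[Tuple[float, int]]) -> float:
    """Net messages per second between the first and last (seconds, count) sample."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    (t0, c0), (t1, c1) = samples[0], samples[-1]
    if t1 <= t0:
        raise ValueError("timestamps must be increasing")
    return (c1 - c0) / (t1 - t0)


if __name__ == "__main__":
    # Counts copied from the output listed above, assumed to be 10 s apart.
    counts = [2973, 2977, 2981, 2985, 2989, 2993, 2997, 3001, 3005, 3009]
    samples = [(i * 10.0, c) for i, c in enumerate(counts)]
    print(f"Net ingest rate: {ingest_rate(samples):.2f} messages/second")
```

With the ten counts listed above, this works out to 36 messages over 90 seconds, or about 0.4 messages per second.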

📒 Conclusion

Based on current information and research, it appears that the QueueMessageCount metric in Azure is not consistently updated, and its frequency cannot be relied upon. This issue has been highlighted by the community on StackOverflow. As a result, the best approach to monitor the queue length is to use the Azure SDK to directly query the queue. One potential solution could be to use Azure Functions to query the queue length and dynamically scale the VMSS accordingly.
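As a rough illustration of that idea, the sketch below reads the queue length from the queue properties and resizes the scale set to match. It is a sketch under stated assumptions, not the implementation used in this repository: `desired_capacity`, `reconcile_once`, and `azure_callables` are hypothetical names, and the sizing of 500 messages per VM with a 1 to 5 instance range is invented for the example. The Azure SDK imports are kept inside `azure_callables` so the decision logic can be exercised without the SDKs installed; that function also assumes the `azure-identity`, `azure-storage-queue`, and `azure-mgmt-compute` packages.

```python
import math
from typing import Callable, Tuple


def desired_capacity(queue_length: int, msgs_per_vm: int, min_vms: int, max_vms: int) -> int:
    """Size the scale set proportionally to the backlog, within fixed limits."""
    wanted = math.ceil(queue_length / msgs_per_vm)
    return max(min_vms, min(max_vms, wanted))


def reconcile_once(
    get_queue_length: Callable[[], int],
    get_capacity: Callable[[], int],
    set_capacity: Callable[[int], None],
    msgs_per_vm: int = 500,  # hypothetical sizing assumption
    min_vms: int = 1,
    max_vms: int = 5,
) -> int:
    """Run one scaling pass and return the target instance count."""
    target = desired_capacity(get_queue_length(), msgs_per_vm, min_vms, max_vms)
    # Only call the update API when the capacity actually needs to change.
    if target != get_capacity():
        set_capacity(target)
    return target


def azure_callables(
    subscription_id: str, resource_group: str, vmss_name: str,
    account_url: str, queue_name: str,
) -> Tuple[Callable[[], int], Callable[[], int], Callable[[int], None]]:
    """Build the three callables for reconcile_once from the Azure SDKs."""
    from azure.identity import DefaultAzureCredential
    from azure.mgmt.compute import ComputeManagementClient
    from azure.mgmt.compute.models import Sku, VirtualMachineScaleSetUpdate
    from azure.storage.queue import QueueClient

    credential = DefaultAzureCredential()
    queue = QueueClient(account_url, queue_name, credential=credential)
    compute = ComputeManagementClient(credential, subscription_id)

    def get_queue_length() -> int:
        return queue.get_queue_properties().approximate_message_count

    def get_capacity() -> int:
        return compute.virtual_machine_scale_sets.get(resource_group, vmss_name).sku.capacity

    def set_capacity(count: int) -> None:
        update = VirtualMachineScaleSetUpdate(sku=Sku(capacity=count))
        compute.virtual_machine_scale_sets.begin_update(resource_group, vmss_name, update).result()

    return get_queue_length, get_capacity, set_capacity
```

A timer-triggered Azure Function, or any scheduled job, could call `reconcile_once(*azure_callables(...))` on each run. The identity running it would need permission to read the queue and to update the scale set.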

In addition, it's worth considering the specific requirements of your application when choosing between Azure Storage Queue and Azure Service Bus. If you need reliable, granular, and frequently updated metrics for your queue, then Azure Service Bus may be the more suitable option.

🧹 CleanUp

If you want to destroy all the resources created by the stack, execute the command below to delete the stack, or delete the stack from the console. This covers:

  • Resources created during this demo
  • Any other custom resources you have created for this demo
# Delete from resource group
az group delete --name Miztiik_Enterprises_xxx --yes
# Follow any on-screen prompt

This is not an exhaustive list; please carry out any other steps that may be applicable to your needs.

📌 Who is using this

This repository aims to show new developers, Solution Architects and Ops Engineers how to use Azure with IaC (Bicep).

💡 Help/Suggestions or 🐛 Bugs

Thank you for your interest in contributing to our project. Whether it is a bug report, new feature, correction, or additional documentation or solutions, we greatly value feedback and contributions from our community. Start here

👋 Buy me a coffee

ko-fi Buy me a coffee ☕.

📚 References

  1. Azure Docs: Autoscale for VMSS
  2. Azure Docs: Custom Autoscale for VMSS with Resource Metrics
  3. Azure Docs: Monitoring Azure Queue Storage
  4. Azure Docs: Queue Storage monitoring data reference
  5. Azure Docs: Supported metrics with Azure Monitor - Queue Storage
  6. Azure Docs: Supported metrics with Azure Monitor - Queue Storage Data Collection
  7. Azure Docs: Best practices for monitoring Azure Queue Storage
  8. Azure Docs: Azure Monitoring libraries for Python
  9. Azure Docs: Azure Monitor Query client library for Python
  10. Azure Docs: Azure Monitor Query client library for Python
  11. StackOverflow: How frequently are the Azure Storage Queue metrics updated
  12. Azure Docs: Configure Python logging in the Azure libraries

๐Ÿท๏ธ Metadata

miztiik-success-green

Level: 300