Azure/azure-monitor-baseline-alerts

[Question/Feedback]: Log based "catch all" alerts vs Metric based alerts scoped to resource

Closed this issue · 2 comments

Description

Hi,

Really great work in this repo, it looks interesting.
I looked through the code for the Compute alerts today. From what I can see, you are using log-based alerts with a broad scope, so every resource in the scope gets the same configuration (thresholds etc.). Is there a reason for doing this instead of using metric alert rules with one alert rule per resource, which would allow custom settings per resource?

I'm not saying one or the other is the right way; I'm just curious, as this is one of the challenges I'm facing in similar projects. For compute, for example, I still use log-based alerts for disk monitoring, but CPU/memory alerts are metric-based and scoped to a single resource, so they can be adjusted for that resource.

Happy to test and provide feedback!

Best regards,
Jesper Bing

Thanks for the feedback. We took the larger-scope approach because VMs require log collection to enable some of the guest OS perf counters we wanted to alert on, so we were not relying on metrics alone. This allowed us to scope higher up for a set of baseline alerts. Because we are using log alerts, we can also use the same alert across regions. We used VM insights metrics because they give us consistent data collection across all VMs, which again allows us to use a single baseline alert at a high scope level instead of creating many alerts that would all have the same thresholds. If required, there is an example policy that scopes a VM log alert to RG level: services\Compute\virtualMachines\Deploy-VM-HeartBeatAlertRG.json.
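
To make the trade-off discussed in this thread concrete, here is a minimal Python sketch that models the two approaches as plain data. It does not call any Azure API. The `AlertRule` class, the helper functions, the VM names, the `sub-prod` scope, and all thresholds are hypothetical and chosen only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class AlertRule:
    """Hypothetical stand-in for an alert rule; not an Azure SDK type."""
    name: str
    scope: str        # what the rule covers: a single resource or a broader scope
    threshold: float  # CPU percentage above which the rule would fire


def broad_scope_rules(scope: str, threshold: float) -> list[AlertRule]:
    """Log-based approach: one rule covers every VM under `scope` with one shared threshold."""
    return [AlertRule(name="cpu-baseline", scope=scope, threshold=threshold)]


def per_resource_rules(vm_thresholds: dict[str, float]) -> list[AlertRule]:
    """Metric-based approach: one rule per VM, each with its own threshold."""
    return [
        AlertRule(name=f"cpu-{vm}", scope=vm, threshold=threshold)
        for vm, threshold in vm_thresholds.items()
    ]


# Hypothetical VMs with thresholds tuned per workload.
vms = {"vm-web-01": 80.0, "vm-sql-01": 95.0, "vm-batch-01": 90.0}

baseline = broad_scope_rules("sub-prod", 85.0)
tuned = per_resource_rules(vms)

print(len(baseline), "broad-scope rule(s);", len(tuned), "per-resource rule(s)")
```

The broad-scope variant keeps the rule count constant as VMs are added, at the cost of one shared threshold. The per-resource variant grows with the number of VMs but lets each threshold be tuned individually.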

Thanks for explaining, @Alboroni. Those are the same things I've been considering, and it's a good solution. Thanks again!