Azure/azure-monitor-baseline-alerts

[Question]: VM Log Alerts not compliant

Closed this issue · 8 comments

Check for previous/existing GitHub issues

  • I have checked for previous/existing GitHub issues

Description

We deployed some of the Compute alerts. Both Mertic Alerts and Log Alerts.
All of the alert rules are deployed fine via a remediation task and the policies of the Metric Alerts are all compliant.

Allthough the ones for the Log Aerts remains Non-compliant. The Log Alerts are scoped on subscription level, so I find it strange that when I click on the policy compliancy, I receive list of vm resources:
image
When clicking on details I see the error that the name of the alert rule is not correct (for the example for the high cpu alert):
image

What am I missing or doing wrong?
I scoped the policy on a management group.

Hello @wardwygaerts ,
thanks for your feedback. Could you please send the alert name which is not compliant and check if any of the remediation failed?

Thanks,
Bruno.

all Log Alerts are not compliant:
Identity StoppedServicesAlert
VMHighCPUAlert
VMHeartbeatAlert
VMLowdataDiskSpaceAlert
VMLowMemoryAlert

The alerts rules, which are created, are on the subscription scope, so this looks fine, but the policies stays non compliant.

I have 4 Vm's in this environment and the deployment of only 1 is successful.
The error for the other ones:

Unable to edit or replace deployment 'VMCPUAlert': previous deployment from '4/30/2024 2:35:48 PM' is still active (expiration time is '5/7/2024 2:35:47 PM'). Please see https://aka.ms/arm-deploy-resources for usage details. (Code: DeploymentActive)

But if the alert rule is created on subscription level scope, why is the evaluation done on VirtualMachine level?

The message is indicating a transient throttling error which can be easily resolved by running the remediation one more time. This should return the compliance for the resources which are part of the alert. It is true that the alert is deployed at the subscription level but the target is the virtualMachines resource type. in fact you should see the VMs listed as resources.

Thanks!

Thanks for you answer.
I already ran the remediation several times. The deployment only works for 1 resource and fails for the other ones. I have the same behavior in other environments.
Which I think is kind of normal, because every resource triggers thee same deployment: create an alert rule for that subscription with the same parameters.

image

Can you please try to remove all the AMBA deployments using the script patterns/alz/scripts/Remove-AMBADeployments.ps1 and run the remediation again? at least you should see different resources becoming compliant.

Thanks!

Hello @wardwygaerts ,
where you able to remove the previous deployments and try again?

Thanks!

Hello @wardwygaerts ,
is there any news about this one?

Thanks!

Closing this one since no news since 2 weeks ago