Azure Monitor: How To Stop Log-Based Alerts for Specific Resources

Bruno Gabrielli · Jul 15, 2024

Hello howdy readers,

How many times, while dealing with alerting configured at scale, have you needed to stop the alerts for a few resources, or even for just one?

Creating alerts that work at scale, meaning alerts created with a wide scope (all virtual machines in a given subscription), gives you a huge advantage in alert management, but it also comes with some cons. For instance, you cannot disable the alert for a specific resource. Reason? The alert user interface does not include a feature or setting to disable an alert for a specific resource (or resources); it only allows you to enable or disable the alert as a whole.

So, given the need, how can you make sure you do not get alerts for resources which are in maintenance or that you do not want to monitor?

Thanks to the ability to correlate data in Azure Data Explorer and Azure Resource Graph with data in a Log Analytics workspace, you can now create log search alerts that leverage this capability. If you are not familiar with Azure Resource Graph (ARG), in brief it is an Azure service designed to extend Azure Resource Management by providing efficient and performant resource exploration. Resource Graph can query at scale across a given set of subscriptions so that you can effectively govern your environment.

Among the fields and properties that you can retrieve by querying ARG, there are also the tags defined for the resources. Tags are exactly the cornerstone of this post and the key to identifying the resources for which you would like to stop alerting.

The new correlation capability works in our favor since, in the same alert query, you can first identify the resources based on tag names and tag values to create a resource exclusion list. Then, you can compare the results of the alert query with that list and drop the excluded resources from the result set, so that no records are returned for them.

Easy enough, isn’t it?

Let us see how it works in real life. As anticipated, you need to:

1. Identify the resources for which we do not want to get the alert. For instance, we would like to stop getting alerted about a virtual machine called vm-Demo01.

2. Define a tag name and a tag value to be applied to them. We could define StopMonitoring as tag name and True as tag value, so that the resource is tagged with StopMonitoring: True.

3. Retrieve the list of resources with the tagging defined in step #2. For this step, do not forget that you need to have a Managed Identity with the necessary permissions assigned at the relevant scope. It is normally enough to create a User Assigned Managed Identity that can be used in all alerts that need to read from ARG and assign it the Monitoring Reader role at the subscription level.

4. Exclude that list from the alert query.

Defining the tagging is not that difficult, hence I am not going to describe it.

Creating a list of excluded resources is easy as well, but it could take some time to work out the right ARG query. Moreover, consider that you could exclude resources based on existing tag names which might have different values. As an example, imagine that you tagged your dev/test resources with the Environment tag name that, according to the purpose, can have either Dev or Test as its value. You can exclude resources in both using the same query.

The query part, which will be stored with an alias using the let statement, for a single tag name and tag value will look like:

Code:

let excludedResources = (arg("").resources
| where type =~ "Microsoft.Compute/virtualMachines"
| project _ResourceId = id, tags
| where parse_json(tostring(tags.StopMonitoring)) =~ "true"
);

The one for a single tag name with multiple values will look like:

Code:

let excludedResources = (arg("").resources
| where type =~ "Microsoft.Compute/virtualMachines"
| project _ResourceId = id, tags
| where parse_json(tostring(tags.Environment)) in~ ("Test", "Dev", "Sandbox")
);
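The two variants above can also feed one exclusion list. What follows is only a minimal sketch, not something taken from a production rule: it assumes that the tag values are plain strings, so that tostring() alone is enough for the comparison, and it reuses the tag names and values from this post, which you should adapt to your own tagging. The last line simply prints the list so that the snippet can be run on its own; in an alert rule, the alert query would take its place.

```kusto
// Sketch: build a single exclusion list from two tag conditions.
// Assumption: tag values are plain strings, so tostring() is enough to compare them.
let excludedResources = (arg("").resources
| where type =~ "Microsoft.Compute/virtualMachines"
| where tostring(tags.StopMonitoring) =~ "true"
    or tostring(tags.Environment) in~ ("Test", "Dev", "Sandbox")
| project _ResourceId = id
);
// Print the list; replace this line with the alert query when used in a rule.
excludedResources
```

Keep in mind that the values are compared case-insensitively here (=~ and in~), while the tag name is written exactly as it was defined on the resource.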

Running the single tag name query would give back the records for the resources which have been tagged accordingly.

Right after the let statement, you put your alert query (or change the existing ones where necessary/applicable) to dynamically stop alerting based on the provided tag configuration:

Code:

InsightsMetrics
| where _ResourceId has "Microsoft.Compute/virtualMachines"
| where _ResourceId !in~ (excludedResources) //This is where we exclude resources identified by the tagging
| where Origin == "vm.azm.ms"
| where Namespace == "Processor" and Name == "UtilizationPercentage"
| summarize AggregatedValue = avg(Val) by bin(TimeGenerated, 15m), Computer, _ResourceId

If you run the previous query with the filter line commented out, it will return all the resources which satisfy the condition; it will exclude none.
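If you want to see the effect of the filter without touching ARG or real data, the same logic can be tried on inline sample data. This is just a sketch under stated assumptions: both datatable blocks, the subscription ID, the resource group name and vm-Demo02 are made up for illustration, and they only stand in for the ARG list and for InsightsMetrics.

```kusto
// Hypothetical stand-in for the list returned by the ARG query.
let excludedResources = datatable(_ResourceId:string)
[
    "/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/rg-Demo/providers/Microsoft.Compute/virtualMachines/vm-Demo01"
];
// Hypothetical stand-in for InsightsMetrics records (note the different casing of the IDs).
let sampleMetrics = datatable(Computer:string, _ResourceId:string, Val:real)
[
    "vm-Demo01", "/subscriptions/00000000-0000-0000-0000-000000000000/resourcegroups/rg-demo/providers/microsoft.compute/virtualmachines/vm-demo01", 95.0,
    "vm-Demo02", "/subscriptions/00000000-0000-0000-0000-000000000000/resourcegroups/rg-demo/providers/microsoft.compute/virtualmachines/vm-demo02", 91.0
];
sampleMetrics
| where _ResourceId !in~ (excludedResources) // same filter as in the alert query
```

With this sample data the query should return only the vm-Demo02 row. The two IDs for vm-Demo01 differ in casing, which is why the case-insensitive !in~ is used for the comparison rather than !in.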

Assembling the two parts together will give you the final alert query:

Code:

let excludedResources = (arg("").resources
| where type =~ "Microsoft.Compute/virtualMachines"
| project _ResourceId = id, tags
| where parse_json(tostring(tags.StopMonitoring)) =~ "true"
);
InsightsMetrics
| where _ResourceId has "Microsoft.Compute/virtualMachines"
| where _ResourceId !in~ (excludedResources) //This is where we exclude resources identified by the tagging
| where Origin == "vm.azm.ms"
| where Namespace == "Processor" and Name == "UtilizationPercentage"
| summarize AggregatedValue = avg(Val) by bin(TimeGenerated, 15m), Computer, _ResourceId

And if you run it now, you will not get vm-Demo01 anymore because of the filter (line #8).

I am now sure you can continue with the alert rule creation without my help.
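To double check which monitored resources are currently being suppressed, one option is to flip the filter from !in~ to in~. The sketch below reuses the let statement from this post as it is; the LastRecord column name is just something I picked for the example.

```kusto
let excludedResources = (arg("").resources
| where type =~ "Microsoft.Compute/virtualMachines"
| project _ResourceId = id, tags
| where parse_json(tostring(tags.StopMonitoring)) =~ "true"
);
InsightsMetrics
| where _ResourceId in~ (excludedResources) // inverse of the alert filter: keep only the excluded resources
| where Origin == "vm.azm.ms"
| summarize LastRecord = max(TimeGenerated) by Computer, _ResourceId // LastRecord is an illustrative name
```

Any virtual machine listed here is one that is still sending data but would be left out of the alert results because of its tag.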

Something that I have not mentioned yet is that this combined (ARG and Log Analytics) query approach works in near real time. Once the alert is there, you only need to add or remove the tagging to/from the given resource(s) and …

… That’s all folks, thanks for reading through

Disclaimer

The sample scripts are not supported under any Microsoft standard support program or service. The sample scripts are provided AS IS without a warranty of any kind. Microsoft further disclaims all implied warranties including, without limitation, any implied warranties of merchantability or of fitness for a particular purpose. The entire risk arising out of the use or performance of the sample scripts and documentation remains with you. In no event shall Microsoft, its authors, or anyone else involved in the creation, production, or delivery of the scripts be liable for any damages whatsoever (including, without limitation, damages for loss of business profits, business interruption, loss of business information, or other pecuniary loss) arising out of the use of or inability to use the sample scripts or documentation, even if Microsoft has been advised of the possibility of such damages.
