Automated Azure Alert Triage

Azure Monitor can generate hundreds of alerts per day in a busy environment — many of them repetitive, low-priority, or duplicates of each other. Feeding those alerts into Claude via a Logic App workflow lets you automatically triage them, group related issues, and generate human-readable remediation suggestions before an engineer even looks at their phone.

Architecture Overview

The flow is straightforward: Azure Monitor fires an alert → Action Group calls a Logic App HTTP trigger → Logic App sends the alert payload to Claude via the Anthropic API → Claude returns a structured triage assessment → Logic App creates an enriched ticket in your ITSM and optionally sends a Teams message.

Setting Up the Logic App

Create a Consumption-tier Logic App with an HTTP trigger. Store your Anthropic API key in an Azure Key Vault and reference it via a managed identity so it never appears in the workflow definition.

# Create the Logic App and Key Vault secret via PowerShell
$rg = "rg-aiops"
$kvName = "kv-aiops"

New-AzLogicApp -ResourceGroupName $rg -Name "la-alert-triage" -Location "westeurope"

$secret = ConvertTo-SecureString $env:ANTHROPIC_API_KEY -AsPlainText -Force
Set-AzKeyVaultSecret -VaultName $kvName -Name "anthropic-key" -SecretValue $secret

The Triage Prompt

The quality of triage depends entirely on the system prompt. Give Claude the context it needs to make useful decisions:

system_prompt = """
You are an AIOps triage assistant for a Microsoft Azure environment.
When given an Azure Monitor alert, respond with JSON containing:
  severity: critical|high|medium|low
  likely_cause: one sentence explanation
  immediate_action: what to check first
  runbook: the most relevant runbook name from our library
  auto_resolvable: true if this commonly self-resolves within 10 minutes

Environment context:
- Production workloads run in westeurope and northeurope
- Business hours are 07:00-18:00 CET
- Critical = page on-call immediately; High = notify within 15 min
- Our runbook library: [DiskSpace-Cleanup, IIS-Restart, SQL-Failover, VM-Reboot]
"""

alert_message = f"""Alert Name: {alert["alertName"]}
Resource: {alert["resourceId"]}
Condition: {alert["condition"]["allOf"][0]["metricName"]} {alert["condition"]["allOf"][0]["operator"]} {alert["condition"]["allOf"][0]["threshold"]}
Fired At: {alert["firedDateTime"]}
Description: {alert.get("description", "N/A")}"""

Connecting to Your ITSM

Once Claude returns the structured JSON, the Logic App uses a switch action to route based on severity: Critical triggers a PagerDuty page, High creates a ServiceNow P2 incident with the triage notes pre-filled, and Medium/Low creates a ticket silently for morning review.

# Example: Parse Claude response and create ServiceNow incident
$triage = $claudeResponse | ConvertFrom-Json

if ($triage.severity -in @("critical","high")) {
    $incident = @{
        short_description = "$($alert.alertName) - $($triage.likely_cause)"
        description       = "Immediate action: $($triage.immediate_action)\nRunbook: $($triage.runbook)"
        urgency           = if ($triage.severity -eq "critical") { 1 } else { 2 }
        category          = "infrastructure"
    }
    Invoke-RestMethod -Uri $snowUrl -Method Post -Body ($incident|ConvertTo-Json) -Headers $snowHeaders
}

What This Saves in Practice

In a typical 200-VM environment this pattern reduces the number of alerts that require immediate human attention by 40-60%. The low-value noise gets silently ticketed; on-call engineers only get paged for events that genuinely need them. That is meaningful quality-of-life for whoever is carrying the pager on a Sunday night.

Summary

AI-powered alert triage is one of the highest-ROI applications of LLMs in IT operations. It requires almost no infrastructure change — just a Logic App between your existing alerting and ticketing systems — and starts delivering value the day you switch it on.

Automated Azure Alert Triage

Architecture Overview

Setting Up the Logic App

The Triage Prompt

Connecting to Your ITSM

What This Saves in Practice

Summary

Submit a Comment Cancel reply

Search

Share this!

Articles

Topics