
You Have P1 SLAs. You Have No Reliable Way to Detect P1s Automatically.


The SLA contract is clear: P1 incidents get a thirty-minute response. The problem is that nobody is classifying P1s correctly at intake. AI classifiers trained on ticket keywords don't know which tickets represent service-wide outages versus single-user configuration issues. By the time a human escalates a ticket from P3 to P1, the thirty-minute window has already passed. The P1 SLA is technically in place and functionally meaningless.

Research from security and incident-triage teams describes the same pattern: "AI triage implementations fail because they layer AI over incomplete data. The average enterprise monitors only about two-thirds of its environment. AI triage running over that partial coverage inherits every blind spot." Accurate classification depends on context the classifier doesn't have, and severity is the classification with the hardest consequences when it's wrong.

Why severity classification is a codebase problem

The correct classification for an incoming ticket depends on three things that aren't in the ticket text: what service is affected, how many users are impacted, and whether the service was recently deployed. "Users cannot log in" is a P1 if the auth service is down for all users, and a P4 if one user forgot their password. The ticket text doesn't distinguish these. The codebase does.

A recent deployment to the auth service is a strong signal for P1 classification even before customer reports arrive. Service scope — is this a core customer-facing service or an internal pipeline? — determines blast radius. Current error rates tell you whether the reported behavior is isolated or systemic. All of this information lives in the codebase, in monitoring integrations, and in the deployment history — not in the ticket.
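One way to turn "isolated or systemic" into a concrete check is to compare a service's current error rate with its baseline. The sketch below assumes a hypothetical monitoring feed that reports both numbers per service; the `ErrorRates` type, the 3x ratio, and the 1% floor are illustrative assumptions, not recommended values.

```python
from dataclasses import dataclass


@dataclass
class ErrorRates:
    """Hypothetical snapshot from a monitoring integration."""
    current: float   # fraction of requests failing right now, e.g. 0.12
    baseline: float  # typical failing fraction for this service


def is_systemic(rates: ErrorRates, ratio_threshold: float = 3.0,
                min_rate: float = 0.01) -> bool:
    """Return True when errors look service-wide rather than isolated.

    Both thresholds are illustrative assumptions; tune them per service.
    """
    if rates.current < min_rate:
        # Too few errors overall to call the problem systemic.
        return False
    if rates.baseline <= 0:
        # Service normally has no errors, so any rate above the floor is a departure.
        return True
    return rates.current / rates.baseline >= ratio_threshold


print(is_systemic(ErrorRates(current=0.15, baseline=0.01)))   # True
print(is_systemic(ErrorRates(current=0.012, baseline=0.01)))  # False
```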

Why the same ticket text can be P1 or P4 depending on context

The same ticket text, different correct classifications:

"Users cannot log in"
  -> P1 if: auth service is down for all users
  -> P2 if: SSO is failing for one enterprise customer
  -> P3 if: one user has a misconfigured account
  -> P4 if: user forgot their password

AI keyword classification assigns the same priority to all four.
Correct classification requires knowing: what service is affected,
how many users are impacted, and whether a recent deployment changed
the auth flow. None of the three is in the ticket text; all of them
come from the codebase, the deployment history, and monitoring.
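The four outcomes above can be written as a function of context rather than of text. This is a minimal sketch: the `LoginContext` fields are hypothetical facts that some lookup outside the ticket would have to supply, and a real rule set would need more cases than these four.

```python
from dataclasses import dataclass


@dataclass
class LoginContext:
    """Hypothetical facts about the auth system, gathered outside the ticket."""
    service_down_for_all: bool   # auth service failing for every user
    sso_failing_tenants: int     # enterprise tenants whose SSO is failing
    account_misconfigured: bool  # the reporter's own account has a bad config


def classify_login_ticket(ctx: LoginContext) -> str:
    """Map the same ticket text ("Users cannot log in") to P1..P4 by context."""
    if ctx.service_down_for_all:
        return "P1"
    if ctx.sso_failing_tenants >= 1:
        return "P2"
    if ctx.account_misconfigured:
        return "P3"
    # Nothing systemic or account-specific found: treat it as a
    # single-user issue such as a forgotten password.
    return "P4"


print(classify_login_ticket(LoginContext(True, 0, False)))   # P1
print(classify_login_ticket(LoginContext(False, 1, False)))  # P2
print(classify_login_ticket(LoginContext(False, 0, True)))   # P3
print(classify_login_ticket(LoginContext(False, 0, False)))  # P4
```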

What current AI severity classifiers miss

AI severity classifiers trained on historical tickets learn that certain keywords are associated with certain severities. "Down," "outage," "all users" tend to correlate with P1. This works for tickets where the customer accurately describes scope. It fails when customers understate scope (reporting their own problem without knowing it's systemic) or when the correct classification requires knowledge they don't have.
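At its core, a keyword classifier of this kind is a lookup from terms to priorities. In the sketch below, a hand-written keyword table (an assumption for illustration) stands in for the correlations a model would learn from historical tickets. It shows how an understated report of a systemic failure ends up with a low priority.

```python
KEYWORD_PRIORITY = {
    # Hypothetical stand-in for correlations learned from historical tickets.
    "outage": "P1",
    "down": "P1",
    "all users": "P1",
    "error": "P3",
    "can't log in": "P3",
}
ORDER = ["P1", "P2", "P3", "P4"]


def keyword_classify(text: str, default: str = "P4") -> str:
    """Return the most severe priority whose keyword appears in the text."""
    lowered = text.lower()
    hits = [priority for kw, priority in KEYWORD_PRIORITY.items() if kw in lowered]
    # min() by position in ORDER picks the most severe hit; default if none.
    return min(hits, key=ORDER.index, default=default)


print(keyword_classify("The site is down for all users"))  # P1
print(keyword_classify("I can't log in"))                   # P3, even if auth is down for everyone
```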

The classifier also can't learn that a recent deployment to a customer-facing service changes the prior probability for P1. A ticket about auth failures filed two hours after an auth service deployment should be classified higher than the same ticket filed in a period of no deployments. This signal isn't in the ticket text — it's in the deployment history.
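The deployment effect can be expressed as Bayes' rule in odds form. A recent deploy multiplies the odds of P1 by a likelihood ratio, P(recent deploy | P1) / P(recent deploy | not P1). The prior of 5%, the 24-hour window, and the ratio of 5 in this sketch are illustrative assumptions that a team would have to estimate from its own incident history.

```python
from typing import Optional


def p1_probability(prior: float, hours_since_deploy: Optional[float],
                   window_hours: float = 24.0,
                   likelihood_ratio: float = 5.0) -> float:
    """Posterior probability that a ticket is a P1, given deployment recency.

    prior: baseline share of tickets on this service that turn out to be P1,
    strictly between 0 and 1. hours_since_deploy: None if no deploy on record.
    Simplification: a full model would also lower the odds when there is no
    recent deploy; here the prior is simply left unchanged in that case.
    """
    if hours_since_deploy is None or hours_since_deploy > window_hours:
        return prior
    odds = prior / (1.0 - prior)                    # probability -> odds
    posterior_odds = odds * likelihood_ratio        # Bayes' rule in odds form
    return posterior_odds / (1.0 + posterior_odds)  # odds -> probability


print(round(p1_probability(0.05, hours_since_deploy=2), 3))     # 0.208
print(round(p1_probability(0.05, hours_since_deploy=None), 3))  # 0.05
```

Under these assumed numbers, an auth ticket filed two hours after a deploy rises from a 5% to roughly a 21% chance of being a P1. The same text filed during a quiet period stays at 5%.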

What severity classifiers have vs. what they need

What AI severity classifiers have access to:
  -> Ticket text (keyword matching)
  -> Historical ticket classifications (may not match current system)
  -> Ticket submitter's stated severity (often inflated)

What they don't have:
  -> Which services are currently degraded
  -> Whether a recent deployment touched the affected service
  -> How many users are on the affected service
  -> Whether the affected service is customer-facing or internal
  -> Current error rates from monitoring (unless explicitly integrated)
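The "what they don't have" list maps naturally onto a record that the classifier would receive alongside the ticket text. This is a sketch with hypothetical field names. Each field would be filled from a different source: monitoring, deployment history, or a service catalog or repo metadata.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class ServiceContext:
    """Hypothetical system-state record attached to a ticket at intake."""
    service: str
    degraded: bool                      # currently degraded, per monitoring
    last_deploy_at: Optional[datetime]  # from deployment history; None if unknown
    active_users: int                   # users currently on the service
    customer_facing: bool               # customer-facing vs internal
    error_rate: Optional[float] = None  # None if monitoring isn't integrated

    def hours_since_deploy(self, now: datetime) -> Optional[float]:
        """Hours since the last deploy, or None if no deploy is on record."""
        if self.last_deploy_at is None:
            return None
        return (now - self.last_deploy_at).total_seconds() / 3600


@dataclass(frozen=True)
class ClassifierInput:
    ticket_text: str
    submitter_severity: Optional[str]  # often inflated; a weak signal at best
    context: ServiceContext


ctx = ServiceContext("auth-service", degraded=True,
                     last_deploy_at=datetime(2024, 5, 1, 10, 0),
                     active_users=120_000, customer_facing=True)
print(ctx.hours_since_deploy(datetime(2024, 5, 1, 12, 0)))  # 2.0
```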

The webhook agent that classifies from system state

Kognita's webhook fires on ticket creation. The managed agent receives the ticket, identifies the service referenced, checks recent deployments to that service, checks whether the service is customer-facing, and uses CODEOWNERS to identify the owning team to page. Classification is based on actual system state, not on keyword matching against historical tickets.
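The flow in the paragraph above can be sketched as a webhook handler. This is not Kognita's implementation. The payload shape, the alias-based service detection, the 24-hour "recent" window, and the in-memory tables are all hypothetical stand-ins for a service catalog, deployment history, and CODEOWNERS.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical stand-ins for a service catalog, deploy history, and CODEOWNERS.
SERVICE_ALIASES = {
    "authentication": "auth-service",
    "log in": "auth-service",
    "invoice": "billing-service",
}
LAST_DEPLOY = {"auth-service": datetime(2024, 5, 1, 10, 0, tzinfo=timezone.utc)}
CUSTOMER_FACING = {"auth-service": True, "billing-service": True}
OWNERS = {"auth-service": "@platform-team", "billing-service": "@payments-team"}
RECENT = timedelta(hours=24)  # assumed definition of "recent deployment"


def identify_service(text: str) -> Optional[str]:
    """Naive alias matching; a real agent would need something sturdier."""
    lowered = text.lower()
    for alias, service in SERVICE_ALIASES.items():
        if alias in lowered:
            return service
    return None


def handle_ticket_created(payload: dict, now: datetime) -> dict:
    """Classify a new ticket from system state and decide whom to page."""
    service = identify_service(payload["text"])
    deployed = LAST_DEPLOY.get(service)  # None when service is unknown
    recent = deployed is not None and timedelta(0) <= now - deployed <= RECENT
    facing = CUSTOMER_FACING.get(service, False)
    if recent and facing:
        return {"priority": "P1", "service": service, "page": OWNERS.get(service),
                "reason": f"recent deployment to customer-facing {service}"}
    # No deploy signal: fall back to an assumed default of P3 and leave
    # further checks (error rates, user counts) to later steps.
    return {"priority": "P3", "service": service, "page": None,
            "reason": "no recent deployment to a customer-facing service"}


now = datetime(2024, 5, 1, 12, 0, tzinfo=timezone.utc)
print(handle_ticket_created({"text": "Authentication is failing, we can't log in"}, now))
```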

How a codebase-grounded agent classifies severity correctly

How a codebase-grounded agent classifies severity:
  Ticket: "Authentication is failing — we can't log in"
  Agent queries:
    -> auth-service: last deployment 2 hours ago (!) — identifies deployment risk
    -> CODEOWNERS: @platform-team owns auth-service
    -> auth-service customer-facing? yes, all users
  Classification: P1 — recent deployment to customer-facing auth service
  Action: pages @platform-team immediately, SLA clock starts with correct priority
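The CODEOWNERS step above resolves a path in the repository to its owners. A CODEOWNERS file lists a pattern followed by one or more owners on each line, and the last matching pattern takes precedence. The sketch below uses a hypothetical file and a simplified matcher: real CODEOWNERS patterns follow most gitignore rules, which this does not fully reproduce. The owner handles mirror the example above, though on GitHub teams are written as `@org/team-name`. CODEOWNERS names owning teams rather than an on-call rotation, so finding who is on call for that team would require a separate lookup.

```python
from fnmatch import fnmatchcase
from typing import List, Optional

CODEOWNERS = """\
# Hypothetical CODEOWNERS file
*                  @eng-leads
/services/auth/    @platform-team
/services/billing/ @payments-team
*.md               @docs-team
"""


def _matches(pattern: str, path: str) -> bool:
    """Simplified gitignore-style match; real CODEOWNERS rules have more cases."""
    pattern = pattern.lstrip("/")
    if pattern.endswith("/"):
        return path.startswith(pattern)  # directory: everything beneath it
    return fnmatchcase(path, pattern)


def owners_for(path: str, codeowners: str = CODEOWNERS) -> Optional[List[str]]:
    """Return owners for a repo-relative path; the last matching rule wins."""
    result = None
    for line in codeowners.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        pattern, *owners = line.split()
        if _matches(pattern, path):
            result = owners
    return result


print(owners_for("services/auth/login.py"))   # ['@platform-team']
print(owners_for("services/auth/README.md"))  # ['@docs-team'] (later rule wins)
```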

When the classification is correct at intake, the SLA clock starts under the right priority and the owning team is paged immediately. The thirty-minute window is spent on actual response work, not on the manual reclassification that should have happened at the start.
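The cost of late reclassification comes down to simple arithmetic on the SLA clock. This sketch assumes that the clock runs from ticket creation. Only the thirty-minute P1 target comes from the SLA discussed here; the P2 to P4 targets are illustrative assumptions.

```python
from datetime import datetime, timedelta

# Response targets by priority. P1 comes from the SLA above; P2-P4 are assumed.
RESPONSE_TARGET = {
    "P1": timedelta(minutes=30),
    "P2": timedelta(hours=4),
    "P3": timedelta(hours=24),
    "P4": timedelta(hours=72),
}


def response_deadline(created_at: datetime, priority: str) -> datetime:
    """Deadline for first response, measured from ticket creation."""
    return created_at + RESPONSE_TARGET[priority]


def minutes_left_after_escalation(created_at: datetime, escalated_at: datetime,
                                  new_priority: str = "P1") -> float:
    """Minutes remaining on the new target when a ticket is re-prioritized.

    A negative result means the SLA was already breached at escalation time.
    """
    deadline = response_deadline(created_at, new_priority)
    return (deadline - escalated_at).total_seconds() / 60


created = datetime(2024, 5, 1, 9, 0)
print(minutes_left_after_escalation(created, created))                          # 30.0
print(minutes_left_after_escalation(created, created + timedelta(minutes=45)))  # -15.0
```

A P1 identified at intake keeps the full thirty minutes. The same ticket escalated from P3 forty-five minutes later is already fifteen minutes past its deadline.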

The deployment signal most triage systems ignore

The most reliable early signal for a P1 incident is a recent deployment to a customer-facing service, followed by a rise in ticket volume for that service. Neither signal is in the ticket. Both are available from the codebase and deployment history. A webhook agent that checks deployment recency as part of classification catches incidents earlier, and with more accurate severity, than a system that only reads ticket text.
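The two-part signal above (a recent deploy, then rising ticket volume on the same service) can be checked by comparing two equal-length windows on either side of the deploy. This sketch assumes that per-service ticket timestamps are available. The 2-hour window, 3x ratio, and 3-ticket minimum are illustrative thresholds, not tuned values.

```python
from datetime import datetime, timedelta
from typing import List


def post_deploy_spike(deployed_at: datetime, ticket_times: List[datetime],
                      now: datetime, window: timedelta = timedelta(hours=2),
                      ratio: float = 3.0, min_tickets: int = 3) -> bool:
    """True if tickets on a service spiked after a recent deploy to it.

    Compares the ticket count since the deploy with the count in an equally
    long stretch just before it, so the two windows are directly comparable.
    """
    elapsed = now - deployed_at
    if elapsed <= timedelta(0) or elapsed > window:
        return False  # deploy is in the future or outside the "recent" window
    after = sum(1 for t in ticket_times if deployed_at <= t <= now)
    before = sum(1 for t in ticket_times if deployed_at - elapsed <= t < deployed_at)
    if after < min_tickets:
        return False  # too few tickets to call it a spike
    return before == 0 or after / before >= ratio


deploy = datetime(2024, 5, 1, 10, 0)
busy = [datetime(2024, 5, 1, 9, 30)] + [datetime(2024, 5, 1, 10, m) for m in (10, 20, 40, 50)]
quiet = [datetime(2024, 5, 1, 9, 10), datetime(2024, 5, 1, 9, 40), datetime(2024, 5, 1, 10, 30)]
print(post_deploy_spike(deploy, busy, now=datetime(2024, 5, 1, 11, 0)))   # True
print(post_deploy_spike(deploy, quiet, now=datetime(2024, 5, 1, 11, 0)))  # False
```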

By the time a human gets to a ticket in manual triage, the deployment context is often cold: logs have rotated, the deploying engineer has moved on, and the window for easy correlation is gone. Capturing the deployment signal at ticket creation is the only way to use it reliably.
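Capturing the signal at creation can be as simple as freezing the relevant deployment facts into the ticket record when the webhook fires. Later triage then reads the snapshot instead of re-querying systems whose data may have rotated. This sketch uses a hypothetical `DeploySnapshot` schema and a hypothetical deploy-history mapping; "alice" and "abc1234" are placeholder values.

```python
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class DeploySnapshot:
    """Deployment facts captured when the ticket is created (hypothetical schema)."""
    service: str
    captured_at: str               # ISO 8601 time the webhook fired
    last_deploy_at: Optional[str]  # ISO 8601, or None if no deploy on record
    deployed_by: Optional[str]     # who shipped it, recorded while still easy to find
    commit: Optional[str]


def snapshot_for(service: str, history: dict, now: datetime) -> DeploySnapshot:
    """Build a snapshot from a mapping of service -> latest deploy record."""
    latest = history.get(service)
    return DeploySnapshot(
        service=service,
        captured_at=now.isoformat(),
        last_deploy_at=latest["at"].isoformat() if latest else None,
        deployed_by=latest.get("by") if latest else None,
        commit=latest.get("commit") if latest else None,
    )


history = {"auth-service": {"at": datetime(2024, 5, 1, 10, 0, tzinfo=timezone.utc),
                            "by": "alice", "commit": "abc1234"}}
created_at = datetime(2024, 5, 1, 12, 0, tzinfo=timezone.utc)
ticket = {"id": 42, "text": "Authentication is failing"}
# Store the snapshot on the ticket itself so later triage reads it as-is.
ticket["deploy_snapshot"] = asdict(snapshot_for("auth-service", history, created_at))
print(ticket["deploy_snapshot"]["last_deploy_at"])  # 2024-05-01T10:00:00+00:00
```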

Final take

P1 SLAs are only enforceable if P1s are classified correctly at intake. Keyword-based classifiers can't do this, because the information required for correct classification isn't in the ticket text; it's in the codebase. A webhook-triggered agent that queries deployment history and service scope at ticket creation classifies with the context that determines actual impact, and the P1 SLA clock starts when it should.

A P1 you can't detect is a P1 SLA you can't meet. Correct severity detection requires system context, which means reading the codebase, not just the ticket.