The SLA Clock Starts at Ticket Creation. The Triage Step Eats Most of the Window.

The SLA policy says four hours to resolution. The ticket arrives at 9am. By 1pm, when the SLA breaches, no engineer has touched the problem: the entire four hours were consumed by the ticket bouncing between two wrong teams before reaching the one that owns the affected service. The resolution itself takes fifteen minutes. The SLA breach has nothing to do with workload, complexity, or engineer availability. It is pure routing latency.

This is the most common SLA failure pattern that reporting systems misattribute. The breach gets logged against the engineering team that ultimately resolved it. The actual cause — the four hours spent on incorrect routing — is invisible in the metric. You fix the wrong problem (engineer capacity) when the real problem is triage architecture.

The anatomy of a routing-caused SLA breach

Most SLA dashboards show time-to-resolution. They don't show time-to-correct-routing. Those are different numbers, and the gap between them is where SLAs die. A ticket that routes instantly to the right team has all its SLA time available for resolution work. A ticket that bounces through two wrong teams before reaching the right one has used most of its SLA window before a qualified engineer sees it.

Atlassian's own community has documented this extensively: "The breach wasn't caused by workload but by a handoff gap." The usual culprits are escalation rules that were configured at implementation and never updated after team restructuring, and tickets routing to shared inboxes nobody monitors. The SLA architecture assumes instant correct routing. The actual routing is multi-step and lossy.

How a 4-hour SLA window disappears before work begins

Where a 4-hour SLA window actually goes:
  9:00am — Ticket created. SLA clock starts.
  9:02am — Auto-reply sent. First response SLA: met.
  9:02am — AI triage routes to Payments team (keyword: billing).
  11:00am — Payments team reviews, decides it belongs to Data Pipeline.
  11:15am — Re-routed to Data Pipeline.
  1:00pm — Data Pipeline: "Check CODEOWNERS — this is Reports team."
  1:15pm — Re-routed to Reports team.
  SLA BREACHED at 1:00pm, four hours after creation. Reports team hasn't even seen it yet.
  1:30pm — Reports team engineer picks up ticket.

  Time spent on actual work: 0 minutes.
  Time spent on routing: 4+ hours.
  Resolution time once correct team has it: 15 minutes.

The triage steps SLA designers don't account for

When an SLA is designed, the assumption is that after first response, the ticket is in the hands of the right person. In practice, there are multiple steps between "ticket acknowledged" and "right engineer sees it with context." Each step has a queue. Each queue has latency. The triage chain eats the SLA window step by step, and by the time the correct engineer opens the ticket, the clock has often already expired.

The hidden triage steps that consume the SLA window

The triage steps nobody accounts for in SLA design:
  Step 1: L1 support reads ticket, decides it's technical (+30 min)
  Step 2: Routes to engineering support queue (+15 min wait)
  Step 3: Engineering support tries to identify which team (+45 min)
  Step 4: Routes to guessed team, team reviews and rejects (+2 hours)
  Step 5: Re-routes to correct team (+30 min)
  Step 6: Correct team picks up the ticket

  Total triage time: 4 hours, the entire 4-hour SLA window
  The "resolution time" that gets reported starts at Step 6.
  The SLA window is used up by the end of Step 5, before the correct team picks up the ticket.
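
The arithmetic behind that list can be checked with a few lines of Python. This is a minimal sketch that assumes the step latencies listed above are representative. The names `TRIAGE_STEPS` and `triage_budget` are illustrative and do not come from any particular tool.

```python
from datetime import timedelta

SLA_WINDOW = timedelta(hours=4)

# (step, latency added by that step), copied from the list above.
TRIAGE_STEPS = [
    ("Step 1: L1 support reads ticket", timedelta(minutes=30)),
    ("Step 2: wait in engineering support queue", timedelta(minutes=15)),
    ("Step 3: engineering support identifies a team", timedelta(minutes=45)),
    ("Step 4: guessed team reviews and rejects", timedelta(hours=2)),
    ("Step 5: re-route to correct team", timedelta(minutes=30)),
]


def triage_budget(steps, sla_window):
    """Return (step, elapsed, remaining) after each triage step."""
    elapsed = timedelta(0)
    rows = []
    for name, latency in steps:
        elapsed += latency
        rows.append((name, elapsed, sla_window - elapsed))
    return rows


rows = triage_budget(TRIAGE_STEPS, SLA_WINDOW)
for name, elapsed, remaining in rows:
    print(f"{name:<48} elapsed={elapsed}  remaining={remaining}")
```

Running this shows the remaining budget shrinking with each handoff until nothing is left by the time the correct team picks up the ticket at Step 6.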

First response SLA is met while resolution SLA fails precisely because the auto-reply is instant and the routing chain is slow. The two metrics measure different parts of the process, and optimizing for first response does nothing to fix the routing problem that causes resolution breaches.

What instant correct routing does to SLA compliance

When a ticket routes correctly on creation — before any human touches it — the entire SLA window is available for resolution work. There's no L1-to-L2 queue. No "is this Payments or Data Pipeline?" decision. No re-routing. The engineer who owns the service sees the ticket within seconds of creation, with the relevant service context already attached.

What the timeline looks like with instant correct routing

What changes when routing is instant:
  9:00am — Ticket created. SLA clock starts.
  9:00am — Kognita webhook fires. Agent queries codebase for service ownership.
  9:00am — Routes directly to Reports team Jira queue with service context attached.
  9:05am — Reports engineer sees ticket with context already populated.
  9:20am — Issue resolved.

  Time to correct owner: seconds.
  Time to resolution: 20 minutes.
  SLA: met with 3 hours and 40 minutes to spare.

Kognita provisions a webhook that fires on ticket creation. The managed agent identifies the affected service from the ticket description, queries CODEOWNERS for current ownership, and routes directly to the correct Jira queue — attaching service context to the ticket before any human is involved. The routing step that used to take hours takes seconds.
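
To make the ownership lookup concrete, here is a minimal Python sketch of routing by CODEOWNERS. It is not Kognita's implementation. The CODEOWNERS entries, the team handles, the queue keys in `TEAM_QUEUES`, and the `TRIAGE` fallback are all hypothetical, and the sketch assumes the affected file path has already been identified from the ticket. Pattern matching uses Python's `fnmatchcase`, which is a simplification of the gitignore-style rules that real CODEOWNERS files follow.

```python
from fnmatch import fnmatchcase

# Hypothetical CODEOWNERS content for illustration only.
CODEOWNERS_TEXT = """
# pattern               owner
services/payments/*     @acme/payments
services/pipeline/*     @acme/data-pipeline
services/reports/*      @acme/reports
"""

# Hypothetical mapping from owning team to ticket queue key.
TEAM_QUEUES = {
    "@acme/payments": "PAY",
    "@acme/data-pipeline": "DATA",
    "@acme/reports": "RPT",
}


def parse_codeowners(text):
    """Parse CODEOWNERS text into a list of (pattern, owners) rules."""
    rules = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blanks
        if not line:
            continue
        pattern, *owners = line.split()
        if owners:
            rules.append((pattern, owners))
    return rules


def owners_for(path, rules):
    """Return owners of the last matching rule (later rules take precedence)."""
    matched = []
    for pattern, owners in rules:
        # Simplification: fnmatch's '*' also matches '/' characters.
        if fnmatchcase(path, pattern):
            matched = owners
    return matched


def route(path, rules, queues, fallback="TRIAGE"):
    """Pick the queue of the first owner that has one, else the fallback."""
    for owner in owners_for(path, rules):
        if owner in queues:
            return queues[owner]
    return fallback


rules = parse_codeowners(CODEOWNERS_TEXT)
print(route("services/reports/export/csv.py", rules, TEAM_QUEUES))
print(route("docs/readme.md", rules, TEAM_QUEUES))
```

The fallback queue matters in practice: a path with no owner should land somewhere a human monitors instead of being guessed at, since a wrong guess restarts the bounce chain described earlier.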

The reporting fix that reveals the real problem

Teams that want to understand their actual SLA failure modes need to measure time-to-correct-routing separately from time-to-resolution. If most SLA breaches occur before the correct team sees the ticket, the problem is routing. If breaches occur after correct assignment, the problem is capacity or complexity. These require different fixes.
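
One way to split the two failure modes is to record when the owning team first received each ticket and compare that timestamp with the SLA deadline. The sketch below assumes such a timestamp is available. The `Ticket` fields, the `classify` function, and the category labels are illustrative names, and the date is arbitrary.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Ticket:
    created_at: datetime
    correctly_routed_at: datetime  # when the owning team first received it
    resolved_at: datetime


def time_to_correct_routing(ticket):
    return ticket.correctly_routed_at - ticket.created_at


def classify(ticket, sla_window):
    """Label a ticket as met, a routing breach, or a post-routing breach."""
    deadline = ticket.created_at + sla_window
    if ticket.resolved_at <= deadline:
        return "met"
    if ticket.correctly_routed_at >= deadline:
        # The window expired before the owning team saw the ticket.
        return "routing_breach"
    # The owning team had the ticket before the deadline and still missed it.
    return "post_routing_breach"


SLA = timedelta(hours=4)
day = datetime(2024, 1, 15)  # arbitrary date for the example

# The bounced ticket from the first timeline: reaches Reports at 1:15pm.
bounced = Ticket(
    created_at=day.replace(hour=9),
    correctly_routed_at=day.replace(hour=13, minute=15),
    resolved_at=day.replace(hour=13, minute=45),
)
# The same ticket routed correctly at creation.
direct = Ticket(
    created_at=day.replace(hour=9),
    correctly_routed_at=day.replace(hour=9),
    resolved_at=day.replace(hour=9, minute=20),
)

print(classify(bounced, SLA), time_to_correct_routing(bounced))
print(classify(direct, SLA), time_to_correct_routing(direct))
```

Counting the labels over a month of tickets gives the split the paragraph above describes: how many breaches happened before correct assignment and how many after.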

In teams that use AI keyword triage, routing-caused breaches can account for the majority of SLA failures. They are misattributed to engineering slowness because that is where the breach appears in reporting. Fixing this requires routing that knows service ownership, not routing that pattern-matches on ticket text.

Final take

SLA clocks start at ticket creation. The triage step that precedes actual resolution work consumes a large fraction of the SLA window. The fix is not faster engineers — it's eliminating the routing latency entirely. A webhook-triggered agent that routes by codebase ownership makes the triage step instantaneous and puts the SLA window where it belongs: on resolution.

SLA breaches that happen before an engineer touches the ticket are routing failures, not engineering failures. Routing by codebase ownership eliminates the routing step as an SLA risk entirely.