A Runaway AI Agent Ran for 11 Days and Cost $47,000. Here's What Was Missing.

A documented incident from 2025: a multi-agent LangChain system entered a retry loop, ran undetected for eleven days, and accumulated $47,000 in API charges. The team found out from the billing statement. There was no per-agent spend limit, no timeout, and no alert. The architecture that enabled the capability included no mechanism to stop it from running indefinitely. This is not an exotic edge case. It is what agentic AI looks like without a managed runtime providing the controls that every other production system takes for granted.

Why agentic workflows are expensive by design

Agentic AI consumes dramatically more tokens than conversational AI. A chat session might use a few thousand tokens. An agent that plans, executes, verifies, and retries a task consumes many times that, because every step and every retry adds its own token cost:

Token consumption: chat vs. agentic workflow

Token consumption: chat LLM vs. agentic workflow

  Chat session (ask one question):
    Tokens:   ~2,000–5,000
    Cost:     cents

  Agentic coding workflow (plan → execute → verify → retry):
    Tokens per step:    5,000–20,000
    Steps per task:     3–15
    Retries on failure: 2–5x multiplier
    Typical per task:   50,000–500,000 tokens
    Cost range:         $0.50–$5.00 per task

  Goldman Sachs analysis (2026):
    Agent token demand: 24x higher than conversational LLM

Goldman Sachs's 2026 analysis put the differential at 24x: agentic workflows consume roughly 24 times the tokens of conventional LLM usage. That multiplier is not a one-time cost; it applies every time an agent runs. An organization that deploys agents across ten developers is not running ten chat sessions in parallel. It is running workflows whose cost, without controls, can grow without an upper bound. This is the cost side of the problem described in "AI agent cost visible, value invisible."
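To make the multiplication concrete, here is a minimal sketch that combines the ranges from the table above. The price of $10 per million tokens is an assumption, chosen only because it is the rate implied by the table's own figures ($0.50 for 50,000 tokens); real pricing varies by model and by input versus output tokens.

```python
# Ranges copied from the table above.
TOKENS_PER_STEP = (5_000, 20_000)
STEPS_PER_TASK = (3, 15)
RETRY_MULTIPLIER = (2, 5)

# ASSUMPTION: flat blended price implied by the table ($0.50 / 50,000 tokens).
ASSUMED_USD_PER_MILLION_TOKENS = 10.0


def task_tokens(tokens_per_step: int, steps: int, retry_multiplier: int) -> int:
    """Tokens for one task: per-step tokens x steps x retry multiplier."""
    return tokens_per_step * steps * retry_multiplier


def task_cost_usd(tokens: int,
                  usd_per_million: float = ASSUMED_USD_PER_MILLION_TOKENS) -> float:
    """Convert a token count to dollars at a flat per-million-token price."""
    return tokens / 1_000_000 * usd_per_million


low = task_tokens(TOKENS_PER_STEP[0], STEPS_PER_TASK[0], RETRY_MULTIPLIER[0])
high = task_tokens(TOKENS_PER_STEP[1], STEPS_PER_TASK[1], RETRY_MULTIPLIER[1])

print(f"all-low inputs:  {low:,} tokens, ${task_cost_usd(low):.2f}")
print(f"all-high inputs: {high:,} tokens, ${task_cost_usd(high):.2f}")
```

Multiplying the extremes gives 30,000 to 1,500,000 tokens per task, so the 50,000 to 500,000 band quoted in the table sits inside that range and reads as a typical case rather than a hard bound. The point of the sketch is the shape of the formula: three independent multipliers, none of which has a ceiling unless the runtime imposes one.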

The per-developer API key model has no team-level controls

Most AI coding tools (Claude Code, Cursor with API models, custom agentic setups) default to a per-developer API key model. Each developer provisions their own key, manages their own billing, and is responsible for their own usage. From a team governance perspective, this amounts to no controls at all:

Visibility gaps in the per-developer API key model

Per-developer API key model (Cursor, Claude Code default):
  Each developer:    owns their own Anthropic/OpenAI key
  Spend limit:       set per key, not per team
  Visibility:        each developer sees their own usage
  CTO sees:          nothing (different billing accounts)
  CFO reconciles:    manually, after the fact, from expense reports
  Runaway detection: whoever notices the bill spike

The CTO discovers overspend when the credit card is charged. The CFO reconciles from developer expense reports filed weeks after the cost was incurred. There is no real-time view of which agents are running, what they are spending, or whether any individual workflow has gone off the rails. The only kill switch is revoking the developer's API key — which also kills every other AI task they are running.
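The missing team-level view is, at its core, an aggregation over usage records that today live in separate billing accounts. The sketch below assumes those records could be collected in one place; the `UsageRecord` type, its field names, and the sample figures are all hypothetical and exist only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageRecord:
    """One billed unit of work (hypothetical schema, not a vendor format)."""
    developer: str
    workflow: str
    cost_usd: float


def team_spend(records):
    """Return (team total, per-developer totals sorted by spend, highest first)."""
    per_developer = defaultdict(float)
    for record in records:
        per_developer[record.developer] += record.cost_usd
    ranked = sorted(per_developer.items(), key=lambda item: item[1], reverse=True)
    return sum(per_developer.values()), ranked


# Made-up sample data for illustration only.
records = [
    UsageRecord("dev-a", "code-review", 4.0),
    UsageRecord("dev-b", "refactor", 1.5),
    UsageRecord("dev-a", "code-review", 120.0),
]

total, ranked = team_spend(records)
print(f"team total: ${total:.2f}")
for developer, spend in ranked:
    print(f"  {developer}: ${spend:.2f}")
```

With per-developer keys, nobody can run this query because no single party holds all the records. Centralizing the records is the prerequisite for every other control discussed here.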

What a runaway agent actually looks like

The $47,000 incident is extreme, but the mechanism is ordinary. An agent hits a failure condition. Its retry logic loops. Nobody is watching. The retries consume tokens. After eleven days, the bill arrives. The organization had the right technology (the agent was doing real work before it broke) but no runtime infrastructure to detect or stop a broken execution:

The anatomy of a runaway agent incident

LangChain multi-agent incident (reported 2025):
  Agent task:     automated code review workflow
  Duration:       11 days running undetected
  API charges:    $47,000
  Kill switch:    none
  Alerting:       none
  Discovered by:  monthly billing statement

  Root cause:     no per-agent spend limit
                  no runtime timeout
                  no anomaly detection on token usage
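The two headline figures above are enough to estimate how fast the money was leaving. The sketch below assumes a constant burn rate across the eleven days, which the incident report does not confirm; treat the outputs as order-of-magnitude figures.

```python
# Figures from the incident summary above.
TOTAL_USD = 47_000
DAYS = 11

# ASSUMPTION: spend was spread evenly over the whole run.
usd_per_day = TOTAL_USD / DAYS
usd_per_hour = usd_per_day / 24
usd_per_minute = usd_per_hour / 60


def cost_if_stopped_after(minutes: float) -> float:
    """Estimated spend if the loop had been stopped after the given minutes."""
    return usd_per_minute * minutes


print(f"per day:    ${usd_per_day:,.2f}")
print(f"per hour:   ${usd_per_hour:,.2f}")
print(f"11 minutes: ${cost_if_stopped_after(11):,.2f}")
```

Under that assumption the loop burned roughly $4,273 per day, or about $178 per hour. Stopped after eleven minutes, the same failure would have cost around $33.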

With fifteen developers each running agentic workflows, an incident like this should not be expected to happen only once. Once per quarter is the best case. The tail risk is the developer who builds an ambitious multi-step automation and sets it running over the weekend. Runtime controls are not optional overhead for organizations deploying agents at team scale; they are table stakes.
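The three root causes listed for the incident map onto three checks around a retry loop. What follows is a minimal sketch, not the code from the incident and not any framework's API: `run_with_limits`, its parameters, and the `step` callable (assumed to return a success flag and the cost of that attempt) are all hypothetical.

```python
import time


class BudgetExceeded(RuntimeError):
    """Raised when cumulative spend reaches the per-task cap."""


class DeadlineExceeded(RuntimeError):
    """Raised when the workflow runs past its wall-clock timeout."""


class RetriesExhausted(RuntimeError):
    """Raised when every allowed attempt has failed."""


def run_with_limits(step, *, max_attempts=5, max_spend_usd=5.0,
                    timeout_s=600.0, clock=time.monotonic):
    """Retry step() until it succeeds or one of three limits trips.

    step is a hypothetical callable returning (ok, cost_usd) for one attempt.
    Returns (attempts_used, total_spend_usd) on success.
    """
    start = clock()
    spent = 0.0
    for attempt in range(1, max_attempts + 1):
        # Timeout check happens before each attempt, not during it.
        if clock() - start > timeout_s:
            raise DeadlineExceeded(
                f"stopped after {timeout_s:.0f}s, spent ${spent:.2f}")
        ok, cost_usd = step()
        spent += cost_usd
        if ok:
            return attempt, spent
        # Refuse to retry once the cap is reached.
        if spent >= max_spend_usd:
            raise BudgetExceeded(
                f"stopped at ${spent:.2f} after {attempt} attempts")
    raise RetriesExhausted(
        f"gave up after {max_attempts} attempts, spent ${spent:.2f}")


def always_failing_step():
    """Stand-in for a broken agent step that costs $1.50 per attempt."""
    return False, 1.5


try:
    run_with_limits(always_failing_step)
except BudgetExceeded as exc:
    print(exc)
```

With the defaults shown, the always-failing step trips the spend cap on its fourth attempt instead of looping indefinitely. Note that the timeout is only checked between attempts, so a single hung call would still need its own request-level timeout.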

What managed runtime controls look like in practice

The controls that prevent $47,000 incidents are not technically complex — they are organizational infrastructure that needs to exist at the team level, not per developer:

Managed runtime spend controls

Managed AI runtime spend controls:
  → Per-user token budget (weekly/monthly cap)
  → Per-task spend limit (stop before it escalates)
  → Per-workflow timeout (no 11-day runs)
  → Real-time usage dashboard (CTO sees all at once)
  → Anomaly alert (10x normal usage triggers review)
  → Kill switch (pause agent from dashboard, not by API key revocation)

Kognita provides this layer as part of the managed agent runtime. Teams connect once; the runtime enforces per-user budgets, per-task timeouts, and anomaly detection centrally — not by asking each developer to configure their own spending limits on their personal API key. The CTO sees the full team's usage in a single dashboard rather than reconciling expense reports.
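Two of the listed controls, the anomaly alert and the kill switch, can be sketched in a few lines. This is an illustration of the idea only and is not Kognita's API: the `AgentGuard` class, its method names, the median baseline, and the five-sample warm-up are all assumptions. Only the 10x threshold comes from the list above.

```python
import threading
from statistics import median


class AgentGuard:
    """Tracks per-interval spend for one agent (hypothetical sketch)."""

    def __init__(self, anomaly_factor=10.0, min_history=5):
        self.anomaly_factor = anomaly_factor  # "10x normal usage"
        self.min_history = min_history        # samples needed before alerting
        self.history = []
        self._paused = threading.Event()

    def pause(self):
        """Kill switch: pauses this agent only, leaving the API key intact."""
        self._paused.set()

    def resume(self):
        self._paused.clear()

    @property
    def paused(self):
        return self._paused.is_set()

    def record(self, spend_usd):
        """Record one interval's spend; return True if it needs review."""
        if len(self.history) >= self.min_history:
            baseline = median(self.history)
            if baseline > 0 and spend_usd >= self.anomaly_factor * baseline:
                # Keep the anomalous sample out of the baseline.
                return True
        self.history.append(spend_usd)
        return False


guard = AgentGuard()
for hourly_spend in [1.0, 1.2, 0.9, 1.1, 1.0]:
    guard.record(hourly_spend)

if guard.record(15.0):   # far above the ~$1/hour baseline
    guard.pause()        # in practice, an operator decision after review

print("agent paused:", guard.paused)
```

An agent loop would check `guard.paused` before each step and stop cleanly when it is set. Keeping the alert and the pause as separate calls mirrors the list above, where an anomaly triggers a review and a person decides whether to pull the switch.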

The governance conversation that has to happen

Without a managed runtime, the governance conversation at most organizations goes: "we gave developers AI tools, something went wrong, someone got a big bill, we are now debating whether to restrict AI access." This is the wrong sequence. The controls should exist before the incident, not as a reaction to it.

A CIO article from early 2026 framed it as "controlled use vs. polite chaos." Polite chaos is when every developer has a personal API key, there are no team-level spend limits, and the only discovery mechanism is a monthly billing cycle. Controlled use is a runtime that can stop a runaway workflow in minutes, not days.

Final take

Agentic AI is more capable than chat AI and more expensive per run. The combination of autonomous execution, retry logic, multi-step workflows, and no runtime controls is a recurring source of cost incidents that tend to end in either a large bill or a blanket restriction on AI access.

A kill switch is not a nice-to-have for agentic AI. It is the difference between an eleven-day incident and an eleven-minute one — and the difference comes entirely from whether the runtime includes one.