CFOs Don't Know What AI Costs. Agents Are 24x More Expensive Than Chat. Here's Why That Matters.
CFOs do not know what AI costs. Gartner's finance practice VP said it plainly in 2026: because AI is so new, CFOs don't really know what it costs, and cost estimates are off by 500 to 1,000 percent. The problem is structural. Agentic AI is not just somewhat more expensive than chat AI: Goldman Sachs put agent token demand at 24 times that of conversational LLM usage. Per-developer API keys scatter costs across personal expense reports. And because there are no usage-to-output controls, nobody can tell whether the spend generated the productivity gains that justified it.
Why agentic AI costs are categorically different from chat AI
The CFO who approved an AI coding budget based on pilot costs (a few dollars per developer per month for a chat-style coding assistant) is often surprised by the first agentic bill. Agents consume tokens at a different order of magnitude, because they execute multi-step workflows rather than answering single questions:
Chat LLM vs. agentic AI cost comparison:
  Chat session (one question + answer):
    Input tokens: ~2,000
    Output tokens: ~500
    Cost at claude-opus-4: ~$0.10
  Agentic workflow (code review task, 5 steps):
    Planning step: 8,000 tokens
    Codebase read: 25,000 tokens (retrieved context)
    Execution: 15,000 tokens
    Verification: 10,000 tokens
    Error + retry: 20,000 tokens
    Total: 78,000 tokens
    Cost: ~$3.90
  Goldman Sachs 2026 estimate:
    Agent token demand: 24x higher than conversational LLM

A 24x token multiplier applied to a team of 15 developers running agents regularly is not the kind of variance that expense reports catch in time. It is the kind of variance that shows up in quarterly reviews as a budget anomaly. This is the financial side of the problem described from the engineering perspective in runaway AI agent costs and the missing kill switch.
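The arithmetic in the comparison above can be reproduced in a few lines. This is a minimal sketch, not a pricing model: it assumes a single blended rate of $50 per million tokens, which is simply what the ~$3.90 figure for 78,000 tokens implies, and it ignores the difference between input and output pricing. The names `BLENDED_USD_PER_MILLION`, `CHAT_SESSION`, and `AGENT_WORKFLOW` are illustrative.

```python
# Blended rate implied by the figures above ($3.90 / 78,000 tokens).
# Assumption for illustration only, not a published price.
BLENDED_USD_PER_MILLION = 50.0

# Token counts copied from the comparison above.
CHAT_SESSION = {"input": 2_000, "output": 500}
AGENT_WORKFLOW = {
    "planning": 8_000,
    "codebase_read": 25_000,
    "execution": 15_000,
    "verification": 10_000,
    "error_retry": 20_000,
}


def total_tokens(steps: dict[str, int]) -> int:
    """Sum the token counts of every step in a session or workflow."""
    return sum(steps.values())


def cost_usd(tokens: int, usd_per_million: float = BLENDED_USD_PER_MILLION) -> float:
    """Convert a token count to dollars at a flat per-million rate."""
    return tokens * usd_per_million / 1_000_000


chat_tokens = total_tokens(CHAT_SESSION)
agent_tokens = total_tokens(AGENT_WORKFLOW)

print(f"chat:  {chat_tokens:>6,} tokens  ${cost_usd(chat_tokens):.3f}")
print(f"agent: {agent_tokens:>6,} tokens  ${cost_usd(agent_tokens):.3f}")
print(f"token ratio: {agent_tokens / chat_tokens:.1f}x")
```

Under this assumption the chat session costs about 12 to 13 cents, in the same range as the rounded ~$0.10 above, and the single five-step workflow uses roughly 31 times the tokens of the chat session, the same order of magnitude as the 24x estimate.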
What the CFO currently sees (per-developer API key model)
In the default per-developer AI tool model, the CFO's visibility into AI spend looks like this:
What CFO sees without managed AI runtime:
  Monthly expense reports:
    Developer A: "$89 Anthropic API" (March)
    Developer B: "$234 OpenAI API" (March, no category)
    Developer C: "$18 cursor.sh" (subscription)
    Developer D: no submission (team AI card)
  What CFO cannot determine:
    → Which of these were agentic vs. chat
    → Which agent tasks generated ROI
    → Who is spending 10x more than average
    → Whether any workflow is running out of control
    → What the April bill will be

This is not a reporting delay problem. It is an architectural problem. Per-developer API keys route billing to individual accounts or corporate credit cards. Consolidating that into a meaningful budget line requires manual reconciliation after the fact. Forecasting the next month's bill requires guessing, because there are no usage controls that would constrain it.
Uber's lesson: spend without attribution is waste
Uber's experience in 2026 crystallized the risk. The company combined significant AI investment with continuous deployment, yet senior engineers found no measurable correlation between token usage and consumer feature delivery:
Uber's 2026 AI budget lesson:
  Claim: Significant AI investment, continuous deployment
  Finding: Senior engineers found no correlation between token usage and consumer feature delivery
  Outcome: CFO questioned whether AI spend had measurable ROI
  Root cause: No budget controls → no usage-to-value attribution; AI running, nobody tracking what it produced

The problem was not that the AI was failing to work. It was that there was no mechanism to attribute cost to output, to say "this $3,000 in tokens produced these 12 features" versus "this $2,000 in tokens went into exploratory sessions that did not ship anything." Without that attribution, AI spend looks like a cost center with no measurable return. This is the CFO question described in thirty AI agents and the CFO who wants ROI.
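To make the attribution idea concrete, here is a minimal sketch of splitting agent spend into "cost per shipped PR" and "spend with no shipped output". It assumes each agent session is logged with its cost and, where applicable, the PR it contributed to. The `AgentSession` record, its field names, the PR identifiers, and the dollar amounts are all hypothetical sample data, not figures from Uber or any real system.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class AgentSession:
    user: str
    cost_usd: float
    pr_id: Optional[str]  # PR the session contributed to, if any (hypothetical field)


def attribute_spend(sessions, merged_prs):
    """Return (cost per merged PR, spend that produced no merged PR)."""
    per_pr = defaultdict(float)
    unattributed = 0.0
    for s in sessions:
        if s.pr_id is not None and s.pr_id in merged_prs:
            per_pr[s.pr_id] += s.cost_usd
        else:
            # Exploratory sessions and sessions on PRs that never merged.
            unattributed += s.cost_usd
    return dict(per_pr), unattributed


# Made-up sample data for illustration.
sessions = [
    AgentSession("dev_a", 4.00, "PR-101"),
    AgentSession("dev_a", 2.50, "PR-101"),
    AgentSession("dev_b", 3.00, "PR-102"),
    AgentSession("dev_c", 6.00, None),      # exploratory, never linked to a PR
    AgentSession("dev_b", 1.50, "PR-103"),  # PR-103 was not merged
]

per_pr, unattributed = attribute_spend(sessions, merged_prs={"PR-101", "PR-102"})
print("cost per merged PR:", per_pr)
print("unattributed spend:", unattributed)
```

The design choice that matters is capturing the link between a session and its output at the time the session runs. Reconstructing that link afterwards from expense reports is the manual reconciliation the per-developer model forces.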
What financial controls in a managed AI runtime provide
The CFO's requirements for AI spend are the same as for any other infrastructure cost: predictability, visibility, and attribution. A managed AI runtime provides these as infrastructure rather than as a reporting retrofit:
Managed AI runtime financial controls:
  → Per-user monthly token budget (hard cap or alert)
  → Per-task cost estimate before execution
  → Real-time spend dashboard (CFO-accessible)
  → Usage-to-output attribution (cost per PR, per ticket)
  → Monthly forecast vs. actuals (not just actuals)
  → Budget approval workflow for high-cost agent tasks

Kognita's managed runtime puts the entire team's AI usage in a single dashboard: cost per user, cost per task type, real-time vs. monthly view, and per-user budget caps that prevent any individual workflow from exceeding what was approved. The CFO can forecast based on current usage patterns rather than extrapolating from last quarter's expense reports.
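The first two controls in the list above, together with the approval workflow, can be sketched as a single pre-execution check. This is an illustrative sketch only and not Kognita's API: `UserBudget`, `Decision`, `check_task`, `record_usage`, the 80% alert threshold, and the 200,000-token approval threshold are all assumptions chosen for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    ALERT = "alert"                    # task runs, budget owner is notified
    NEEDS_APPROVAL = "needs_approval"  # high-cost task, requires sign-off
    BLOCK = "block"                    # hard cap would be exceeded


@dataclass
class UserBudget:
    monthly_cap_tokens: int
    used_tokens: int = 0
    hard_cap: bool = True                     # False means alert-only
    alert_fraction: float = 0.8               # assumed: alert at 80% of the cap
    approval_threshold_tokens: int = 200_000  # assumed: sign-off above this size


def check_task(budget: UserBudget, estimated_tokens: int) -> Decision:
    """Decide, before execution, whether an estimated task fits the budget."""
    if estimated_tokens > budget.approval_threshold_tokens:
        return Decision.NEEDS_APPROVAL
    projected = budget.used_tokens + estimated_tokens
    if projected > budget.monthly_cap_tokens:
        return Decision.BLOCK if budget.hard_cap else Decision.ALERT
    if projected >= budget.alert_fraction * budget.monthly_cap_tokens:
        return Decision.ALERT
    return Decision.ALLOW


def record_usage(budget: UserBudget, actual_tokens: int) -> None:
    """Add the tokens a finished task actually consumed to the running total."""
    budget.used_tokens += actual_tokens


budget = UserBudget(monthly_cap_tokens=1_000_000, used_tokens=700_000)
print(check_task(budget, 78_000))    # 778,000 projected, below the alert line
print(check_task(budget, 150_000))   # 850,000 projected, past 80% of the cap
print(check_task(budget, 250_000))   # single task above the approval threshold
record_usage(budget, 250_000)
print(check_task(budget, 78_000))    # 1,028,000 projected, over the hard cap
```

Checking the estimate before execution is what separates a budget from a report: the limit is enforced while the spend can still be avoided, rather than discovered on next month's bill.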
The budget conversation that has to happen at the right time
Most organizations have the AI budget conversation after the first large bill. The CTO approves Cursor and Claude Code for the team, developers start using agents aggressively, the bill arrives three times higher than expected, and the CFO asks for a controls plan. A controls plan retrofitted after deployment is harder to implement and disrupts workflows that are already running.
The CFO's budget conversation should happen before deployment, with a runtime that provides the controls as part of the initial setup rather than as a later add-on.
Final take
CFOs who approved modest AI coding budgets based on chat AI costs and per-developer expense reports are frequently surprised when agentic AI is deployed at team scale. The 24x token multiplier, the uncontrolled per-developer billing, and the absence of cost-to-output attribution make AI spend genuinely hard to manage without purpose-built controls.
Financial controls for agentic AI are not optional overhead. They are the mechanism that lets a CFO approve AI spending with confidence rather than fear — because there is a number attached to it, a limit that enforces it, and a dashboard that shows whether it is generating the expected return.