95% of Enterprise AI Pilots Fail. The Reason Is Product Context, Not Model Quality.

Ninety-five percent of enterprise gen-AI pilots fail to deliver measurable P&L impact. That number does not come from an AI skeptic. It comes from an MIT study examining enterprise adoption across industries. The finding is specific: the failures are not caused by model capability. The models are good. The failures come from integration gaps, data gaps, and governance gaps, which together make up the distance between what the AI can theoretically do and what it can actually access and act on in the enterprise context it was deployed into.

The McKinsey data is directionally consistent: 78% of companies use AI, and 80% report no material impact on earnings. Forty-two percent of companies scrapped most of their AI initiatives in 2025, up from 17% in 2024. This is not a pilot failure rate that improves with better prompting or a newer model. This is a structural gap that persists regardless of model generation, because the constraint was never model capability.

For engineering organizations specifically, the missing ingredient has a precise name: product context. Agents do not know what to build, what already exists, or why the business wants what it is asking for. Faster code generation without that context does not accelerate value delivery. It accelerates waste.

Why the failure is structural, not technical

Many companies depend on general-purpose tools like copilots or chatbots. These tools help employees work faster, but the benefits are often spread too thin to affect revenue or costs directly. That is the pattern you get when you deploy a capable model without giving it the context it needs to produce outcomes that matter.

A general-purpose copilot makes individual engineers faster at writing code. It autocompletes, it suggests, it can generate a working implementation of a described behavior in seconds. The problem is that "writing code faster" is not the bottleneck. The bottleneck is knowing what code to write — and that requires knowing what the business is trying to accomplish, what the system currently does, and what already exists that either overlaps or conflicts with the proposed work. The model has none of this by default.

Why pilots fail — and what the data actually shows

Why 95% of enterprise AI pilots fail to deliver P&L impact:

  What the MIT study found:
  -> Integration gaps    — the AI cannot reach the systems it needs
  -> Data gaps           — the AI does not have access to the right context
  -> Governance gaps     — the AI cannot act on what it finds
  -> NOT model quality   — the models are capable

  For engineering teams specifically:
  -> Agent knows how to write code
  -> Agent does not know what the company is trying to build
  -> Agent does not know what already exists in the codebase
  -> Agent does not know what the business prioritizes this quarter
  -> Agent does not know what customers complained about last week

  Result: faster code generation, no change in delivery outcomes
  The 80% of companies reporting no material earnings impact: this is why.

The integration gap is the context gap. When an AI agent cannot access the systems it needs (the codebase, the ticket backlog, the deployment state, the customer feedback), it is operating on a narrow slice of the picture. It can do sophisticated reasoning over that narrow slice. The reasoning is sound. The input is incomplete. And the output, however technically correct, addresses the wrong problem, duplicates work that already exists, or contradicts a decision the business made six weeks ago for reasons that were never written down in a file the agent can read.

The product management bottleneck nobody is talking about

There is a structural shift underway in engineering organizations that most AI pilot post-mortems miss. Product management has overtaken engineering as the limiting factor in value delivery. This is the direct consequence of AI accelerating code production: the limiting factor is no longer how fast engineers can implement. It is how clearly the business can specify what to build and verify that what was built is what was needed.

How the bottleneck shifted — and what the new constraint requires

The bottleneck shift — engineering to product:

  2022: Engineering capacity was the constraint
    -> Teams had backlogs measured in months
    -> More engineers = more output
    -> AI tools accelerated individual code production

  2026: Product understanding is the constraint
    -> Engineering capacity is no longer scarce
    -> AI agents can generate implementation quickly
    -> The limiting factor: knowing what to build
    -> Product management overtook engineering as the limiting factor
       in value delivery

  What "knowing what to build" requires:
    -> What does the codebase currently do?
    -> What is already built that overlaps with the proposed work?
    -> What does the Jira backlog say the business needs?
    -> What are the constraints the system imposes on what is feasible?
    -> What did customers actually ask for vs. what got written into tickets?

  None of this lives in the model.
  All of it has to be provided as context.

CTOs and VPs of Engineering who deployed AI coding tools expecting a velocity multiplier often saw velocity metrics increase while delivery outcomes stayed flat. The engineers were generating more code. The code was addressing the wrong things, or duplicating existing functionality, or building toward a product vision that had been updated in a meeting nobody captured in a form the agent could read. The bottleneck moved. The tool did not move with it.

For product managers and non-technical CEOs and CFOs trying to understand why the AI investment has not produced ROI: this is the explanation. The investment accelerated implementation. It did not accelerate the upstream problem of knowing what to implement. Those are different problems, and they require different solutions.

What happens when an agent builds without knowing what exists

The most consistent failure mode in engineering AI pilots is not the agent producing bad code — it is the agent producing good code for the wrong problem. An engineer asks for a feature. The agent generates a clean implementation. The implementation is technically correct. It also duplicates a service that already exists, uses a different data schema than the rest of the system, and misses a consent requirement that the legal team embedded in the product spec three weeks ago.

The same engineer request — without and with product context

What a general-purpose copilot produces without product context:

  Engineer asks: "Build a notification preferences page."

  Copilot generates:
  -> A complete React component with toggle switches
  -> localStorage persistence
  -> A new /api/notifications/preferences endpoint
  -> Basic on/off states for email, push, and in-app

  What the copilot did not know:
  -> NotificationPreferencesService already exists with 14 preference types
  -> The existing API uses a different schema (preference_key / value pairs)
  -> Marketing requires opt-in tracking through a consent service
  -> The design system has a ToggleGroup component for this exact pattern
  -> This feature was scoped differently in FEAT-204, currently in progress

  The copilot was faster.
  The output needs to be thrown away.
  The engineer is now further behind than if they had read the codebase first.

This is not a model failure. The copilot did what it was asked. The problem is that the agent does not know your product: what it does, what has already been built, what constraints it operates under, and what the business is trying to accomplish this quarter. Without that context, the agent is making reasonable guesses about a system it has never read. Some of those guesses will be right. Enough of them will be wrong to make the output unreliable as a foundation for delivery.
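To make the "system it has never read" point concrete, here is a minimal sketch of the check the copilot skipped: look up what already exists before generating anything. The index entries are hypothetical and modeled on the example above (InvoiceExportService is an invented distractor), and plain keyword overlap stands in for whatever retrieval a real context layer would use. It is an illustration under those assumptions, not a description of any particular product.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Artifact:
    """Something that already exists: a service, a UI component, or a ticket."""
    kind: str
    name: str
    summary: str


# Hypothetical index entries, modeled on the notification example above.
EXISTING = [
    Artifact("service", "NotificationPreferencesService",
             "stores notification preferences as preference_key / value pairs"),
    Artifact("component", "ToggleGroup",
             "design system toggle group for on/off settings"),
    Artifact("ticket", "FEAT-204",
             "notification preferences page, in progress"),
    Artifact("service", "InvoiceExportService",  # invented distractor
             "exports invoices to PDF"),
]

STOPWORDS = {"a", "an", "the", "for", "to", "of", "as", "in", "build", "page"}


def tokens(text: str) -> set[str]:
    """Lowercase word tokens, splitting CamelCase names and dropping stopwords."""
    spaced = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", text)
    words = re.findall(r"[a-z]+", spaced.lower())
    return {w for w in words if w not in STOPWORDS}


def find_overlaps(request: str, index: list[Artifact],
                  min_shared: int = 2) -> list[Artifact]:
    """Return indexed artifacts sharing at least min_shared keywords with the request."""
    wanted = tokens(request)
    hits = []
    for item in index:
        shared = wanted & tokens(f"{item.name} {item.summary}")
        if len(shared) >= min_shared:
            hits.append(item)
    return hits


overlaps = find_overlaps("Build a notification preferences page.", EXISTING)
for item in overlaps:
    print(f"{item.kind}: {item.name}")
```

On this toy index the check surfaces the existing service and the in-progress ticket, which is enough to stop the duplicate build. It does not surface ToggleGroup, because that entry shares no keywords with the request. That gap is the honest limit of keyword matching and the reason a real context layer needs something richer than this sketch.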

The forty-two percent of companies that scrapped AI initiatives in 2025 were not scrapping them because the models stopped working. They were scrapping them because the outputs were not connecting to business outcomes — because faster code generation without product context produces faster waste, and at some point the ROI calculation stops making sense.

Why context grounding is the actual fix

The MIT study's framing is useful: the failures are integration gaps, data gaps, and governance gaps. For engineering teams, closing the integration and data gaps means giving the agent access to the two things it needs that it currently does not have: the codebase and the business intent.

Codebase context tells the agent what already exists, what the system looks like, and what it can and cannot do. Business intent — primarily in the Jira backlog — tells the agent what the company is trying to accomplish, what is in scope, and what priorities have been set. Neither of these is in the model by default. Neither can be approximated by better prompting. Both have to be provided as grounded, current, specific input. Context grounding is what prevents agent hallucination — not just about facts, but about what the system does and what the business needs.
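A rough sketch of what "provided as grounded, current, specific input" can look like in practice: the agent's task is wrapped with what exists and what the business wants before the model ever sees it. The names GroundedContext and build_prompt are hypothetical, and the facts are taken from the notification example earlier in this post. How those facts get retrieved is out of scope here.

```python
from dataclasses import dataclass, field


@dataclass
class GroundedContext:
    """Two inputs the model does not have by default (hypothetical structure)."""
    codebase_facts: list[str] = field(default_factory=list)
    backlog_items: list[str] = field(default_factory=list)


def build_prompt(task: str, context: GroundedContext) -> str:
    """Prepend what exists and what the business wants to the raw task."""

    def section(title: str, lines: list[str]) -> str:
        body = "\n".join(f"- {line}" for line in lines) or "- (none provided)"
        return f"{title}:\n{body}"

    return "\n\n".join([
        section("What already exists (codebase)", context.codebase_facts),
        section("What the business wants (backlog)", context.backlog_items),
        f"Task:\n{task}",
        "Reuse what exists. Flag any conflict with the backlog before writing code.",
    ])


context = GroundedContext(
    codebase_facts=[
        "NotificationPreferencesService already exists with 14 preference types",
        "The existing API uses preference_key / value pairs",
    ],
    backlog_items=[
        "FEAT-204 scopes this feature differently and is in progress",
    ],
)
prompt = build_prompt("Build a notification preferences page.", context)
print(prompt)
```

The design choice worth noticing is that the task text itself is unchanged. Nothing about the prompt wording got cleverer. The only difference is that the facts the agent would otherwise have to guess are now in front of it.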

The distinction matters for CFOs and CEOs evaluating AI ROI: the question to ask is not "are our agents better?" It is "do our agents know what we are trying to accomplish and what already exists?" The first question has been answered — the models are capable. The second question is where the 95% failure rate lives.

The pilot that becomes a business outcome

The 5% of enterprise AI pilots that do deliver measurable P&L impact share a common characteristic: the agents are operating with access to the context they need. They know the system. They know the business priorities. They can reason over both simultaneously and produce outputs that are relevant to the specific problem the business is trying to solve — not generic best-practice recommendations that could apply to any company in any industry.

Same model, same task — the difference context makes

The difference context makes — same agent, same model, different input:

  Task: "Improve the checkout conversion rate."

  Without product context:
    Agent recommendation: Add progress indicators, reduce form fields,
    add social proof elements, optimize CTA button placement.
    Standard e-commerce conversion optimization advice.
    Could be for any product. Probably already done. Cannot know.

  With Kognita context (codebase + Jira):
    Agent recommendation: The checkout flow currently shows three
    separate loading states as it calls OrderService, PricingEngine,
    and InventoryService sequentially. Parallelizing these calls
    eliminates ~1.4s of wait time. FEAT-189 in the backlog already
    scoped this — it was deprioritized in Q3. The mobile checkout
    form has six required fields; StripeElements could reduce this
    to two visible fields using the existing Stripe integration.
    No new infrastructure required.

  First answer: generic advice that requires weeks to evaluate.
  Second answer: two specific, implementable changes with no unknowns.
  Same model. Different context.

The second answer is what a business outcome looks like. It is specific. It is grounded in what actually exists. It identifies work that has already been scoped and can be reprioritized rather than rebuilt. It produces a list of implementable changes with no discovery risk — because the discovery was done by the context layer before the agent generated the recommendation.
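The first change in that recommendation, parallelizing the three service calls, can be sketched with asyncio. The service names come from the example above. The latencies are assumed and scaled down so the sketch runs quickly, and call is a stand-in for a real network request. The sketch also assumes the three calls are independent of one another, which a real checkout flow would have to confirm.

```python
import asyncio
import time

# Assumed latencies in seconds, scaled down so the sketch runs quickly.
LATENCY = {"OrderService": 0.05, "PricingEngine": 0.05, "InventoryService": 0.05}


async def call(service: str) -> str:
    """Stand-in for a network call; sleeps for the assumed latency."""
    await asyncio.sleep(LATENCY[service])
    return f"{service}: ok"


async def checkout_sequential() -> list[str]:
    """One call after another: total wait is the sum of the latencies."""
    return [await call(name) for name in LATENCY]


async def checkout_parallel() -> list[str]:
    """All calls at once: total wait is roughly the slowest single call."""
    return list(await asyncio.gather(*(call(name) for name in LATENCY)))


def timed(coro_fn):
    """Run an async function to completion and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = asyncio.run(coro_fn())
    return result, time.perf_counter() - start


seq_result, seq_time = timed(checkout_sequential)
par_result, par_time = timed(checkout_parallel)
print(f"sequential: {seq_time:.2f}s, parallel: {par_time:.2f}s")
```

Both versions return the same results in the same order, because asyncio.gather preserves the order of its arguments. Only the wall-clock time changes: the sequential version waits for the sum of the three latencies, the parallel one for roughly the longest of them.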

A managed AI agent runtime provides this for the whole team, not just for individual engineers who know how to configure their local tooling. When the context layer is shared — codebase indexed, Jira connected, queryable by product managers, engineering managers, and agents alike — the pilot stops being a developer productivity tool and starts being a delivery intelligence layer. That is when the ROI calculation changes.

How Kognita closes the context gap

Kognita indexes codebase state and connects it to Jira intent, making both queryable in plain language by anyone on the team. The agent — whether it is Claude Code, a custom workflow, or a Jira-native AI — gets both: what the system currently does and what the business is currently trying to accomplish. That combination is what moves the agent from "faster code generation" to "accurate value delivery."

For CTOs evaluating whether to continue or expand an AI pilot: the question is not whether to add a better model. The question is whether the current agents have access to the context they need. If engineers are using Claude Code or Cursor without a shared codebase index, every agent session starts from scratch. There is no organizational memory, no shared understanding of what exists, no connection to the product intent that lives in Jira. The agents are capable. They are context-blind. And context blindness is the 95%.

When a CFO asks for ROI on thirty AI agents, the answer requires showing that the agents are producing outputs grounded in what the business actually needs — not just outputs that are technically correct in isolation. Kognita provides the grounding layer: codebase truth and Jira intent, indexed and connected, so the agents have what they need to produce work that moves a number instead of just moving code.

Final take

The 95% failure rate is not going to improve by upgrading the model. It is going to improve when enterprises close the context gap — when the agents they deploy have access to what the organization has already built and what it is trying to accomplish. For engineering organizations, that means codebase context and product context working together as a unified input layer.

The companies in the 5% are not using better AI. They are using AI that knows what it is working with. The fix is not a better model. It is giving the model the right context — codebase truth plus Jira intent, together, queryable, current. That is what moves the AI pilot from interesting demo to measurable P&L impact.