The AI Failure Mode That's Worse Than Hallucination: Remembering Wrong

You are using Cursor to implement a new feature. You ask it to integrate with the billing module, and the model confidently suggests calling processPayment() in billing-service/payment.ts. The function name looks right — specific, plausible, exactly the kind of name your team would use. Except that function was renamed to chargeCustomer() six weeks ago when the billing module was refactored. The model does not know. It cannot know — its context was indexed before the rename. You catch it this time because you happened to remember the refactor. The next developer on this codebase might not.

That scenario is not a hallucination in the way most people mean the word. The model did not invent a fictional function. It remembered a real function — accurately, confidently, with all the right context — that no longer exists. The failure mode is not forgetting. It is remembering wrong.

Two types of AI wrong answers

The AI wrong-answer problem has two distinct categories, and they behave very differently in practice. Hallucinations — where the model invents something that never existed — get most of the attention. Stale facts — where the model remembers something that used to be true — are more dangerous and far less discussed.

Hallucinations often fail loudly. The model suggests emailService.sendVerificationBlast(user) and TypeScript tells you within seconds that no such method exists. The failure is immediate, visible, and easy to trace. You correct it and move on.

Stale facts fail quietly. The model suggests billingService.processPayment(order). If the old name is still exported as an alias somewhere in the module, which happens during gradual refactors, the code compiles, type-checks clean, and runs in staging. The same holds when the stale reference sits behind something the compiler cannot see, such as an HTTP route, a column name in a query, or an event name. The failure reaches production. It surfaces in an edge case, in a code path nobody tested thoroughly, on a Tuesday at 2am when nobody is watching the dashboards closely.

Two categories of AI wrong answers — and how quickly each gets caught:

  Hallucination (making things up):
  -> AI suggests: emailService.sendVerificationBlast(user)
  -> Reality: method never existed
  -> How it fails: compile error, TypeScript complains immediately
  -> Time to catch: seconds to minutes
  -> Risk level: low — loud failure

  Stale fact (remembering what used to be true):
  -> AI suggests: billingService.processPayment(order)
  -> Reality: renamed to chargeCustomer() in billing-service refactor six weeks ago
  -> How it fails: while the old name is still exported as an alias, it compiles,
     type-checks clean, and may work in staging
  -> Time to catch: hours to days — or never, until production
  -> Risk level: high — silent failure

Stale facts look exactly right. They have the right file path, the right module, the right vocabulary. The only thing wrong is that they are outdated. That specificity is what makes them dangerous — they do not trigger the skepticism that obviously invented code does.
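The quiet-failure case can be sketched in a few lines of TypeScript. This is a minimal illustration, not the real billing module: the Order and ChargeResult shapes, the function body, and the transitional alias are all assumptions made for the example.

```typescript
// Hypothetical shapes; the article only names the two functions.
interface Order {
  id: string;
  amountCents: number;
}

interface ChargeResult {
  orderId: string;
  charged: boolean;
}

// Current name after the refactor (body is a placeholder).
function chargeCustomer(order: Order): ChargeResult {
  return { orderId: order.id, charged: order.amountCents > 0 };
}

const billingService = {
  chargeCustomer,
  /** @deprecated Assumed transitional alias left in place during the refactor. */
  processPayment: chargeCustomer,
};

const order: Order = { id: "ord_1", amountCents: 1999 };

// A hallucinated name is rejected by the compiler (uncomment to see the error):
// billingService.sendVerificationBlast(order);

// The stale name compiles and runs for as long as the alias is exported.
const staleResult = billingService.processPayment(order);
const currentResult = billingService.chargeCustomer(order);
```

Nothing distinguishes the two calls until someone deletes the alias, which is the point at which the stale suggestion finally becomes a visible error.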

How the stale fact problem happens

AI coding tools index your codebase at a point in time. Cursor indexes when you open a project — or periodically in the background, but not continuously in step with every commit. Managed services run on a schedule. GitHub Copilot uses training data that is months or years old for general patterns, and whatever recent context you feed it explicitly for your specific codebase. In all cases, there is a gap between when the index was built and now. For active codebases, that gap is constantly widening.

Codebases move fast. Functions rename. Services split. Endpoints get versioned. Database schemas migrate — and migration files accumulate, meaning the gap between schema-as-described-in-old-docs and schema-as-it-exists-in-prod grows with every sprint. Event names standardize during platform work. Configuration keys change when the infra team reorganizes the environment setup. Each of these changes is correct and intentional from the engineering team's perspective. None of them propagate to an AI tool's index unless someone triggers a re-index.

The result is an AI tool that has a confident, detailed, wrong picture of your codebase. Not wrong because it is inventing things. Wrong because it is remembering the version of your codebase from six weeks ago, before the billing refactor, before the event schema standardization, before migration #47 dropped that column.
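The gap between an index snapshot and the current codebase can be made concrete by listing the commits that landed after the index was built. The sketch below assumes hypothetical Commit and IndexSnapshot types and made-up dates and commit summaries; it does not describe how any particular tool stores its index.

```typescript
// Hypothetical types for illustration only.
interface Commit {
  sha: string;
  committedAt: Date;
  summary: string;
}

interface IndexSnapshot {
  builtAt: Date;
}

// Every commit newer than the snapshot is a change the index has not seen.
function commitsMissingFromIndex(snapshot: IndexSnapshot, commitLog: Commit[]): Commit[] {
  return commitLog.filter((c) => c.committedAt.getTime() > snapshot.builtAt.getTime());
}

// Illustrative data: dates and summaries are invented for the example.
const snapshot: IndexSnapshot = { builtAt: new Date("2024-01-01T00:00:00Z") };
const commitLog: Commit[] = [
  { sha: "1111111", committedAt: new Date("2023-12-20T10:00:00Z"), summary: "add processPayment" },
  { sha: "a3f9d1", committedAt: new Date("2024-01-08T10:00:00Z"), summary: "rename processPayment to chargeCustomer" },
  { sha: "2222222", committedAt: new Date("2024-01-15T10:00:00Z"), summary: "migration #47: drop preferences_json" },
];

const missing = commitsMissingFromIndex(snapshot, commitLog);
```

Here the rename and the migration both fall after the snapshot, so a tool answering from that snapshot would still describe the pre-refactor codebase.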

What stale facts look like in practice

The renamed function scenario is one pattern. There are three others that show up regularly in real codebases.

Deprecated service endpoints are particularly costly because they fail late, outside the test suite. The team migrated from /api/v1/orders to /api/v2/orders three sprints ago. The v1 route returns 410 Gone. The AI tool's index still lists v1 as active. An engineer building a new integration asks Cursor how to call the orders API. The model gives them the old route. It fails in production, not in the unit tests that mock the HTTP layer.
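A small sketch shows why a mocked HTTP layer hides this. The HttpPost type, createOrder function, and both stand-in clients are hypothetical, and the calls are synchronous only to keep the example short.

```typescript
// Hypothetical, simplified HTTP abstraction (synchronous for brevity).
interface HttpResponse {
  status: number;
}
type HttpPost = (path: string, body: unknown) => HttpResponse;

// Integration code written from a stale suggestion: it still targets v1.
function createOrder(post: HttpPost, items: string[]): HttpResponse {
  return post("/api/v1/orders", { items });
}

// Typical unit-test mock: returns success regardless of the path.
const mockPost: HttpPost = () => ({ status: 201 });

// Stand-in for the real router after the migration described above.
const realPost: HttpPost = (path) => {
  if (path === "/api/v2/orders") return { status: 201 };
  if (path === "/api/v1/orders") return { status: 410 }; // Gone
  return { status: 404 };
};

const inUnitTest = createOrder(mockPost, ["sku_1"]).status;
const againstRealRouter = createOrder(realPost, ["sku_1"]).status;
```

The mock never inspects the path, so the unit test sees a success status while the real router answers 410.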

Database schema changes are the highest-stakes version of this problem. Migration #47 dropped the preferences_json column from the users table and replaced it with a normalized user_preferences table. The AI's index still references preferences_json. Generated code that touches that column either reads a silent null (if the ORM handles missing columns gracefully) or hits a database error on write. The null is invisible until a code path depends on the value, and the write error only appears if that path runs before release. Both are hard to trace back to a stale AI suggestion made three days earlier.
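The silent-null read can be illustrated without a database. The sketch assumes a tolerant row mapper that treats a missing key as an absent value; the Row type and both helper functions are hypothetical, and requireColumn is included only to contrast a loud failure with a quiet one.

```typescript
// A result row as a plain key/value map (hypothetical ORM output).
type Row = Record<string, unknown>;

// Row shape after migration #47: there is no preferences_json key any more.
const currentRow: Row = { id: 42, email: "a@example.com" };

// Stale code: reads the dropped column and quietly falls back to null.
function readPreferences(row: Row): unknown {
  const raw = row["preferences_json"];
  return typeof raw === "string" ? JSON.parse(raw) : null;
}

// Contrast: a strict accessor turns the same mistake into an immediate error.
function requireColumn(row: Row, column: string): unknown {
  if (!(column in row)) {
    throw new Error(`column "${column}" is not present in the result row`);
  }
  return row[column];
}

const preferences = readPreferences(currentRow); // null, with no error raised
```

The tolerant read returns null for every user, which looks the same as "no preferences saved" until someone notices the setting never persists.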

Event name mismatches are subtle because the publisher and consumer have their own separate files, and an AI tool with a partial view of the codebase sees them independently. The publisher emits order.created. The consumer was updated to listen on order.placed during the event schema standardization project. The AI suggests publishing order.created because that is what it saw in the publisher code. Orders queue up with no downstream processing. Nothing throws an error.
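A toy in-memory event bus makes the mismatch visible. The EventBus class is an assumption for illustration; a real broker may queue or drop unrouted messages rather than return a delivery count, but the absence of an error is the same.

```typescript
type Handler = (payload: unknown) => void;

// Minimal in-memory pub/sub, standing in for a real message broker.
class EventBus {
  private handlers: Record<string, Handler[]> = {};

  subscribe(event: string, handler: Handler): void {
    const existing = this.handlers[event] ?? [];
    this.handlers[event] = existing.concat(handler);
  }

  // Returns how many handlers received the event. Zero is not an error.
  publish(event: string, payload: unknown): number {
    const targets = this.handlers[event] ?? [];
    for (const handler of targets) {
      handler(payload);
    }
    return targets.length;
  }
}

const bus = new EventBus();
const processed: unknown[] = [];

// The consumer was updated to the standardized name.
bus.subscribe("order.placed", (payload) => {
  processed.push(payload);
});

// Stale suggestion: publish under the old name. Nothing throws, nothing is handled.
const staleDeliveries = bus.publish("order.created", { orderId: "ord_1" });

// Current name: the consumer receives the event.
const currentDeliveries = bus.publish("order.placed", { orderId: "ord_2" });
```

Publishing under the old name succeeds from the publisher's point of view, which is why this class of bug tends to show up as missing downstream work rather than as an exception.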

Four stale-fact scenarios that reach production:

  1. Renamed function
     Old index: processPayment(order) in billing-service/payment.ts
     Current code: chargeCustomer(order) — renamed in billing refactor
     What happens: compiles while the old name is still exported as an alias;
       once the alias is removed, any call site the type checker does not cover
       fails at runtime with "not a function" on the code path nobody tests

  2. Deprecated endpoint
     Old index: POST /api/v1/orders — active route
     Current code: /api/v1/orders returns 410 Gone, migrated to /api/v2/orders
     What happens: a real call in staging returns 410 and is caught; tests that
       mock the HTTP layer pass, so nothing fails until the real endpoint is hit

  3. Dropped database column
     Old index: users table has preferences_json column
     Current code: migration #47 dropped preferences_json, split into
       user_preferences table with normalized rows
     What happens: if the ORM drops unknown fields, writes succeed without
       preferences_json and reads return null, unnoticed in prod until a
       preferences-reading code path runs; otherwise the write fails with a
       database error

  4. Event name mismatch
     Old index: publisher emits order.created
     Current code: consumer now listens on order.placed after event schema
       standardization — order.created is still emitted, nobody is listening
     What happens: orders silently queue up with no downstream processing

Why this is worse than hallucination

A made-up API name usually fails immediately. The TypeScript compiler catches it. The test suite catches it. At worst, a runtime exception in development catches it before anything reaches staging. The feedback loop is short. The cost is low.

A stale real name can compile. It passes type-checking if the old name is still exported as a backward-compatible alias, or if the stale reference is a string the compiler never checks, such as a route, a column name, or an event name. It runs in unit tests that mock the dependency. It runs in staging if the old behavior is still present behind a feature flag or a backward-compatible alias. It reaches production because nothing in the normal development workflow is designed to catch "this code is calling the old version of a function that was renamed six weeks ago."

The gap between stale facts and hallucinations is the gap between code that fails loudly and code that fails silently. Loud failures get caught in CI. Silent failures get caught in production, in edge cases, in the specific combination of inputs that exercises the stale code path. By that point, the connection between "AI suggested this three days ago" and "this is now broken in prod" is invisible to anyone running the incident investigation.

The re-feed trap

The standard workaround when a developer notices a stale AI suggestion is to re-feed context: paste the current file contents, or the current function signature, or the current schema definition. The model acknowledges the correction. "You're right, it's chargeCustomer — got it." The next generated suggestion uses the correct name.

Three prompts later, the model drifts back.

This is not a model quality problem. It is a retrieval architecture problem. The model has a stale index that keeps surfacing processPayment as the relevant function whenever billing-related context gets retrieved. You re-feed the correction into the active conversation window. The model holds it for a few turns. Then a new retrieval pull brings up billing module context from the stale index, and processPayment is back in the model's working context. The manual correction competes with a persistent stale source of truth — and the stale source of truth is always there, always getting retrieved, always winning eventually.
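The drift described above can be modeled as a toy simulation. This is a deliberately simplified sketch, not a description of how any real assistant assembles context: the three-turn window, the rule that a visible correction always wins, and every function name here are assumptions.

```typescript
// Assumed size of the conversation window, in turns.
const WINDOW_TURNS = 3;

// The stale index returns the old name on every retrieval.
function retrieveFromStaleIndex(): string {
  return "processPayment";
}

// Names visible to the model at a given turn: the retrieved snippet,
// plus the manual correction while it is still inside the window.
function visibleNames(turn: number, correctionTurn: number): string[] {
  const names = [retrieveFromStaleIndex()];
  const correctionInWindow = turn >= correctionTurn && turn - correctionTurn < WINDOW_TURNS;
  if (correctionInWindow) {
    names.push("chargeCustomer");
  }
  return names;
}

// Toy policy: the correction wins while visible; otherwise the index wins.
function suggestedName(turn: number, correctionTurn: number): string {
  const names = visibleNames(turn, correctionTurn);
  return names.indexOf("chargeCustomer") !== -1 ? "chargeCustomer" : "processPayment";
}

// The developer pastes the current file at turn 2.
const timeline = [1, 2, 3, 4, 5, 6].map((turn) => suggestedName(turn, 2));
```

In this model the correct name appears for turns 2 through 4 and the old name returns at turn 5, because the correction has a lifetime and the stale index does not.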

The re-feed trap is not a one-time fix. It is an ongoing maintenance burden. You are not correcting the model's understanding. You are correcting the model's output for the current session, while leaving the underlying cause — the stale index — completely intact.

The manual context battle

Teams that have hit this problem repeatedly develop workarounds. They paste relevant files into the conversation at the start of each session. They maintain a CLAUDE.md or .cursorrules file that annotates current function names, current event names, current endpoint versions. They add comments to frequently-renamed functions explicitly calling out that the old name is deprecated. Some teams keep a "recent renames" section in their internal developer docs specifically because AI tools kept suggesting old names.
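One of these workarounds, a hand-maintained "recent renames" list, might look like the following when turned into a check. The Rename type, the list contents, and findStaleNames are hypothetical, and the plain substring match is intentionally naive.

```typescript
// A hand-maintained rename entry (hypothetical format).
interface Rename {
  from: string;
  to: string;
}

// Someone has to remember to add a line here after every rename.
const recentRenames: Rename[] = [
  { from: "processPayment", to: "chargeCustomer" },
  { from: "order.created", to: "order.placed" },
  { from: "/api/v1/orders", to: "/api/v2/orders" },
];

// Naive check: flag any old name that appears in a piece of generated code.
function findStaleNames(source: string, renames: Rename[]): Rename[] {
  return renames.filter((r) => source.indexOf(r.from) !== -1);
}

const suggestion = "const result = billingService.processPayment(order);";
const hits = findStaleNames(suggestion, recentRenames);
```

The check only knows about renames that were written down, so its accuracy is bounded by how reliably the list is updated.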

All of this is high-friction maintenance work that re-breaks with every codebase change. The CLAUDE.md file that correctly documents chargeCustomer needs to be updated the next time a function renames. The .cursorrules annotation needs to be updated the next time an endpoint version changes. The "recent renames" doc needs someone to actually update it — and that someone is usually the same engineer who just did the rename, who is already moving on to the next task.

This is not a documentation discipline problem. Teams that maintain excellent developer documentation still hit this. The root cause is that manual context management is fighting against the physics of software development. Code changes constantly. No documentation system designed for humans to maintain keeps up with that pace when the requirement is machine-readable accuracy.

A current index changes the failure mode

The fix is not better prompting discipline. It is not more thorough manual documentation. It is a context layer that stays current with the codebase automatically — so that when processPayment becomes chargeCustomer, the index reflects that change within hours, not the next time someone opens the project or the next scheduled re-index cycle.

Kognita re-indexes as the codebase changes. When a commit lands that renames a function, updates a database schema, changes an event name, or deprecates an endpoint, the index updates. The MCP endpoint that coding tools query for codebase context serves current names, current schemas, current event contracts — not the snapshot from six weeks ago.

This changes the failure mode. Once a commit has been re-indexed, stale-fact suggestions stop because the index no longer holds the stale fact to retrieve. When an engineer asks Cursor to integrate with the billing module, the retrieved context shows chargeCustomer because that is the current function. The re-feed trap does not exist because there is no stale index pulling the model back to old names. The manual context battle does not need to be fought because the source of truth is maintained automatically.
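The general idea of a commit-driven index update can be sketched as follows. This is an illustration of the concept only, not Kognita's implementation or API: the SymbolEntry and CommitChange types and both functions are hypothetical.

```typescript
// Hypothetical symbol index entry.
interface SymbolEntry {
  name: string;
  file: string;
  signature: string;
}

// Hypothetical summary of what a commit changed.
interface CommitChange {
  removed: string[];
  added: SymbolEntry[];
}

// Apply a commit: drop removed symbols, then append the added ones.
function applyCommit(index: SymbolEntry[], change: CommitChange): SymbolEntry[] {
  const kept = index.filter((entry) => change.removed.indexOf(entry.name) === -1);
  return kept.concat(change.added);
}

// What a retrieval for a given file would surface.
function symbolsInFile(index: SymbolEntry[], file: string): string[] {
  return index.filter((entry) => entry.file === file).map((entry) => entry.name);
}

const paymentFile = "billing-service/payment.ts";

const before: SymbolEntry[] = [
  { name: "processPayment", file: paymentFile, signature: "processPayment(order: Order): PaymentResult" },
];

const renameCommit: CommitChange = {
  removed: ["processPayment"],
  added: [
    { name: "chargeCustomer", file: paymentFile, signature: "chargeCustomer(order: Order): ChargeResult" },
  ],
};

const after = applyCommit(before, renameCommit);
```

After the commit is applied, a lookup for the payment file surfaces only chargeCustomer, so there is no old name left for retrieval to return.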

Lifecycle of a function rename — old experience vs. Kognita-grounded:

  The rename:
  billing-service/payment.ts
  - processPayment(order: Order): PaymentResult   [removed, commit a3f9d1]
  + chargeCustomer(order: Order): ChargeResult    [added, same commit]

  Without live index:
  Session 1 (day of rename): AI knows processPayment — still indexed
  Session 2 (one week later): AI still suggests processPayment — index unchanged
  Session 3 (six weeks later): AI still suggests processPayment — same index
  Developer pastes current file: AI acknowledges chargeCustomer
  Three prompts later: AI drifts back to processPayment suggestion
  Root cause: the stale index keeps pulling the model back to the old name

  With Kognita:
  Commit lands: re-indexing triggered within hours
  Session 2 (one week later): index reflects chargeCustomer
  AI suggestions reference current name — no drift, no re-feeding required

The difference is not in the quality of the AI model. It is in what the model has access to when it generates a suggestion. A model with a live, accurate index of your codebase suggests chargeCustomer. A model with a six-week-old snapshot of your codebase suggests processPayment. Both models are doing the same thing. Only one of them is working from current reality.

Final take

Hallucinations are visible. They fail loudly and get caught fast. The AI coding failure mode worth worrying about is the one that does not fail loudly — the confident, specific, plausible suggestion that references something that used to exist. It compiles. It passes type-checking. It reaches production.

The most dangerous AI coding failures are the ones that look right. Stale facts do not fail immediately. They fail in production, in edge cases, in the code path that exercises the renamed function or the dropped column or the event name that nobody is listening to anymore. The fix is not better prompting discipline — it is a context layer that stays current so the AI never has stale facts to remember in the first place.