Why AI Coding Agents Hallucinate

Most people think a hallucination in an AI coding agent means inventing code that does not exist. That does happen.

But in real software systems, hallucinations are broader and more dangerous than that. AI coding agents typically fail in two completely different ways: false positives and false negatives. Understanding the distinction matters, because most tooling focuses on only one of them.

Type 1 — false positives

This is the hallucination everyone already knows. The AI confidently believes something exists when it actually does not.

Nonexistent methods
Fake abstractions
Invented APIs
Imaginary repository patterns
Incorrect execution assumptions

Example: invented method

Suppose the repository contains:

Actual API

notificationDispatcher.enqueue(
    EmailType.EMAIL_VERIFICATION,
    user);

The AI agent generates:

Plausible but wrong

notificationDispatcher.sendVerificationEmail(user);

The call looks perfectly reasonable, but the method does not exist. The model filled a gap in its repository knowledge with a statistically plausible abstraction. That is a classic false positive hallucination.

Another example: fake architecture

Suppose the repository uses event-driven workflows, but the AI assumes synchronous service calls. It generates:

Assumed synchronous model

inventoryService.reserve(order);
emailService.sendConfirmation(order);

When the actual repository architecture is:

Actual event-driven model

eventBus.publish(new OrderPlacedEvent(order));

The AI invented an execution model that does not exist — again, a false positive hallucination.

False positives are usually easier to notice

Something visibly breaks: compile errors, missing methods, failing tests, wrong imports, invalid APIs. Humans catch these relatively quickly. That is why most discussions around hallucinations focus on false positives.

But the more dangerous category is often the other one.

Type 2 — false negatives

False negatives happen when the AI fails to realize something already exists. This is much more subtle — and much more destructive in large systems.

Missing side effects
Ignored workflows
Overlooked dependencies
Forgotten handlers
Unseen execution chains
Duplicate implementations

Example: missing existing logic

Suppose the repository already contains FraudDetectionPipeline. A developer asks the agent to "add fraud checks before charging cards." The AI fails to retrieve the existing pipeline and instead creates BasicFraudValidator.

Now the repository contains:

Duplicated business logic
Inconsistent fraud behavior
Architectural fragmentation

The AI did not invent fake code in the obvious sense — it failed to perceive existing code. That is a false negative hallucination.
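
One cheap guard against this kind of duplication is to compare a proposed class name with names already in the repository before creating anything new. The sketch below is only an illustration: DuplicateNameCheck, its methods, the list of existing classes, and the set of "generic" words are all hypothetical, and a real check would need to look at behavior, not just names.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper: flags existing classes whose names share a domain word
// with a class the agent is about to create.
class DuplicateNameCheck {

    // Split a CamelCase identifier into lowercase words, e.g.
    // "FraudDetectionPipeline" -> [fraud, detection, pipeline].
    static Set<String> words(String identifier) {
        Set<String> result = new HashSet<>();
        for (String part : identifier.split("(?<=[a-z0-9])(?=[A-Z])")) {
            if (!part.isEmpty()) {
                result.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return result;
    }

    // Return existing class names that share at least one non-generic word
    // with the proposed name.
    static List<String> similarExisting(String proposed,
                                        List<String> existing,
                                        Set<String> genericWords) {
        Set<String> proposedWords = words(proposed);
        proposedWords.removeAll(genericWords);
        List<String> matches = new ArrayList<>();
        for (String name : existing) {
            Set<String> shared = words(name);
            shared.retainAll(proposedWords);
            if (!shared.isEmpty()) {
                matches.add(name);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Assumed repository contents, for illustration only.
        List<String> existing = List.of("FraudDetectionPipeline", "UserRepository", "EmailService");
        Set<String> generic = Set.of("basic", "service", "manager", "validator", "pipeline");
        // Prints [FraudDetectionPipeline]
        System.out.println(similarExisting("BasicFraudValidator", existing, generic));
    }
}
```

A non-empty result does not prove duplication. It is a prompt to retrieve and read the matching classes before generating a new one.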

Another example: hidden side effects

Suppose this entry point exists:

Facade

deleteUser(userId);

The AI retrieves only:

Partial view

userRepository.delete(userId);

But misses other execution paths:

Additional paths (not retrieved)

billingService.cancelSubscriptions(userId);
auditLogger.recordDeletion(userId);
s3AssetCleaner.purgeUserFiles(userId);

Now the AI concludes that user deletion only removes database records. It has not invented a fact. It is missing repository awareness, which makes this a false negative hallucination.

False negatives are harder to detect

Nothing visibly breaks immediately. The generated code may compile, pass tests, and deploy successfully, but the agent's picture of how the system operates is wrong. Failures accumulate slowly: duplicated systems, architectural drift, inconsistent business logic, hidden regressions, broken workflows. Large repositories become increasingly fragile.

Most repository hallucinations are actually false negatives

People think AI agents mainly fail because they invent things. In practice, agents working in large codebases more often fail because they cannot fully perceive what already exists.

Repositories are enormous distributed semantic systems. Meaning is fragmented across services, queues, events, abstractions, interfaces, dependency injection, and async workflows. The AI sees fragments instead of operational wholes.

Retrieval failures create both types

Both hallucination categories often originate from the same root cause: incomplete repository reconstruction.

How false positives happen

The AI lacks repository information, so it predicts what code usually looks like instead of your repository conventions, real APIs, and actual execution patterns. That produces invented abstractions.

How false negatives happen

The AI retrieves only partial repository state, so it fails to see existing implementations, hidden dependencies, downstream effects, and architectural relationships. That produces missing awareness.

Example: payment retry system

Suppose the repository already contains FailedPaymentRecoveryWorkflow. A developer asks the agent to "retry failed Stripe charges."

False positive failure: the AI calls a StripeRetryManager that does not exist.

False negative failure: the AI misses the existing recovery pipeline and builds an entirely separate retry system. The second failure is often worse, because it compiles and ships while leaving two competing retry paths in place.

Why context windows do not fully solve this

Larger context windows do not eliminate hallucinations in the way people assume. The limit is structural, not just a matter of token count: much of a repository's meaning lives in relationships between files rather than in the text of any one file. Even massive windows still suffer from chunk fragmentation, retrieval prioritization, missing graph relationships, and execution ambiguity. More tokens do not equal repository understanding.

Why codebases are especially difficult

Code meaning is distributed. A checkout flow may span controller, validation layer, fraud engine, payment processor, event system, analytics pipeline, and async workers. Humans mentally reconstruct that graph over time. AI agents currently do not — they operate on transient retrieved fragments. That naturally creates both false assumptions and missing awareness.

The hidden problem: AI agents do not understand absence

Suppose the AI does not retrieve GDPRCleanupJob. How can it tell whether the job does not exist, retrieval failed, or the repository uses another naming convention? It often cannot distinguish between "does not exist" and "not currently visible." That ambiguity creates huge reliability problems.
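
One way to make that ambiguity explicit is to give lookups three possible answers instead of two. The sketch below is a hypothetical illustration, not how any particular agent works: Existence, RepositoryIndex, and the coversWholeRepository flag are invented names, and the lookup matches exact names only, so it does nothing about alternative naming conventions.

```java
import java.util.Set;

// Hypothetical three-valued answer to "does this symbol exist in the repository?"
enum Existence { PRESENT, ABSENT, UNKNOWN }

// Hypothetical index that remembers whether it covers the whole repository.
class RepositoryIndex {
    private final Set<String> indexedSymbols;
    private final boolean coversWholeRepository;

    RepositoryIndex(Set<String> indexedSymbols, boolean coversWholeRepository) {
        this.indexedSymbols = Set.copyOf(indexedSymbols);
        this.coversWholeRepository = coversWholeRepository;
    }

    Existence lookup(String symbol) {
        if (indexedSymbols.contains(symbol)) {
            return Existence.PRESENT;
        }
        // A miss only proves absence when the index is known to be complete.
        return coversWholeRepository ? Existence.ABSENT : Existence.UNKNOWN;
    }
}

class AbsenceSketch {
    public static void main(String[] args) {
        // Assumed index contents, for illustration only.
        RepositoryIndex partial = new RepositoryIndex(Set.of("UserRepository"), false);
        RepositoryIndex full = new RepositoryIndex(Set.of("UserRepository", "GDPRCleanupJob"), true);
        System.out.println(partial.lookup("GDPRCleanupJob")); // UNKNOWN
        System.out.println(full.lookup("GDPRCleanupJob"));    // PRESENT
        System.out.println(full.lookup("GdprPurgeTask"));     // ABSENT
    }
}
```

The useful case is UNKNOWN: an agent that receives it has a reason to widen its search instead of assuming the job is missing and writing a new one.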

Reducing false positives requires better precision

Reducing invented abstractions calls for stronger lexical grounding, repository-aware reranking, stricter API retrieval, architectural constraints, and graph-aware validation. In short, keep the AI closer to repository reality.
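
As a minimal sketch of what a grounding check could look like, the code below tests a generated call against the methods a receiver is known to declare, using the notificationDispatcher example from earlier. SymbolGroundingSketch and its hard-coded DECLARED_METHODS table are hypothetical. A real tool would presumably build that table from a parser or language server and would also compare argument types.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical check: does a generated call target a method the repository declares?
class SymbolGroundingSketch {

    // Receiver name -> methods declared on its type.
    // Hard-coded here for illustration; assumed to come from real tooling.
    static final Map<String, Set<String>> DECLARED_METHODS = Map.of(
        "notificationDispatcher", Set.of("enqueue"));

    // Returns true only when the receiver is known and declares the method.
    static boolean isGrounded(String receiver, String method) {
        return DECLARED_METHODS.getOrDefault(receiver, Set.of()).contains(method);
    }

    public static void main(String[] args) {
        System.out.println(isGrounded("notificationDispatcher", "enqueue"));               // true
        System.out.println(isGrounded("notificationDispatcher", "sendVerificationEmail")); // false
    }
}
```

Rejecting the second call before it reaches the compiler lets the agent retry with the real API in view instead of leaving the error for a human to find.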

Reducing false negatives requires better repository coverage

Reducing missing awareness calls for better chunking, graph traversal, dependency expansion, execution-aware retrieval, semantic indexing, and repository cognition. In short, help the AI perceive more of the actual system.
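
Graph traversal and dependency expansion can be sketched as a breadth-first walk over a call graph, starting from the entry point the agent was asked about. Everything here is an assumption for illustration: CallGraphExpansionSketch is a hypothetical name, and the edges are hard-coded from the deleteUser example above, on the assumption that the facade reaches each path directly. Building an accurate call graph, especially across events and queues, is the hard part and is not shown.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class CallGraphExpansionSketch {

    // Breadth-first walk over a caller -> callees map, returning every method
    // reachable from the entry point (entry point included), in discovery order.
    static Set<String> reachableFrom(String entryPoint, Map<String, List<String>> callGraph) {
        Set<String> visited = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        visited.add(entryPoint);
        queue.add(entryPoint);
        while (!queue.isEmpty()) {
            String current = queue.remove();
            for (String callee : callGraph.getOrDefault(current, List.of())) {
                // Set.add returns false for already-seen nodes, so cycles terminate.
                if (visited.add(callee)) {
                    queue.add(callee);
                }
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        // Hypothetical edges, assumed for illustration.
        Map<String, List<String>> callGraph = Map.of(
            "deleteUser", List.of(
                "userRepository.delete",
                "billingService.cancelSubscriptions",
                "auditLogger.recordDeletion",
                "s3AssetCleaner.purgeUserFiles"));
        System.out.println(reachableFrom("deleteUser", callGraph));
    }
}
```

Retrieving code for every node in the result, not just the first match, is what would have surfaced the billing, audit, and file cleanup paths in the deletion example.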

The bigger insight

Most people frame hallucinations as model intelligence problems. Increasingly, the harder problem is repository perception. The AI can only reason over the software system it can reconstruct. Today, most coding agents still perceive repositories as fragmented text instead of coherent operational graphs.

That is why hallucinations happen — not because the AI is “random,” but because repository understanding is still fundamentally incomplete.