Why AI Coding Agents Hallucinate
Most people think hallucinations in AI coding agents look like inventing nonexistent code. And yes — that happens.
But in real software systems, hallucinations are broader and more dangerous than that. AI coding agents typically fail in two completely different ways: false positives and false negatives. Understanding this distinction is critical — because most tooling only focuses on one of them.
Type 1 — false positives
This is the hallucination everyone already knows. The AI confidently believes something exists when it actually does not.
- Nonexistent methods
- Fake abstractions
- Invented APIs
- Imaginary repository patterns
- Incorrect execution assumptions
Example: invented method
Suppose the repository contains:
notificationDispatcher.enqueue(
    EmailType.EMAIL_VERIFICATION,
    user);
The AI agent generates:
notificationDispatcher.sendVerificationEmail(user);
Looks perfectly reasonable, but the method does not exist. The model filled missing repository knowledge with a statistically plausible abstraction. That is a classic false positive hallucination.
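One way to catch this kind of invented method is to check every generated call against an index of symbols that really exist in the repository. The sketch below is only an illustration under stated assumptions: the SymbolGuard class, its in-memory index, and the way the index is populated are hypothetical and not part of any specific tool.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical guard: validates a generated call against a symbol index
// built from the repository before the agent is allowed to emit it.
class SymbolGuard {
    // Assumed index shape: receiver name -> method names that actually exist.
    private final Map<String, Set<String>> index;

    SymbolGuard(Map<String, Set<String>> index) {
        this.index = index;
    }

    // True only when the receiver is indexed and declares the method.
    boolean exists(String receiver, String method) {
        return index.getOrDefault(receiver, Set.of()).contains(method);
    }

    public static void main(String[] args) {
        SymbolGuard guard = new SymbolGuard(
            Map.of("notificationDispatcher", Set.of("enqueue")));
        // The real method passes the check.
        System.out.println(guard.exists("notificationDispatcher", "enqueue"));
        // The invented method is rejected.
        System.out.println(guard.exists("notificationDispatcher", "sendVerificationEmail"));
    }
}
```

Running the example prints true for enqueue and false for sendVerificationEmail. A real index would likely be built from parsed source or compiler metadata rather than a hand-written map.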
Another example: fake architecture
Suppose the repository uses event-driven workflows, but the AI assumes synchronous service calls. It generates:
inventoryService.reserve(order);
emailService.sendConfirmation(order);
But the actual repository architecture is:
eventBus.publish(new OrderPlacedEvent(order));
The AI invented an execution model that does not exist. Again, this is a false positive hallucination.
False positives are usually easier to notice
Something visibly breaks: compile errors, missing methods, failing tests, wrong imports, invalid APIs. Humans catch these relatively quickly. That is why most discussions around hallucinations focus on false positives.
But the more dangerous category is often the other one.
Type 2 — false negatives
False negatives happen when the AI fails to realize something already exists. This is much more subtle — and much more destructive in large systems.
- Missing side effects
- Ignored workflows
- Overlooked dependencies
- Forgotten handlers
- Unseen execution chains
- Duplicate implementations
Example: missing existing logic
Suppose the repository already contains FraudDetectionPipeline. A developer asks: "add fraud checks before charging cards." The AI fails to retrieve the existing pipeline and instead creates BasicFraudValidator.
Now the repository contains:
- Duplicated business logic
- Inconsistent fraud behavior
- Architectural fragmentation
The AI did not invent fake code in the obvious sense — it failed to perceive existing code. That is a false negative hallucination.
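A cheap safeguard against this kind of duplication is to search existing class names for shared concept words before creating a new class. The following is a rough sketch, not a recommended production approach: the DuplicateCheck class, the stop-word list, and the name-token heuristic are all assumptions made for illustration, and a real system would probably use richer signals than class names.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical pre-creation check based on shared name tokens.
class DuplicateCheck {
    // Splits "BasicFraudValidator" into lowercase tokens: basic, fraud, validator.
    static Set<String> tokens(String className) {
        Set<String> out = new HashSet<>();
        for (String part : className.split("(?<=[a-z0-9])(?=[A-Z])")) {
            out.add(part.toLowerCase());
        }
        return out;
    }

    // Returns existing classes that share at least one non-generic token
    // with the proposed class name.
    static List<String> similar(String proposed, List<String> existing, Set<String> stopWords) {
        Set<String> wanted = tokens(proposed);
        wanted.removeAll(stopWords); // drop generic words such as "basic"
        List<String> hits = new ArrayList<>();
        for (String name : existing) {
            Set<String> shared = tokens(name);
            shared.retainAll(wanted);
            if (!shared.isEmpty()) {
                hits.add(name);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> existing = List.of("FraudDetectionPipeline", "OrderService", "UserRepository");
        Set<String> stopWords = Set.of("basic", "validator", "service");
        // Prints [FraudDetectionPipeline]
        System.out.println(similar("BasicFraudValidator", existing, stopWords));
    }
}
```

A non-empty result does not prove duplication. It only tells the agent to read the matching classes before writing new ones.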
Another example: hidden side effects
Suppose this entry point exists:
deleteUser(userId);
The AI retrieves only:
userRepository.delete(userId);
But it misses the other execution paths:
billingService.cancelSubscriptions(userId);
auditLogger.recordDeletion(userId);
s3AssetCleaner.purgeUserFiles(userId);
Now the AI concludes that user deletion only removes database records. That is not an invented fact; it is missing repository awareness, which makes it a false negative hallucination.
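If a call graph is available, the full set of effects behind an entry point can be collected with a breadth-first traversal instead of trusting the first retrieved fragment. This is a minimal sketch under assumptions: the EffectTracer class, the map-based call graph, and the intermediate UserDeletedEvent node are hypothetical, since the article does not say how deleteUser reaches the other services.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: collects everything reachable from an entry point.
class EffectTracer {
    // Breadth-first walk over caller -> callees edges.
    // Returns every node reachable from the entry, excluding the entry itself.
    static Set<String> reachable(String entry, Map<String, List<String>> callGraph) {
        Set<String> seen = new LinkedHashSet<>();
        seen.add(entry); // guards against cycles that lead back to the entry
        Deque<String> queue = new ArrayDeque<>();
        queue.add(entry);
        while (!queue.isEmpty()) {
            String current = queue.poll();
            for (String callee : callGraph.getOrDefault(current, List.of())) {
                if (seen.add(callee)) {
                    queue.add(callee);
                }
            }
        }
        seen.remove(entry);
        return seen;
    }

    public static void main(String[] args) {
        // Assumed graph: deleteUser publishes an event that three handlers consume.
        Map<String, List<String>> graph = Map.of(
            "deleteUser", List.of("userRepository.delete", "UserDeletedEvent"),
            "UserDeletedEvent", List.of(
                "billingService.cancelSubscriptions",
                "auditLogger.recordDeletion",
                "s3AssetCleaner.purgeUserFiles"));
        for (String effect : reachable("deleteUser", graph)) {
            System.out.println(effect);
        }
    }
}
```

With this graph the traversal surfaces the billing, audit, and asset cleanup calls that a single-fragment retrieval would miss. The result is only as complete as the graph, so edges hidden behind events or dependency injection still have to be extracted first.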
False negatives are harder to detect
Nothing visibly breaks immediately. The generated code may compile, pass tests, and deploy successfully — but operational understanding is wrong. Failures accumulate slowly: duplicated systems, architectural drift, inconsistent business logic, hidden regressions, broken workflows. Large repositories become increasingly fragile.
Most repository hallucinations are actually false negatives
People think AI agents mainly fail because they invent things. In practice, agents working in large codebases more often fail because they cannot fully perceive what already exists.
Repositories are enormous distributed semantic systems. Meaning is fragmented across services, queues, events, abstractions, interfaces, dependency injection, and async workflows. The AI sees fragments instead of operational wholes.
Retrieval failures create both types
Both hallucination categories often originate from the same root cause: incomplete repository reconstruction.
How false positives happen
The AI lacks repository information, so it predicts what code usually looks like instead of following your repository's conventions, real APIs, and actual execution patterns. That produces invented abstractions.
How false negatives happen
The AI retrieves only partial repository state, so it fails to see existing implementations, hidden dependencies, downstream effects, and architectural relationships. That produces missing awareness.
Example: payment retry system
Suppose the repository already contains FailedPaymentRecoveryWorkflow. A developer asks: "retry failed Stripe charges."
False positive failure: the AI invents StripeRetryManager that does not exist.
False negative failure: the AI misses the existing recovery pipeline and builds an entirely separate retry system. The second failure is often worse.
Why context windows do not fully solve this
Larger context windows do not eliminate hallucinations in the way people assume. Repositories exceed context windows structurally, not just numerically. Even massive windows still suffer from chunk fragmentation, retrieval prioritization, missing graph relationships, and execution ambiguity. More tokens do not equal repository understanding.
Why codebases are especially difficult
Code meaning is distributed. A checkout flow may span a controller, a validation layer, a fraud engine, a payment processor, an event system, an analytics pipeline, and async workers. Humans mentally reconstruct that graph over time. AI agents currently do not; they operate on transient retrieved fragments. That naturally creates both false assumptions and missing awareness.
The hidden problem: AI agents do not understand absence
Suppose the AI does not retrieve GDPRCleanupJob. How would it know that the job exists, that retrieval failed, or that the repository uses another naming convention? It often cannot distinguish between "does not exist" and "not currently visible." That ambiguity creates huge reliability problems.
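One way to make this ambiguity explicit is to have lookups return three states instead of two, so that "not found" is only reported when the index is known to cover the whole repository. The sketch below is an assumption-laden illustration: the PresenceCheck class and the file-count coverage measure are hypothetical, and real coverage tracking would be more involved.

```java
import java.util.Set;

// Hypothetical three-state lookup that separates "absent" from "not visible".
class PresenceCheck {
    enum Presence { FOUND, ABSENT, UNKNOWN }

    // FOUND:   the symbol is in the index.
    // ABSENT:  the symbol is missing and every file has been indexed.
    // UNKNOWN: the symbol is missing but part of the repository is unindexed,
    //          so it may still exist.
    static Presence lookup(String symbol, Set<String> indexedSymbols,
                           int filesIndexed, int filesTotal) {
        if (indexedSymbols.contains(symbol)) {
            return Presence.FOUND;
        }
        return filesIndexed >= filesTotal ? Presence.ABSENT : Presence.UNKNOWN;
    }

    public static void main(String[] args) {
        Set<String> symbols = Set.of("UserRepository", "AuditLogger");
        // Only 400 of 1000 files are indexed, so absence cannot be concluded.
        System.out.println(lookup("GDPRCleanupJob", symbols, 400, 1000));
        // With full coverage, the same miss becomes a real absence.
        System.out.println(lookup("GDPRCleanupJob", symbols, 1000, 1000));
    }
}
```

The first call prints UNKNOWN and the second prints ABSENT. An agent that receives UNKNOWN can widen its search or ask before creating something new. Note that this does not handle the naming-convention case, where the job exists under a different name.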
Reducing false positives requires better precision
To reduce invented abstractions, use stronger lexical grounding, repository-aware reranking, stricter API retrieval, architectural constraints, and graph-aware validation. Essentially, make the AI stay closer to repository reality.
Reducing false negatives requires better repository coverage
To reduce missing awareness, use better chunking, graph traversal, dependency expansion, execution-aware retrieval, semantic indexing, and repository cognition. Essentially, help the AI perceive more of the actual system.
The bigger insight
Most people frame hallucinations as model intelligence problems. Increasingly, the harder problem is repository perception. The AI can only reason over the software system it can reconstruct. Today, most coding agents still perceive repositories as fragmented text instead of coherent operational graphs.
That is why hallucinations happen — not because the AI is “random,” but because repository understanding is still fundamentally incomplete.