Why Context Windows Will Never Be Enough
The AI industry currently treats larger context windows like a universal solution. Every few months, another announcement appears:
100k tokens
200k tokens
1 million tokens
10 million tokens
The assumption is straightforward: if the model can see more of the repository, it will understand the repository better.
But software systems are not just large text blobs. They are execution graphs, dependency networks, architectural systems, behavioral surfaces, and operational flows. Those structures do not emerge automatically from larger token windows. That is why context scaling alone will never fully solve AI coding.
The industry is solving the wrong problem
Most people frame repository understanding as a context size problem. Increasingly, the harder problem is a repository reconstruction problem. Those are not the same thing.
Example: a one-million-token repository dump
Imagine giving an LLM an entire monorepo inside a giant context window. Will the model now understand service ownership, runtime execution, architectural boundaries, async workflows, hidden side effects, or operational dependencies? Not really. Repository understanding is not merely about visibility — it is about structure.
Humans don't understand repositories by reading everything
Senior engineers do not understand large codebases by reading every file sequentially. They build abstractions, mental maps, dependency models, architectural understanding, and execution intuition. They navigate repositories graphically, not linearly.
Example mental model:
CheckoutController
→ PaymentService
→ FraudEngine
→ StripeGateway
→ EventBus
→ EmailWorkflow
LLMs currently consume mostly flattened text, which is a completely different representation.
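To make the contrast concrete, here is a minimal sketch that assumes hypothetical file paths for the components in the mental model above. Concatenating files in path order (roughly what a naive repository dump does) produces one sequence; walking the caller-to-callee edges produces another. The paths and edges are illustrative, not taken from any real codebase.

```python
from collections import deque

# Hypothetical file layout for the components in the mental model.
FILES = {
    "src/controllers/checkout_controller.py": "CheckoutController",
    "src/events/event_bus.py": "EventBus",
    "src/fraud/fraud_engine.py": "FraudEngine",
    "src/gateways/stripe_gateway.py": "StripeGateway",
    "src/payments/payment_service.py": "PaymentService",
    "src/workflows/email_workflow.py": "EmailWorkflow",
}

# Hypothetical caller -> callee edges mirroring the arrows in the mental model.
CALLS = {
    "CheckoutController": ["PaymentService"],
    "PaymentService": ["FraudEngine"],
    "FraudEngine": ["StripeGateway"],
    "StripeGateway": ["EventBus"],
    "EventBus": ["EmailWorkflow"],
}


def dump_order(files: dict[str, str]) -> list[str]:
    """Order components the way a flat dump would: sorted by file path."""
    return [files[path] for path in sorted(files)]


def execution_order(calls: dict[str, list[str]], entry: str) -> list[str]:
    """Order components by breadth-first traversal of the call graph."""
    seen, order, queue = {entry}, [], deque([entry])
    while queue:
        node = queue.popleft()
        order.append(node)
        for callee in calls.get(node, []):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return order


print(dump_order(FILES))
print(execution_order(CALLS, "CheckoutController"))
```

In the path-ordered dump, EventBus appears second even though it runs fifth, and nothing in the flat sequence records that PaymentService calls FraudEngine.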
Bigger context windows create attention dilution
This problem gets worse as context grows. Suppose a monorepo contains many retry systems, payment abstractions, notification frameworks, and event buses. A developer asks: "Where do we retry failed Stripe charges?" A gigantic window may include retry middleware, Kafka retries, HTTP retries, cron retries, webhook retries, and payment retries. The model now faces signal dilution: more context does not necessarily improve relevance, and sometimes it actively reduces it.
The model still does not know what matters
Even if the entire repository fits into context, the model still lacks importance weighting, architectural prioritization, operational boundaries, ownership understanding, and execution salience. The repository is visible — meaning is not automatically reconstructed.
Repository meaning is non-local
Software behavior is distributed. A single line may trigger behavior hundreds of files away:
eventBus.publish(new OrderPlacedEvent(order));
That single call may activate inventory reservation, fraud analysis, analytics, email workflows, billing pipelines, and async retries, spread across services, queues, consumers, and orchestrators. No contiguous token window naturally captures that operational graph.
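The mechanism behind that fan-out is usually some form of publish/subscribe. The sketch below uses a hypothetical in-process EventBus (not any specific framework's API) to show why the publishing line alone says so little: the behavior is defined wherever handlers subscribe, which in a real repository may be other modules or other services entirely. All handler names are invented for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Any, Callable

Handler = Callable[[Any], None]


@dataclass
class OrderPlacedEvent:
    order_id: str


class EventBus:
    """Minimal in-process bus: handlers register per event type, publish fans out."""

    def __init__(self) -> None:
        self._handlers: dict[type, list[Handler]] = defaultdict(list)

    def subscribe(self, event_type: type, handler: Handler) -> None:
        self._handlers[event_type].append(handler)

    def publish(self, event: object) -> None:
        # The call site knows nothing about who is listening.
        for handler in self._handlers.get(type(event), []):
            handler(event)


effects: list[str] = []
bus = EventBus()

# Hypothetical handlers; in a real codebase each subscription would sit in a
# different module, and possibly in a different service behind a queue.
for name in ["inventory.reserve", "fraud.analyze", "analytics.track",
             "email.sendConfirmation", "billing.createInvoice"]:
    # Bind `name` as a default argument so each lambda keeps its own value.
    bus.subscribe(OrderPlacedEvent, lambda e, name=name: effects.append(f"{name}({e.order_id})"))

bus.publish(OrderPlacedEvent("order-42"))
print(effects)
```

Reading the publish call in isolation reveals none of the five effects; recovering them means finding every subscription for that event type, wherever it lives.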
Context windows flatten structure
Repositories are hierarchical systems. Context windows are flat sequences. Flattening destroys graph relationships, execution topology, dependency directionality, and architectural boundaries. The model receives ordered tokens instead of behavioral structure — an enormous mismatch.
Example: false confidence from visibility
Suppose the model sees:
userRepository.delete(userId);
But the actual deletion workflow also includes:
billingService.cancelSubscriptions()
s3AssetCleaner.purgeFiles()
auditLogger.recordDeletion()
If retrieval or attention misses those flows, the model concludes that user deletion only removes database records, even though the entire repository was technically inside the context. Visibility is not understanding.
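One way a structure-aware system could catch this is to compare what was retrieved against everything the entry point can reach in a call graph. The sketch below assumes such a graph already exists (building it is the hard part) and uses a hypothetical UserService.deleteUser entry point; the callee names come from the example above.

```python
# Hypothetical call graph for the deletion workflow; the entry point name is
# invented, the callees come from the example in the text.
DELETION_GRAPH = {
    "UserService.deleteUser": [
        "userRepository.delete",
        "billingService.cancelSubscriptions",
        "s3AssetCleaner.purgeFiles",
        "auditLogger.recordDeletion",
    ],
}


def reachable(graph: dict[str, list[str]], start: str) -> set[str]:
    """Every callee transitively reachable from `start` (depth-first)."""
    seen, stack = set(), [start]
    while stack:
        for callee in graph.get(stack.pop(), []):
            if callee not in seen:
                seen.add(callee)
                stack.append(callee)
    return seen


def missing_effects(graph: dict[str, list[str]], entry: str, retrieved: list[str]) -> list[str]:
    """Operations the workflow performs that never made it into the retrieved context."""
    return sorted(reachable(graph, entry) - set(retrieved))


retrieved = ["userRepository.delete"]  # what a narrow retriever might surface
print(missing_effects(DELETION_GRAPH, "UserService.deleteUser", retrieved))
```

A non-empty result is a signal that the context is incomplete before the model answers, rather than after it has confidently described only the database delete.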
Bigger windows also increase noise
As repositories scale, giant contexts introduce duplicated abstractions, stale implementations, dead code, test utilities, generated files, and framework boilerplate. The model must separate operationally relevant behavior from semantically similar noise — which becomes harder as context grows.
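A minimal sketch of pruning that noise before it reaches the context, assuming two things: a hypothetical list of path patterns for tests, generated code, and vendored code, and a set of files already known to be reachable from some entry point (computed elsewhere from a call graph). Files outside that set are treated as dead or stale. Every repository needs its own patterns; these are illustrative.

```python
import re

# Hypothetical noise patterns; real repositories need their own list.
NOISE_PATTERNS = [
    re.compile(r"(^|/)tests?/"),                 # test utilities
    re.compile(r"(^|/)generated/"),              # generated sources
    re.compile(r"_pb2\.py$"),                    # protobuf output for Python
    re.compile(r"(^|/)(node_modules|vendor)/"),  # vendored framework code
]


def is_noise(path: str) -> bool:
    """True if the path matches any noise pattern."""
    return any(p.search(path) for p in NOISE_PATTERNS)


def context_candidates(paths: list[str], reachable: set[str]) -> list[str]:
    """Keep files that are not noise and are reachable from some entry point.

    `reachable` is assumed to come from a call graph built elsewhere.
    """
    return [p for p in paths if not is_noise(p) and p in reachable]


paths = [
    "src/billing/retry.py",
    "src/billing/legacy_retry.py",
    "tests/test_retry.py",
    "src/generated/api_pb2.py",
    "vendor/http/retry.go",
]
reachable = {"src/billing/retry.py", "src/generated/api_pb2.py"}
print(context_candidates(paths, reachable))
```

Only src/billing/retry.py survives: the legacy file is unreachable, and the rest match a noise pattern even when they are reachable.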
This is why retrieval still matters
People imagine future systems dumping repositories into gigantic windows. In practice, retrieval becomes more important — the problem shifts from “can the model see enough?” to “can the system surface the right operational structure?” That is a fundamentally different problem.
The future is probably hierarchical context
The strongest systems likely will not use one giant flat context. Instead they assemble layered representations: semantic summaries, execution graphs, repository maps, dependency structures, behavioral flows, and retrieved implementation detail. That starts looking less like autocomplete — and more like repository cognition.
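A minimal sketch of layered assembly, assuming a fixed priority order (summary, repository map, execution graph, then retrieved source) and a crude word count as a stand-in for a real tokenizer. The layer names and contents are hypothetical; a production system would also choose what detail to retrieve based on the query.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    name: str
    text: str


def approx_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: count whitespace-separated words.
    return len(text.split())


def assemble_context(layers: list[Layer], budget: int) -> tuple[str, int]:
    """Add layers in priority order (most abstract first) while they fit the budget.

    A layer that does not fit is skipped, so a later, smaller layer may still be included.
    """
    parts, used = [], 0
    for layer in layers:
        cost = approx_tokens(layer.text)
        if used + cost > budget:
            continue
        parts.append(f"[{layer.name}]\n{layer.text}")
        used += cost
    return "\n\n".join(parts), used


layers = [
    Layer("summary", "The payments module charges cards via Stripe and retries failures nightly."),
    Layer("repository map", "src/payments: PaymentService StripeGateway RetryScheduler"),
    Layer("execution graph", "RetryScheduler -> FailedChargeQueue -> StripeRecoveryWorker"),
    Layer("implementation detail", "line " * 200),  # stand-in for retrieved source code
]
context, used = assemble_context(layers, budget=30)
print(used)
print(context)
```

With a budget of 30, the sketch keeps the three structural layers (20 approximate tokens) and drops the raw source; a larger budget would let the detail in last, on top of the structure rather than instead of it.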
Context windows solve storage, not understanding
Large windows help memory, visibility, persistence, and retrieval bandwidth. They do not automatically solve architecture understanding, execution reasoning, dependency awareness, or operational modeling. Those require structure-aware systems.
The bigger shift
The industry is slowly moving from text generation toward repository understanding. And repository understanding is fundamentally a graph problem, not merely a token problem. The future probably belongs to systems that reconstruct execution intent, architectural structure, operational relationships, and semantic ownership, because once an AI system actually understands the repository, hallucinations should drop sharply.
Codebases are graphs, not documents
Most AI systems still treat repositories like giant text corpora — conceptually, a collection of files. That mental model breaks almost immediately at scale. Large software systems are not documents; they are graphs.
Documents are mostly linear
Natural language is usually sequential; meaning is primarily local. That is why standard RAG works reasonably well for blogs, PDFs, and support articles.
Codebases behave completely differently
Software meaning is distributed across relationships. CheckoutController may depend on payment, fraud, inventory, events, analytics, and notifications — none of which are necessarily local in the filesystem. Operational meaning lives in the graph connecting them.
This is why naive retrieval fails
Traditional RAG retrieves nearby text, semantically similar chunks, or lexical matches. Software meaning is often many hops away.
Example query: "Where do we cancel subscriptions when users delete accounts?" Relevant behavior may span:
UserDeletionWorkflow
→ AccountCleanupService
→ BillingCancellationHandler
→ StripeSubscriptionManager
No single chunk contains the complete behavior, because the meaning exists in the graph.
The file system lies
The directory tree (src/, services/, controllers/, utils/) is not the real architecture; execution flow is. Directories may slice code one way while runtime operations cut across them. Humans learn that over time; AI systems usually do not.
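To see how far a single flow can drift from the directory layout, here is a minimal sketch that maps one call chain onto hypothetical directories and reports which calls cross a boundary. The component locations are invented for illustration.

```python
# Hypothetical directory for each component in one checkout flow.
LOCATION = {
    "CheckoutController": "src/controllers",
    "PaymentService": "src/services",
    "FraudEngine": "src/services",
    "StripeGateway": "src/utils",
    "EventBus": "src/utils",
}

# Caller -> callee edges for that flow, in execution order.
FLOW = [
    ("CheckoutController", "PaymentService"),
    ("PaymentService", "FraudEngine"),
    ("FraudEngine", "StripeGateway"),
    ("StripeGateway", "EventBus"),
]


def crossing_edges(edges: list[tuple[str, str]], location: dict[str, str]) -> list[tuple[str, str]]:
    """Return the calls whose caller and callee live in different directories."""
    return [(a, b) for a, b in edges if location[a] != location[b]]


def directories_touched(edges: list[tuple[str, str]], location: dict[str, str]) -> list[str]:
    """Every directory a single execution flow passes through."""
    return sorted({location[node] for edge in edges for node in edge})


print(crossing_edges(FLOW, LOCATION))
print(directories_touched(FLOW, LOCATION))
```

Two of the four calls cross a directory boundary, and the flow touches three directories, so any chunker or retriever that treats a directory as a unit splits this one behavior three ways.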
Humans already think graphically
Senior engineers reason through dependency chains, ownership boundaries, execution flows, event propagation, and service topology. When debugging payments, they mentally traverse request → validation → fraud → charge → event → async workflows. That is graph traversal, not document retrieval.
Embeddings alone cannot capture this
Dense embeddings excel at semantic similarity. Graphs encode causality, dependencies, execution order, ownership, and runtime relationships, not merely semantic neighborhoods. A query like "retry failed Stripe charges" may retrieve HTTP retry middleware, Kafka retries, webhook retries, and cron retries instead of the actual billing recovery workflow, because semantic similarity is not operational understanding.
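A toy illustration of that failure mode, with one loud caveat: it uses bag-of-words cosine similarity as a stand-in for an embedding model. Real dense embeddings handle synonyms such as "declined" versus "failed" far better, so this exaggerates the effect. The point it does capture is structural: similarity scores the wording of a chunk, not whether that code sits on the failed-charge path. The chunk descriptions are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercased word counts; a crude stand-in for an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Hypothetical chunk descriptions; only "billing_recovery" is the real answer.
CHUNKS = {
    "webhook_retry": "retry failed stripe webhook deliveries",
    "cron_retry": "cron job retry for failed exports",
    "http_retry_middleware": "retry failed http requests with exponential backoff",
    "kafka_retry": "kafka consumer retry topic for failed messages",
    "billing_recovery": "dunning worker recovers declined card payments for stripe invoices",
}


def rank(query: str, chunks: dict[str, str]) -> list[str]:
    """Chunk names ordered from most to least similar to the query."""
    q = bag_of_words(query)
    return sorted(chunks, key=lambda name: cosine(q, bag_of_words(chunks[name])), reverse=True)


print(rank("retry failed Stripe charges", CHUNKS))
```

Here the webhook retry chunk ranks first and the actual billing recovery workflow ranks last, because it describes the same behavior in domain vocabulary the query never uses.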
Chunking quietly breaks graph structure
Suppose an execution flow exists across controller → payment service → Stripe client → event publisher. Naive chunking isolates each into separate retrieval units. The AI sees disconnected syntax instead of connected behavior — the graph disappears.
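One mitigation is to chunk by symbol and keep the edges as metadata, so a retrieved chunk can pull in what it calls. The sketch below uses Python's standard ast module on a hypothetical single-module codebase whose function names mirror the controller, payment service, Stripe client, and event publisher flow. It records only direct calls to plain names; method calls, imports, and dynamic dispatch would need real symbol resolution.

```python
import ast

# Hypothetical module mirroring controller -> payment service -> Stripe client -> event publisher.
SOURCE = '''
def checkout(cart):
    return charge(cart.total)


def charge(amount):
    result = stripe_create_charge(amount)
    publish_event("charge.succeeded", result)
    return result


def stripe_create_charge(amount):
    return {"amount": amount, "status": "succeeded"}


def publish_event(name, payload):
    pass
'''


def chunk_with_edges(source: str) -> dict[str, dict]:
    """One chunk per top-level function, plus the plain-name calls it makes."""
    tree = ast.parse(source)
    chunks = {}
    for node in tree.body:
        if isinstance(node, ast.FunctionDef):
            calls = sorted({
                n.func.id
                for n in ast.walk(node)
                if isinstance(n, ast.Call) and isinstance(n.func, ast.Name)
            })
            chunks[node.name] = {"text": ast.get_source_segment(source, node), "calls": calls}
    return chunks


def with_callees(chunks: dict[str, dict], name: str) -> list[str]:
    """Return a chunk together with the chunks it calls, so the edge survives retrieval."""
    return [name] + [c for c in chunks[name]["calls"] if c in chunks]


chunks = chunk_with_edges(SOURCE)
print(chunks["charge"]["calls"])
print(with_callees(chunks, "charge"))
```

Retrieving charge now brings stripe_create_charge and publish_event along with it, instead of leaving them as unrelated fragments.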
Most AI coding agents operate on flattened repositories
Current systems often look like:
repository
↓
split into chunks
↓
embed chunks
↓
retrieve chunks
↓
send text to LLM
Notice what disappears: topology, execution structure, relationships, and dependencies. The repository graph becomes isolated text fragments, which is one of the core reasons agents get lost in large systems.
Example: architectural hallucination
Suppose the convention is: all business logic belongs in domain services. The AI retrieves only controller chunks and generates policy checks inside the controller. The code compiles — but it is architecturally wrong because the model failed to perceive repository structure, not because it ran out of tokens.
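A convention like this is a structural fact, so it can be made explicit instead of being left for the model to infer from whichever chunks happen to be retrieved. The sketch below is a simple layer-dependency check in the spirit of architecture-linting tools. The src/&lt;layer&gt;/ layout, the allowed layer pairs, and the file names are all hypothetical. A generated controller that enforces a policy by querying a repository directly shows up as a forbidden controllers-to-infrastructure edge.

```python
# Hypothetical allowed dependencies between layers (caller layer, callee layer).
ALLOWED = {
    ("controllers", "domain"),
    ("domain", "infrastructure"),
}


def layer_of(path: str) -> str:
    # Assumes a layout of src/<layer>/<file>; real repositories will differ.
    return path.split("/")[1]


def violations(edges: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Return (caller, callee) edges whose cross-layer dependency is not allowed."""
    bad = []
    for caller, callee in edges:
        a, b = layer_of(caller), layer_of(callee)
        if a != b and (a, b) not in ALLOWED:
            bad.append((caller, callee))
    return bad


edges = [
    ("src/controllers/refund_controller.py", "src/domain/refund_service.py"),
    ("src/controllers/refund_controller.py", "src/infrastructure/orders_repository.py"),
    ("src/domain/refund_service.py", "src/infrastructure/orders_repository.py"),
]
print(violations(edges))
```

Only the controller's direct call into the repository is flagged; a system that feeds such rules into generation, or checks its output against them, no longer depends on the model noticing the convention on its own.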
The future is probably graph-aware retrieval
Instead of retrieving isolated chunks by semantic similarity alone, future systems likely retrieve connected operational subgraphs:
Payment Retry Flow
→ retry scheduler
→ failed charge queue
→ Stripe recovery worker
→ notification workflow
Now the AI reasons over behavior instead of disconnected syntax.
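A minimal sketch of that idea: start from a seed node (assumed here to have been picked by an ordinary similarity search) and expand along call edges up to a hop limit, returning the connected subgraph rather than isolated chunks. The node names follow the Payment Retry Flow above; HttpRetryMiddleware and HttpClient are invented lookalike distractors.

```python
from collections import deque

# Hypothetical caller -> callee edges; HttpRetryMiddleware is a lookalike distractor.
GRAPH = {
    "RetryScheduler": ["FailedChargeQueue"],
    "FailedChargeQueue": ["StripeRecoveryWorker"],
    "StripeRecoveryWorker": ["NotificationWorkflow"],
    "HttpRetryMiddleware": ["HttpClient"],
}


def expand_subgraph(graph: dict[str, list[str]], seeds: list[str], max_hops: int) -> dict[str, list[str]]:
    """Grow outward from seed nodes along call edges, up to max_hops away.

    Returns the induced subgraph: each kept node mapped to its kept callees.
    """
    depth = {seed: 0 for seed in seeds}
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        if depth[node] == max_hops:
            continue  # do not expand past the hop limit
        for callee in graph.get(node, []):
            if callee not in depth:
                depth[callee] = depth[node] + 1
                queue.append(callee)
    return {node: [c for c in graph.get(node, []) if c in depth] for node in depth}


# Assume a similarity search already picked RetryScheduler as the seed.
subgraph = expand_subgraph(GRAPH, ["RetryScheduler"], max_hops=3)
for node, callees in subgraph.items():
    print(node, "->", callees)
```

The result keeps the edges between the retrieved pieces, so the model receives the flow from scheduler to notification, and the lookalike retry middleware never enters the context.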
Code search is quietly becoming graph search
The problem is evolving from “find similar text” toward “reconstruct operational meaning” — a graph problem, not a document retrieval problem.
The important insight
Most AI coding failures happen because repositories are graph-shaped while AI systems still perceive them as documents. That mismatch creates hallucinations, missing dependencies, duplicated logic, architectural mistakes, and incomplete reasoning.
The systems that win will likely be the ones that best reconstruct execution topology, dependency structure, behavioral relationships, and operational flows — because codebases are not documents. They are distributed behavioral graphs disguised as text.