Adding Files to Claude's Context Is Not the Same as an Indexed Codebase

The default Claude Code workflow is to add files to the conversation: read a file, paste some output, ask a question. Claude reasons over whatever is in the active context window. This works for focused, file-level tasks and produces good results when the developer knows exactly which files are relevant. It is not the same as a semantically indexed codebase — and the difference matters as soon as the relevant context exceeds what fits in a single session.

File-in-context vs. indexed codebase: the architecture

These are two distinct approaches to giving Claude information about your codebase. They differ in scale, persistence, and team accessibility:

File-in-context vs. indexed codebase

File-in-context approach (Claude Code default):
  1. Developer opens files relevant to their task
  2. Files are read into the conversation context
  3. Claude reasons over the files in context
  4. Conversation ends → context is gone
  5. Next session: start over, re-open files, re-explain

  Scale limit: window fills after a few dozen files
  Persistence: none (context is per-session)
  Team sharing: none (private conversation)

Indexed codebase approach:
  1. Repository indexed once (or continuously updated)
  2. Semantic search surfaces relevant chunks on demand
  3. Claude reasons with retrieved context
  4. Session ends → index persists
  5. Next session: index is ready, same retrieval quality

  Scale: handles entire large codebase
  Persistence: survives session boundaries
  Team sharing: same index for every developer
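The "indexed once" step at the top of this list can be sketched in a few lines. This is a minimal illustration. It assumes repositories are already loaded as a mapping of file paths to source text and that fixed-size, line-based chunks are acceptable. `Chunk`, `chunk_file`, and `build_index` are hypothetical names, not Kognita's API. Production indexers typically split along syntactic boundaries and attach an embedding vector to each chunk, and this sketch omits both.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    repo: str
    path: str
    start_line: int  # 1-based, inclusive
    end_line: int    # 1-based, inclusive
    text: str


def chunk_file(repo: str, path: str, source: str, max_lines: int = 40) -> list[Chunk]:
    """Split one file into consecutive chunks of at most max_lines lines."""
    lines = source.splitlines()
    chunks = []
    for start in range(0, len(lines), max_lines):
        block = lines[start:start + max_lines]
        chunks.append(Chunk(repo, path, start + 1, start + len(block), "\n".join(block)))
    return chunks


def build_index(repos: dict[str, dict[str, str]], max_lines: int = 40) -> list[Chunk]:
    """Chunk every file in every repository (repo name -> {path: source})."""
    index = []
    for repo, files in repos.items():
        for path, source in sorted(files.items()):
            index.extend(chunk_file(repo, path, source, max_lines))
    return index


# Hypothetical 100-line file in a "billing" repository.
repos = {"billing": {"pay.py": "\n".join(f"line {i}" for i in range(100))}}
index = build_index(repos)
```

Each chunk keeps its repository, path, and line range. That metadata is what lets a later search result point back to an exact location in the codebase.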

The file-in-context approach works well for tasks where the developer already knows what to include: fixing a specific function, reviewing a specific file, refactoring a bounded piece of logic. It is the right tool for local, well-scoped work. The indexed approach is necessary when the question requires searching across the codebase, surfacing context the developer doesn't already know is relevant, or sharing context between sessions and team members.

Why the context window hits its limit at codebase scale

Even with a 100,000-token context window, only a limited slice of a codebase fits, and a 200,000-token window merely doubles that slice. Many non-trivial questions about a real production service need context from more files than a single window can hold:

Context window limits at codebase scale:
  Average file:      200–400 lines (~3–6k tokens)
  100k token window: fits ~16–33 files at most
  Medium service:    300–500 files
  Large service:     1,000+ files

  "Just add more files to context":
    → window fills
    → important context gets compressed or dropped
    → facts buried mid-context are used less reliably (lost in the middle)
    → responses degrade as context approaches limit
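The arithmetic behind the 100k-token line is simple division. Here is a minimal sketch that assumes a fixed average token count per file. The optional `reserved_tokens` parameter is an illustrative stand-in for the system prompt and conversation, not a Claude setting.

```python
def files_that_fit(window_tokens: int, tokens_per_file: int, reserved_tokens: int = 0) -> int:
    """Whole files that fit after reserving tokens for prompt and conversation."""
    if tokens_per_file <= 0:
        raise ValueError("tokens_per_file must be positive")
    usable = max(window_tokens - reserved_tokens, 0)
    return usable // tokens_per_file


# Compare two window sizes against a 500-file service at both ends of the per-file estimate.
for window in (100_000, 200_000):
    for per_file in (3_000, 6_000):
        n = files_that_fit(window, per_file)
        print(f"{window:,} tokens, {per_file:,} tokens/file: {n} files ({n / 500:.0%} of 500)")
```

Take the optimistic case of a 200,000-token window and 3,000 tokens per file. The window holds 66 files, about 13% of a 500-file service, and that is before any conversation is added.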

There is also a quality problem at large context sizes. Research on the "lost in the middle" effect found that language models use information placed at the start or end of a long context more reliably than information buried in the middle. Adding more files to context therefore does not improve answer quality linearly. Past some point it actively hurts, and this is the same mechanism behind the MCP context window tax.
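Retrieval-based setups usually avoid this by sending only the most relevant material instead of everything. One simple policy greedily packs the highest-scoring chunks into a fixed token budget. The sketch below assumes a retriever has already scored each chunk and that token counts are known. `Scored` and `pack_context` are hypothetical names. Greedy packing is only one possible strategy and does not describe how Claude or Kognita actually assemble context.

```python
from typing import NamedTuple


class Scored(NamedTuple):
    chunk_id: str
    score: float  # relevance from a retriever (assumed precomputed); higher is better
    tokens: int   # token count of the chunk (assumed precomputed)


def pack_context(candidates: list[Scored], budget_tokens: int) -> list[Scored]:
    """Greedily keep the highest-scoring chunks that still fit in the token budget."""
    selected: list[Scored] = []
    used = 0
    for cand in sorted(candidates, key=lambda c: c.score, reverse=True):
        if used + cand.tokens <= budget_tokens:
            selected.append(cand)
            used += cand.tokens
    return selected


candidates = [
    Scored("a", 0.9, 5_000),
    Scored("b", 0.8, 8_000),
    Scored("c", 0.5, 3_000),
    Scored("d", 0.1, 1_000),
]
picked = pack_context(candidates, budget_tokens=9_000)
```

Chunk `b` is skipped because adding it would push the total to 13,000 tokens. Smaller, lower-ranked chunks can still fill the remaining budget, so the context stays bounded no matter how large the codebase is.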

The session boundary problem

Context window approaches are session-scoped. When the conversation ends, the context is gone. The next session starts fresh — you re-open the same files, re-explain the same background, re-establish the same context. For a developer who works in the same codebase area daily, this is a constant tax: rebuilding context at the start of every session.

An indexed codebase does not have this problem. The index persists between sessions, and semantic retrieval surfaces relevant context on demand rather than requiring the developer to know in advance which files to include. This is the difference between "Claude knows about your codebase" and "Claude has a codebase it can search."
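Persistence is the simplest part to illustrate. This minimal sketch assumes the index is a list of JSON-serializable records stored in a local file. A managed, team-shared service would use a shared database or vector store instead. `save_index` and `load_index` are hypothetical helpers.

```python
import json
import tempfile
from pathlib import Path


def save_index(records: list[dict], path: Path) -> None:
    """Write index records to disk so they outlive the current session."""
    path.write_text(json.dumps(records), encoding="utf-8")


def load_index(path: Path) -> list[dict]:
    """Reload a previously saved index, or return an empty one if none exists."""
    if not path.exists():
        return []
    return json.loads(path.read_text(encoding="utf-8"))


with tempfile.TemporaryDirectory() as tmp:
    index_path = Path(tmp) / "index.json"
    # Session 1: build the index (here, a single hand-written record) and save it.
    save_index([{"repo": "billing", "path": "pay.py", "lines": [1, 40]}], index_path)
    # Session 2: a new process only loads; no files are re-opened or re-explained.
    reloaded = load_index(index_path)
```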

How MCP + indexed codebase changes what Claude can answer

The combination of an MCP server providing semantic search and a shared codebase index fundamentally changes the category of questions Claude can answer. Instead of reasoning over what the developer put in context, Claude can retrieve from the full indexed codebase:

MCP + indexed codebase vs. file-in-context

Kognita MCP + indexed codebase:
  Developer asks:   "What services call the payment API?"
  MCP retrieves:    semantic search over indexed repositories
  Claude reasons:   over retrieved cross-repo results
  Answer includes:  callers across indexed repos, even separate ones

  vs. file-in-context:
  Developer opens:  payment API file
  Claude sees:      only what is in context
  Answer:           "I can see the payment API definition"
  Missing:          all callers, all downstream effects
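The retrieval step in the first half of this comparison can be approximated with a toy ranker. The sketch below is a stand-in built on stated assumptions. It scores chunks by bag-of-words cosine similarity, which is lexical rather than semantic. A real index would compare embedding vectors, so a query about "billing" could also match code that only says "payment". The repositories, paths, and `search_code` function are hypothetical. A function like this is the kind of thing an MCP server could expose as a tool, but it is not Kognita's interface.

```python
import math
import re
from collections import Counter


def vectorize(text: str) -> Counter:
    """Bag-of-words counts; splits camelCase and snake_case identifiers into words."""
    text = re.sub(r"([a-z])([A-Z])", r"\1 \2", text)
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def search_code(index: list[tuple[str, str, str]], query: str, top_k: int = 3) -> list[tuple[str, str]]:
    """Rank (repo, path, text) chunks from every repository against the query."""
    q = vectorize(query)
    scored = sorted(((cosine(q, vectorize(text)), repo, path) for repo, path, text in index), reverse=True)
    return [(repo, path) for score, repo, path in scored[:top_k] if score > 0]


# Hypothetical chunks: one definition, two callers in other repos, one unrelated file.
INDEX = [
    ("payments", "api/payment_api.py", "def create_payment(amount): ..."),
    ("checkout", "cart/submit.py", "resp = payment_client.create_payment(total)"),
    ("subscriptions", "renew.py", "payment_client.create_payment(plan.price)"),
    ("search", "rank.py", "def rank_results(query): ..."),
]

hits = search_code(INDEX, "callers of create_payment")
```

The index spans every repository, so the callers in `checkout` and `subscriptions` surface alongside the definition in `payments`. A session holding only the payment API file would never see them.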

Kognita provides exactly this combination — a managed semantic index of your repositories exposed through an MCP server, so that Claude sessions can retrieve from the full codebase on demand rather than being limited to whatever files the developer manually added to context. The questions that were previously unanswerable because they required context scattered across dozens of files become answerable in a single query.

When file-in-context is still the right approach

File-in-context is not obsolete. For focused, local tasks — implementing a specific feature in a specific file, reviewing a PR diff, fixing a known bug in a known location — manually adding files to context gives Claude precise, focused information and avoids retrieval noise. The developer's explicit file selection is often better than automated retrieval for well-scoped local work.

The indexed approach is most valuable for exploratory questions, cross-codebase analysis, impact analysis, and any task where the developer does not already know which files are relevant. These are the high-value, high-cost questions — the ones currently routed to senior engineers because they require system-level knowledge that does not fit in a single file.

Final take

Adding files to Claude's context window is the simplest way to give Claude codebase knowledge, and for local, well-scoped work it is the right approach. It is not a substitute for an indexed codebase. It has no persistence across sessions and no team sharing, and it hits a hard scale limit once a task needs more than a few dozen files.

The context window gives Claude what you put in it. An indexed codebase gives Claude what is relevant — surfaced on demand, persisted between sessions, shared across the team.