MCP Tool Definitions Are Eating 26% of Your Agent's Context Window Before It Starts

Before your AI agent types a single line of code, 26% of its working memory is already gone. Not consumed by your codebase. Not used for the task requirements. Consumed by tool definitions — the JSON schemas that tell the model what MCP tools are available and how to call them. On a 200K-token context window, that is 52,000 tokens of overhead before the first user message arrives.

This is the context window tax that most teams building MCP infrastructure don't realize they're paying, and it compounds. Load the tool definitions, retrieve codebase context, include conversation history, add the system prompt. By the time the agent actually starts on the task, it may have only about a quarter of its context window left for reasoning. That's not a prompt engineering problem. It's a harness architecture problem.

How MCP tool definitions consume context

The Model Context Protocol is the right architecture for giving AI agents access to external tools and data sources. The problem is not MCP itself — it is how most teams implement it. The default approach is to define all available tools and load all their schemas at session start, so the model has a complete picture of what it can call. This is intuitive but expensive.

Each MCP tool has a JSON schema that describes its name, parameters, types, descriptions, and return format. A simple tool like "read a file" might consume 300 to 400 tokens. A complex tool like "query the service dependency graph with filters" might consume 800 to 1,200 tokens. Multiply that across 60 to 100 tools, a realistic count for a team that has integrated codebase access, Jira, GitHub, deployment APIs, observability, and CI tools, and the context consumed before any work begins can approach or exceed 50,000 tokens: 80 tools at an average of 650 tokens each already comes to 52,000.
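These numbers are easy to check against your own server. The sketch below ranks tool definitions by estimated prompt cost, assuming roughly 4 characters of serialized JSON per token (a heuristic, not a real tokenizer). The two schemas are hypothetical and much terser than typical production definitions, whose longer descriptions push them toward the ranges quoted above; in practice you would feed it the `tools` array your server returns from MCP's `tools/list` request.

```python
import json

# Assumption: roughly 4 characters of serialized JSON per token. Real counts
# depend on the model's tokenizer and on how the client renders tool
# definitions into the prompt, so treat these numbers as estimates.
CHARS_PER_TOKEN = 4


def estimate_tokens(tool: dict) -> int:
    """Estimate the prompt tokens a single tool definition consumes."""
    text = json.dumps(tool, separators=(",", ":"))
    return max(1, len(text) // CHARS_PER_TOKEN)


def audit_tools(tools: list[dict]) -> list[tuple[str, int]]:
    """Return (tool name, estimated tokens) pairs, most expensive first."""
    costs = [(tool["name"], estimate_tokens(tool)) for tool in tools]
    return sorted(costs, key=lambda pair: pair[1], reverse=True)


# Hypothetical tool definitions in the MCP shape (name, description, inputSchema).
read_file = {
    "name": "read_file",
    "description": "Read a UTF-8 text file from the repository.",
    "inputSchema": {
        "type": "object",
        "properties": {"path": {"type": "string", "description": "Repository-relative path"}},
        "required": ["path"],
    },
}
query_service_graph = {
    "name": "query_service_graph",
    "description": "Query the service dependency graph, optionally filtered by team, tier, or protocol.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "service": {"type": "string", "description": "Root service name"},
            "direction": {"type": "string", "enum": ["upstream", "downstream", "both"]},
            "depth": {"type": "integer", "minimum": 1, "maximum": 5},
            "filters": {
                "type": "object",
                "properties": {
                    "team": {"type": "string"},
                    "tier": {"type": "string", "enum": ["critical", "standard", "batch"]},
                    "protocol": {"type": "string", "enum": ["http", "grpc", "queue"]},
                },
            },
        },
        "required": ["service"],
    },
}

report = audit_tools([read_file, query_service_graph])
total = sum(tokens for _, tokens in report)
for name, tokens in report:
    print(f"{name:<22}{tokens:>6} tokens")
print(f"total: {total} tokens ({total / 200_000:.2%} of a 200K window)")
```

Sorting by cost matters more than the absolute estimate: the few most expensive definitions are usually the first candidates for trimming or for moving behind on-demand loading.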

The Harness platform team published data on this when they redesigned their MCP server: the original design consumed approximately 26% of a 200K-token context window on tool definitions alone. Their redesign, using a registry-based approach, cut that to 1.6%. The difference, roughly 24 percentage points or about 48,000 tokens, is context that was previously lost to overhead and is now available for task reasoning. That is not a marginal improvement. It is the difference between agents that run out of context mid-task and agents that complete the task.

The full context budget breakdown

Large context windows do not solve the context scarcity problem — they change the scale at which the problem appears. A 200K token window sounds abundant until you see how the budget actually gets spent.

Where tokens actually go in a naive MCP setup (200K window)

WHERE THE TOKENS GO
────────────────────────────────────────────────────────────────────────────
Tool definitions (all tools loaded upfront)    ~52,000 tokens    26%
System prompt and agent instructions            ~8,000 tokens     4%
Codebase context (retrieved files, snippets)   ~60,000 tokens    30%
Jira/task context                               ~5,000 tokens     3%
Conversation history (multi-turn)              ~25,000 tokens    12%
Available for actual task reasoning            ~50,000 tokens    25%

The agent is doing real work in only 25% of the context window.
The rest is overhead — and tool definitions alone consume more than
the space left for reasoning.

The number that should stop most platform engineers cold: in this setup, the agent is doing real work in roughly 25% of the context window. The rest is infrastructure overhead. For a complex task that needs to reason across multiple services, understand historical decisions, and plan a multi-step implementation, 50,000 tokens of reasoning space runs out fast.
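The budget arithmetic behind that figure fits in a few lines. This sketch uses the numbers from the naive-setup breakdown above; the dictionary keys are only labels, and the helper name `reasoning_budget` is illustrative.

```python
# Budget figures from the naive-setup breakdown above (200K-token window).
WINDOW = 200_000
NAIVE_OVERHEAD = {
    "tool definitions": 52_000,
    "system prompt": 8_000,
    "codebase context": 60_000,
    "task context": 5_000,
    "conversation history": 25_000,
}


def reasoning_budget(window: int, overhead: dict[str, int]) -> int:
    """Tokens left for task reasoning once every overhead item is loaded."""
    remaining = window - sum(overhead.values())
    if remaining < 0:
        raise ValueError(f"overhead exceeds the window by {-remaining:,} tokens")
    return remaining


left = reasoning_budget(WINDOW, NAIVE_OVERHEAD)
print(f"reasoning space: {left:,} tokens ({left / WINDOW:.0%})")
# reasoning space: 50,000 tokens (25%)
print(NAIVE_OVERHEAD["tool definitions"] > left)  # True
```

The last line is the uncomfortable comparison from the breakdown: tool definitions alone take more of the window than the task reasoning does.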

Tasks that hit context limits mid-execution fail in one of two ways: the agent truncates its context and starts forgetting earlier steps, producing output that contradicts decisions it made three rounds ago; or the session terminates with an incomplete result and the developer has to start over. Both failures are expensive. Neither is visible in the tool definition count at setup time. They appear later, when the tasks that actually matter start running into the wall.

The registry pattern: tools on demand

The fix for tool-definition overhead is to stop loading every tool definition upfront. Instead, the server exposes a single registry meta-tool that the agent queries for the schemas of the specific tools the current task needs.
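A minimal sketch of the registry idea, assuming a hypothetical in-process `ToolRegistry` class that groups full tool definitions by capability category. It is not the MCP SDK's API and not any vendor's implementation; it only shows the core behavior: full schemas stay on the server, and the single meta-tool handler returns just the categories the agent asks for.

```python
from dataclasses import dataclass, field


@dataclass
class ToolRegistry:
    """Hypothetical server-side registry: full schemas stay here until requested."""

    by_category: dict[str, list[dict]] = field(default_factory=dict)

    def register(self, category: str, tool: dict) -> None:
        self.by_category.setdefault(category, []).append(tool)

    def categories(self) -> list[str]:
        """The short taxonomy the agent sees instead of every schema."""
        return sorted(self.by_category)

    def request_tools(self, categories: list[str]) -> list[dict]:
        """Handler for the single meta-tool: schemas for the requested categories only."""
        unknown = [c for c in categories if c not in self.by_category]
        if unknown:
            raise KeyError(f"unknown categories: {unknown}")
        return [tool for c in categories for tool in self.by_category[c]]


def _stub(name: str) -> dict:
    # Placeholder definitions; real ones carry full descriptions and input schemas.
    return {"name": name, "description": f"{name} (stub)", "inputSchema": {"type": "object"}}


registry = ToolRegistry()
for name in ["read_file", "write_file", "list_dir"]:
    registry.register("files", _stub(name))
for name in ["get_issue", "search_issues"]:
    registry.register("jira", _stub(name))
for name in ["create_pr", "list_prs", "get_pr_diff"]:
    registry.register("github", _stub(name))

loaded = registry.request_tools(["files", "jira"])
print(registry.categories())        # ['files', 'github', 'jira']
print([t["name"] for t in loaded])  # ['read_file', 'write_file', 'list_dir', 'get_issue', 'search_issues']
```

Rejecting unknown categories keeps a mistaken request from silently loading nothing. How the returned tools become callable depends on the client; one plausible wiring in MCP is for the server to update its tool list and send a `notifications/tools/list_changed` notification, though client support for that varies.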

Naive tool loading vs. registry dispatch — context cost and trade-offs

─── NAIVE: ALL TOOLS LOADED UPFRONT ────────────────────────────────────────
MCP server exposes 80 tools (file operations, git, Jira, GitHub, code
  search, service graph, deployment APIs, observability, etc.)
All 80 tool schemas loaded into context at session start
Cost: ~52,000 tokens consumed before the first user message
Problem: agent has access to everything it might need but too little
  context budget left to make real use of it

─── REGISTRY-BASED DISPATCH ────────────────────────────────────────────────
MCP server exposes a single "tool registry" meta-tool
Agent requests tools by capability: "I need file operations and Jira"
Registry returns only the schemas for the 6–8 requested tools
Cost: ~3,200 tokens (1.6% of a 200K window vs. 26%)
Remaining context for task reasoning: ~99,000 tokens (vs. ~50,000), with
  every other budget item unchanged

Trade-off: one extra round trip at session start vs. roughly twice the
  context available for actual work.
Net effect: agents complete more tasks without hitting context limits.

The trade-off is one extra round trip at session start. The agent requests its tools, the registry returns the relevant schemas, and then work begins. For most tasks, the agent requests 6 to 12 tools rather than receiving 80 upfront. The context consumed by tool definitions drops from 26% to under 2%, freeing roughly 48,000 tokens for actual task reasoning.

This design requires the agent to know, at session start, roughly what tools it will need. In practice this is not hard: the session context, task description, and a brief tool taxonomy are enough for the agent to make an accurate request. The model is not selecting from 80 options without guidance — it is selecting from a categorized list of capabilities and requesting the relevant schemas. The extra round trip costs a few seconds. The context recovered is available for the entire session.
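The "brief tool taxonomy" can be sketched as the one tool definition loaded at session start: its description lists the capability categories, and its input is a list of category names. The tool name `request_tools`, the categories, and their summaries are all hypothetical; the dict follows the MCP tool shape (`name`, `description`, `inputSchema`) with a JSON Schema input.

```python
import json

# Hypothetical taxonomy: category -> one-line capability summary.
TAXONOMY = {
    "files": "read, write, and list repository files",
    "git": "history, blame, and diffs",
    "jira": "read and search issues",
    "github": "pull requests and reviews",
    "code_search": "symbol and text search across the codebase",
    "service_graph": "query service dependencies",
    "deploy": "deployment status and rollbacks",
    "observability": "logs, metrics, and traces",
    "ci": "pipeline runs and test results",
}


def registry_meta_tool(taxonomy: dict[str, str]) -> dict:
    """Build the single tool definition the agent sees before any work begins."""
    summary = "; ".join(f"{name}: {desc}" for name, desc in taxonomy.items())
    return {
        "name": "request_tools",
        "description": "Load tool schemas for the capability categories this task needs. "
        "Categories: " + summary,
        "inputSchema": {
            "type": "object",
            "properties": {
                "categories": {
                    "type": "array",
                    # The enum restricts requests to categories that actually exist.
                    "items": {"type": "string", "enum": sorted(taxonomy)},
                    "description": "Capability categories to load",
                }
            },
            "required": ["categories"],
        },
    }


meta = registry_meta_tool(TAXONOMY)
# Same rough 4-characters-per-token heuristic as before; an estimate only.
print(len(json.dumps(meta)) // 4, "estimated tokens")
```

The design choice is that the model picks from a short, categorized menu rather than from 80 full schemas; under the heuristic above, the whole menu costs a few hundred tokens at most.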

The symbol index pattern: navigation by pointer

Tool definitions are not the only source of context window tax. The way agents navigate codebases in the default setup creates its own overhead: reading whole files when they need specific functions, loading entire modules when they need one method, consuming thousands of tokens on scaffolding to find the 200 tokens of actual code they needed.

The symbol index pattern addresses this. Instead of the agent navigating the codebase by reading files, the harness maintains a symbol-level index: function names, type definitions, class structures, and their file locations. The agent queries the index by capability or concept, receives a precise pointer to the relevant functions, and reads only what it actually needs.
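A minimal sketch of the pointer half of this pattern. To stay self-contained it indexes Python source with the standard-library `ast` module; the TypeScript codebase in the example below would need a TypeScript-aware parser instead (tree-sitter or ctags are common choices, but that is an assumption, not something the pattern prescribes). Concept-level queries such as "how does token validation work?" would need search or embeddings layered on top; here the lookup is by name only. `Symbol`, `build_symbol_index`, `find`, and `read_symbol` are hypothetical names.

```python
import ast
from dataclasses import dataclass


@dataclass(frozen=True)
class Symbol:
    path: str
    qualname: str  # e.g. "TokenValidator.validate"
    start: int     # first line, 1-based
    end: int       # last line, 1-based, inclusive


def _collect(nodes, prefix: str, path: str, out: list) -> None:
    for node in nodes:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            qualname = prefix + node.name
            out.append(Symbol(path, qualname, node.lineno, node.end_lineno))
            _collect(node.body, qualname + ".", path, out)  # methods and nested defs


def build_symbol_index(files: dict[str, str]) -> list[Symbol]:
    """Index every class and function in a {path: source} map."""
    index: list[Symbol] = []
    for path, source in files.items():
        _collect(ast.parse(source, filename=path).body, "", path, index)
    return index


def find(index: list[Symbol], name: str) -> list[Symbol]:
    """Pointer lookup: symbols whose last name component matches."""
    return [s for s in index if s.qualname.rsplit(".", 1)[-1] == name]


def read_symbol(files: dict[str, str], sym: Symbol) -> str:
    """Return only the lines one symbol spans, not the whole file."""
    lines = files[sym.path].splitlines()
    return "\n".join(lines[sym.start - 1 : sym.end])


files = {
    "auth/token_validator.py": '''import time


class TokenValidator:
    def __init__(self, secret):
        self.secret = secret

    def validate(self, token):
        payload = decode(token, self.secret)
        return payload["exp"] > time.time()


def decode(token, secret):
    return {"exp": 0}
''',
}
index = build_symbol_index(files)
[hit] = find(index, "validate")
print(hit)  # Symbol(path='auth/token_validator.py', qualname='TokenValidator.validate', start=8, end=10)
print(read_symbol(files, hit))  # prints lines 8-10 only: the validate method
```

The agent pays for the index lookup and the three lines it needs instead of the whole file, which is the same trade the diagram below shows at larger scale.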

Standard file navigation versus symbol index — token consumption at scale

─── STANDARD FILE READ ──────────────────────────────────────────────────────
Agent needs to understand how authentication tokens are validated
Agent reads: /src/auth/auth.service.ts (2,400 tokens)
Agent reads: /src/auth/token.validator.ts (1,800 tokens)
Agent reads: /src/middleware/auth.middleware.ts (1,200 tokens)
Agent reads: /src/lib/jwt.ts (900 tokens)
Total consumed: 6,300 tokens to understand token validation

─── SYMBOL INDEX APPROACH ───────────────────────────────────────────────────
Harness maintains symbol-level index: function names, types, locations
Agent queries: "how does token validation work?"
Index returns: symbol map + 3 specific function signatures + exact locations
Agent reads only the 2 functions it actually needs: ~400 tokens
Total consumed: ~550 tokens (91% reduction)

Real-world impact at project scale:
  Agents using symbol navigation: 76% faster wall time
  Agents using full file reads: hit context limits on 60% of large tasks

The 76% wall-time improvement from the symbol pattern is a secondary effect of the 91% token reduction. When the agent spends fewer tokens on navigation, it has more left for reasoning, so it completes tasks without hitting context limits instead of truncating context mid-task. Faster wall time is a symptom of better context efficiency, not the other way around.

Why teams don't catch this until it's expensive

The context window tax problem is invisible during development and setup. When a team first configures their MCP server, they test it with simple tasks that fit comfortably in the context window regardless of tool definition overhead. The agent reads a file, calls a tool, returns a result. Everything works. The team ships the integration and moves on.

The problem surfaces at production scale, when tasks get complex enough to stress the context budget. An agent asked to implement a feature across three services, understand the relevant test patterns, and generate a PR description starts hitting context limits mid-task. Developers notice that the agent "forgets" earlier decisions or produces inconsistent output between sessions. Nobody immediately connects this to tool definition overhead because the tool definitions are invisible — they are loaded before the conversation starts and never appear in the UI.

By the time a team diagnoses the problem, they have often already built significant infrastructure on top of the naive design. The MCP server is integrated, the tool schemas are defined, the system prompts reference specific tool names. Migrating to a registry architecture requires reworking the integration layer, not just configuration. Teams that understand this problem before building avoid a painful rebuild.

What this means for teams building their own MCP stacks

Building and maintaining MCP infrastructure is months of engineering work — and the context window tax problem is one of several architectural decisions that have to be right from the start to avoid expensive rebuilds. Registry dispatch, symbol indexing, context budget management, and re-indexing pipelines are each significant engineering problems. Teams that get one right often get another wrong.

The teams with the most context-efficient agent sessions are not the ones who built the most sophisticated custom MCP servers. They are the ones running managed infrastructure that handles these architectural decisions correctly by design — where the registry pattern, the symbol index, and the context budget management are implemented and maintained by infrastructure specialists rather than application engineers working on them as a side project.

Kognita's managed MCP server implements registry-based tool dispatch and symbol-indexed codebase navigation. Sessions connect once and get a context-efficient harness automatically — tool definitions load on demand, codebase navigation uses the symbol index, and the context budget is managed so that agents can complete complex tasks without hitting limits mid-execution. The 26% upfront tax is not a price teams pay when the infrastructure is designed correctly.

Final take

The context window tax is not a hardware problem or a model limitation. It is an architecture choice. Naive MCP server designs load all tool definitions upfront because it is simpler to implement. That simplicity costs 26% of the agent's working memory before any work begins, which compounds into task failure rates that are high enough to undermine confidence in the entire AI tooling investment.

The fix — registry-based dispatch and symbol-indexed navigation — is not exotic. It is the correct default architecture for production MCP servers, and it produces measurable improvements in task completion rates and session efficiency. The question is whether teams discover this before or after they have built significant infrastructure on top of the naive design.

If your AI agents are running out of context mid-task, the first thing to audit is not the context window size. It is what is consuming the context before the task even starts.