AGENTS.md Is Already Wrong. Here's Why Static Context Files Can't Keep Up.

Everyone is writing AGENTS.md files now. OpenAI published a guide for Codex. Augment Code published one for their agents. GitHub has one for Copilot. The pattern is genuinely useful — a structured context file that tells AI agents how the codebase works, what tools to use, what conventions to follow. The problem, which the Augment Code guide calls out directly, is that the hardest part is keeping it accurate. Non-inferable details and custom tooling constraints drift fastest.

"The hardest part" understates it. AGENTS.md starts accurate. It ends up describing a system that no longer exists, using tool commands that no longer work, referencing services that were split or renamed six weeks ago. A stale documentation page does limited damage, because nobody reads the authentication docs when they can ask an engineer. AGENTS.md is different: it is loaded into the agent's context automatically. The agent treats stale AGENTS.md content as current truth. It acts on it. The failures look like hallucination. They are drift.

What AGENTS.md is and why it matters

AGENTS.md is a convention for describing a project to AI coding agents. It lives in the repository root, and agents that support the convention load it at session start as part of their initial context. A well-written AGENTS.md tells the agent what the project does, how the codebase is organized, what development tools exist, how to run tests and builds, what the team's conventions are, and what to avoid.

When it works, AGENTS.md is a force multiplier. An agent that starts a session knowing the test runner, the import conventions, the service architecture, and the shared utilities produces better output from the first message. It doesn't ask for information the team has already encoded. It doesn't use patterns the team considers antipatterns. It can focus immediately on the task rather than spending the session reconstructing the context that the file already provides.

The appeal is obvious. The failure mode is less so — until it happens repeatedly.

Why static context files drift

Codebase documentation is always out of date for a structural reason that has nothing to do with discipline: the codebase changes continuously, but documentation updates require a separate, deliberate act. AGENTS.md has the same structural problem as every other static document that describes a living system. The codebase does not know the file exists. Changes do not trigger updates to it. The gap between what the file says and what is true grows as a function of development velocity, not negligence.

High-velocity teams accumulate AGENTS.md drift faster. The teams that benefit most from accurate agent context (teams shipping frequently, adding services, evolving conventions) are the teams whose AGENTS.md files go stale the fastest. This is not a paradox; it is the same dynamic that affects all static documentation in high-velocity codebases. More changes mean more opportunities for the static file to fall behind.
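
One rough way to put a number on this is to count the commits that have landed since AGENTS.md was last edited. The sketch below assumes you already have the relevant dates (for example, pulled from `git log`); the function name and the sample dates are hypothetical, and the count is a proxy for exposure to drift, not a measure of how wrong the file actually is.

```python
from datetime import date


def commits_since(last_update: date, commit_dates: list[date]) -> int:
    """Count commits dated after the last change to AGENTS.md."""
    return sum(1 for d in commit_dates if d > last_update)


if __name__ == "__main__":
    last_update = date(2025, 1, 6)  # hypothetical date AGENTS.md was last edited
    # Two hypothetical teams whose context files are equally old.
    slow_team = [date(2025, 1, 10), date(2025, 2, 3)]
    fast_team = [date(2025, 1, d) for d in range(7, 31)]
    print(commits_since(last_update, slow_team))  # 2
    print(commits_since(last_update, fast_team))  # 24
```

Both files are the same age, but the second team's file has had twelve times as many chances to fall behind.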

What drifts fastest — and why it matters

What drifts fastest in AGENTS.md — and why

CONTENT TYPE              DRIFT RATE    REASON IT DRIFTS
────────────────────────────────────────────────────────────────────────────
Custom CLI tools          Very fast     Tools get renamed, flags change,
and commands                            new tools replace old ones
                                        Nobody thinks to update the file

Build system details      Fast          Build targets added/removed,
and run instructions                    config paths change with refactors
                                        Agents fail silently on stale cmds

Architectural notes       Medium        Services added, split, renamed.
and service descriptions                Descriptions reference topology
                                        that no longer exists

Coding conventions        Medium-slow   Usually stable, but exceptions
and patterns                            accumulate silently in the codebase
                                        even when the file says otherwise

External integrations     Slow          API endpoints, auth methods change
and third-party APIs                    but usually someone remembers

General project context   Slowest       Mission statement level; rarely
and high-level goals                    changes enough to cause failure

The irony: the most important content for AI grounding (tools, services,
architecture) drifts fastest. The least important content (goals, context)
stays accurate the longest.

The hierarchy is backwards from what most teams expect. The content that drifts fastest — tooling commands, service descriptions, build instructions — is also the content most directly responsible for agent success on real tasks. When the test command in AGENTS.md no longer works, the agent wastes several minutes on toolchain exploration before the developer has to intervene. When the service description is stale, the agent may implement logic in the wrong service or duplicate logic that already exists in the service that was renamed.

The content that stays accurate longest — high-level project goals, general engineering philosophy, broad team values — is the content that an agent could infer anyway from the codebase and README. The static file provides the most value where the codebase provides the least: specific tooling, conventions, and architectural decisions that are not derivable from the code. And that is exactly the content that drifts fastest.

What stale AGENTS.md looks like in practice

What an agent sees when AGENTS.md has drifted

AGENTS.md says: "Run tests with: npm run test:unit"
Current reality: "Run tests with: pnpm test -- --workspace=packages/core"
                 (switched to pnpm 6 weeks ago, monorepo split 3 weeks ago)

Agent behavior:
  -> Runs: npm run test:unit
  -> Gets: command not found
  -> Retries with npm run test, npm test, yarn test
  -> Spends 4 minutes on toolchain exploration
  -> May eventually succeed or ask the developer — breaking flow

AGENTS.md says: "Auth is handled by the auth-service"
Current reality: Auth was split into auth-service (login) and
                 identity-service (session management) 8 weeks ago

Agent behavior:
  -> Reads auth-service codebase to understand token refresh
  -> Cannot find token refresh logic (it's in identity-service now)
  -> Generates token refresh logic from scratch
  -> Opens PR that duplicates existing identity-service logic
  -> Developer discovers duplication in code review

Both of these failures look like agent hallucination.
Neither is. They are AGENTS.md drift.

Both failures in the examples above look like agent errors. The developer sees the agent try the wrong test command, or watches it generate code that duplicates existing logic. The natural reaction is to blame the agent ("the AI is not reliable") rather than to audit the context file the agent was working from. Stale context produces failures that are harder to diagnose than hallucination precisely because they are grounded in something that used to be true. The agent is not guessing. It is acting on what it was told.
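
Part of that audit can be mechanical. As a rough sketch, assuming AGENTS.md refers to scripts in the `npm run <name>` form (or the pnpm/yarn equivalent) and the project has a `package.json` with a `scripts` section, the check below lists script names the file mentions that are no longer defined. The function name and the regular expression are illustrative and not part of any existing tool.

```python
import json
import re

# Matches "npm run <script>", "pnpm run <script>", or "yarn run <script>".
# Bare forms such as "pnpm test" are deliberately not handled in this sketch.
RUN_COMMAND = re.compile(r"\b(?:npm|pnpm|yarn) run ([\w:-]+)")


def stale_script_references(agents_md: str, package_json: str) -> list[str]:
    """Return script names AGENTS.md tells agents to run but package.json lacks."""
    defined = set(json.loads(package_json).get("scripts", {}))
    referenced = RUN_COMMAND.findall(agents_md)
    # dict.fromkeys removes duplicates while keeping first-seen order.
    return [name for name in dict.fromkeys(referenced) if name not in defined]


if __name__ == "__main__":
    agents_md = "Run tests with: npm run test:unit\nBuild with: npm run build\n"
    package_json = '{"scripts": {"build": "tsc", "test": "vitest"}}'
    print(stale_script_references(agents_md, package_json))  # ['test:unit']
```

A check like this catches a renamed or deleted script. It does not notice that the team switched package managers or that the command now needs a workspace argument, so it narrows the audit rather than replacing it.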

The frequency of these failures increases as AGENTS.md ages. In the first week after it is written, most content is accurate and the agent performs well. At two months, the tooling section is likely stale, the service descriptions may reference topology that has changed, and the build instructions may have subtle errors from dependency updates. At six months, the file is a liability as much as an asset — providing accurate context on some dimensions while actively misleading the agent on others.

The maintenance burden problem

The obvious solution, keeping AGENTS.md updated, turns out to be harder than it sounds at team scale. Who owns the file? On small teams, usually whoever wrote it. On larger teams, ownership is diffuse and updates happen inconsistently. Does the developer who renames a service update AGENTS.md in the same PR? Sometimes. Does the PR reviewer check whether the context file reflects the change? Rarely. Is there a CI check that validates AGENTS.md accuracy? Almost never, because validating a natural-language description against a codebase requires semantic understanding that shell scripts don't have.
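
A script cannot judge whether a prose description is still true, but it can check a narrow slice of it. The sketch below assumes each service lives in its own directory and that AGENTS.md mentions services by directory name; both are assumptions about repository layout, and `undocumented_services` is a hypothetical helper. It reports services that exist in the repository but never appear in the file.

```python
import re


def undocumented_services(agents_md: str, service_dirs: list[str]) -> list[str]:
    """Return service directory names that never appear as a whole name in AGENTS.md."""
    missing = []
    for name in sorted(service_dirs):
        # Lookarounds stop "auth-service" from matching inside "my-auth-service-v2".
        pattern = rf"(?<![\w-]){re.escape(name)}(?![\w-])"
        if not re.search(pattern, agents_md):
            missing.append(name)
    return missing


if __name__ == "__main__":
    agents_md = "Auth is handled by the auth-service"
    # In a real repository this list might come from listing a services/ directory.
    service_dirs = ["auth-service", "identity-service"]
    print(undocumented_services(agents_md, service_dirs))  # ['identity-service']
```

In the auth split described earlier, this would flag `identity-service` as missing from the file. It would not flag that the sentence about `auth-service` is now only half true, which is the semantic gap a script cannot close.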

The maintenance burden also scales with codebase velocity, not with team size. A small team shipping fast accumulates more drift than a large team shipping slowly. This means the teams that invest most in AGENTS.md — because they rely most on AI agent productivity — are also the teams whose AGENTS.md is most likely to be wrong at any given moment. High reliance on the file correlates with high drift rate. That is a structural problem, not a process failure.

What always-current context actually looks like

The solution is not to replace AGENTS.md with a better AGENTS.md. It is to recognize that a static text file is the wrong mechanism for content that needs to stay current with a living system — and to use a different mechanism for that content while keeping AGENTS.md for the content that actually should be static.

Static context file vs. always-current codebase index

─── AGENTS.md (static file) ────────────────────────────────────────────────
Maintained by: whoever remembers
Updated when: someone notices it's wrong (usually after an agent failure)
Accuracy at 4 weeks: maybe 85%
Accuracy at 3 months: 60–70%
Accuracy at 6 months: unknown — nobody is auditing it
What it covers: whatever was accurate when it was written
What it misses: changes since the last update (often the most recent and
                most relevant changes the agent needs to know about)

─── ALWAYS-CURRENT CODEBASE INDEX ───────────────────────────────────────────
Maintained by: automatic re-indexing on push
Updated when: the codebase changes
Accuracy at 4 weeks: same as day one
Accuracy at 3 months: same as day one
What it covers: current system state — services, patterns, APIs, types
What it misses: intent and decisions not expressed in code
                (still need a small stable context file for this)

The practical approach: AGENTS.md carries the decisions and intent that
cannot be inferred from code. The codebase index carries everything else.
The two together give the agent both the why and the what.

The division of responsibility is important. AGENTS.md should carry the decisions and intent that cannot be inferred from code: why the team chose a particular pattern, which architectural trade-offs were made deliberately, what the team considers antipatterns and why. That content is genuinely non-derivable from the codebase and genuinely stable over time. It is worth writing down and maintaining.

The content that drifts (service topology, tooling commands, API shapes, import conventions) is already in the codebase. A semantic index that re-indexes on every push reflects the code as of the latest push, because the codebase is the source of truth for its own structure. The index does not need to be maintained by hand. It maintains itself.
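
To make "re-indexes on every push" concrete, here is a minimal sketch of incremental re-indexing keyed on content hashes. It is an illustration under simplifying assumptions, not a description of how any particular product works: the index only maps paths to digests, a snapshot is a plain dict of path to file content, and `reindex` is a hypothetical name. A real semantic index would also parse the changed files to rebuild service graphs, symbols, and API shapes.

```python
import hashlib


def file_digest(content: str) -> str:
    """Hash file content so changes can be detected without comparing text."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def reindex(index: dict[str, str], snapshot: dict[str, str]) -> tuple[dict[str, str], list[str]]:
    """Bring a path -> digest index in line with the pushed snapshot.

    Returns the new index and the sorted paths that were added, changed,
    or removed, i.e. the files whose derived facts would need rebuilding.
    """
    new_index = {path: file_digest(content) for path, content in snapshot.items()}
    touched = [p for p in new_index.keys() | index.keys() if new_index.get(p) != index.get(p)]
    return new_index, sorted(touched)


if __name__ == "__main__":
    push_1 = {"services/auth-service/login.py": "def login(): ..."}
    index, touched = reindex({}, push_1)
    print(touched)  # ['services/auth-service/login.py']

    # A later push adds a new service; only the new file needs indexing.
    push_2 = {**push_1, "services/identity-service/session.py": "def refresh(): ..."}
    index, touched = reindex(index, push_2)
    print(touched)  # ['services/identity-service/session.py']
```

The point of the design is that the trigger is the push itself, so nobody has to remember to run it.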

Kognita's managed codebase index works this way. It re-indexes automatically on push, so the service graph, tooling discovery, import patterns, and API shapes that agents rely on are always current without anyone having to update a file. The AGENTS.md a team still needs — the one that carries team decisions and intent — stays small enough to actually maintain. The large, fast-drifting sections that should have been an index all along become an index.

Final take

AGENTS.md is a good idea with a structural flaw: it is a static file describing a dynamic system, and the most important content drifts fastest. Teams that rely on it as their primary context mechanism will see agents performing well immediately after it is written and progressively worse as the codebase evolves. The performance degradation is silent — it looks like agent unreliability rather than file staleness.

The fix is not a better process for updating AGENTS.md. It is to narrow AGENTS.md to what it is good at — non-derivable, stable intent — and to use an always-current codebase index for everything else. The combination gives agents both the why and the what, with the what staying accurate automatically as the codebase changes.

An AGENTS.md that drifts produces AI sessions that confidently use stale information. That is harder to detect and more damaging than a hallucination, because at least a hallucination often sounds wrong.