Running /init on a Legacy Codebase Gives You a Confident, Wrong Map

On a fresh project, Claude Code's /init command produces a reasonable summary: there is not much there yet, and what exists is what runs. On a fifteen-year-old codebase with three eras of architecture layered on top of each other, it produces something far more dangerous: a clean, confident description that flattens dead code, abandoned patterns, and the one path that actually serves production into a single tidy narrative. The output looks authoritative. It is also wrong about which code matters, and wrong in the way that is hardest to catch, because every detail it cites really does exist in the repository.

A static scan can't tell live from dead

The core limitation is simple. /init reads what is present in the repository. A legacy codebase is defined by how much of what is present no longer executes:

Present in the repo is not the same as live in production

What /init can see vs. what a legacy repo is

  /init reads:
    -> directory structure, file names
    -> package manifests, imports
    -> surface patterns it can pattern-match

  A 15-year-old codebase actually contains:
    -> 3 eras of architecture, layered, none deleted
    -> 2 ORMs: one live, one used only by a cron job nobody runs
    -> a "utils/" folder that is 60% dead code
    -> the ONE path that serves production traffic, unmarked

  Static reading cannot tell live from dead. It describes all of it
  as if it were equally real.

A human who has worked on the system for a year knows the utils/ folder is mostly graveyard and that the real request path goes through the 2024 rewrite. A one-time file scan knows none of that. It sees imports and patterns and treats them all as equally live, because static reading has no concept of what executes.
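To make the limitation concrete, here is a minimal sketch of a purely static import count. It is not how /init is implemented; it only illustrates the class of signal that reading files can produce. The file paths and contents are hypothetical stand-ins for a repository, and the class names are borrowed from the example later in this post.

```python
import ast
from collections import Counter

# Hypothetical file contents standing in for a repository checkout.
SOURCES = {
    "api/orders.py": "from db.query_service import QueryService\n",
    "utils/export.py": "from db.legacy_orm import LegacyOrmRepository\n",
    "utils/report.py": "from db.legacy_orm import LegacyOrmRepository\n",
    "jobs/nightly.py": "from db.legacy_orm import LegacyOrmRepository\n",
}


def count_imported_names(sources):
    """Count how many files import each name, using only the syntax tree."""
    counts = Counter()
    for path, text in sources.items():
        tree = ast.parse(text, filename=path)
        names = set()
        for node in ast.walk(tree):
            if isinstance(node, ast.ImportFrom):
                names.update(alias.name for alias in node.names)
        # Each file contributes at most one count per imported name.
        counts.update(names)
    return counts


counts = count_imported_names(SOURCES)
# The most-imported name looks like "the" data access pattern,
# even if none of the importing files ever runs in production.
print(counts.most_common())
```

By import count alone, the legacy class wins three files to one. Nothing in the syntax tree records that those three files are unreachable from live traffic.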

The result is a plausible map of the dead half

The danger is not that /init leaves things out. It is that it confidently documents the wrong things with specifics that read as authoritative:

Specific, plausible, and pointed at the abandoned code

The confident, wrong map

  /init writes into CLAUDE.md:
    "Data access uses the LegacyOrmRepository pattern
     (see db/legacy_orm.py). Background jobs run via Resque."

  Reality:
    -> LegacyOrmRepository is still imported in 40 files...
       ...all of which are behind a feature flag turned off in 2022
    -> live traffic uses the new QueryService (added 2024)
    -> Resque was replaced by Sidekiq 18 months ago;
       the Resque config lingers but nothing dispatches to it

  The map is plausible, specific, and points at the dead half
  of the codebase. An agent will follow it straight there.

Now that description lives in CLAUDE.md, loaded every session, and the agent treats it as ground truth. Ask it to add a feature to data access and it extends LegacyOrmRepository, the dead pattern, because the map said that is how data access works. This is the hallucination mechanism in which confident, wrong context is worse than no context at all: with no map the agent has to inspect the code before acting, while with a wrong map it skips that step.

Legacy is where context matters most and /init helps least

This is the cruel inversion. The codebases where AI assistance would be most valuable (old, sprawling, poorly documented, the ones no single person fully understands) are exactly the ones where a static scan misleads the most. On a greenfield repo you barely need /init; on a legacy monolith you need real understanding, and /init gives you a confident misreading. The harder the codebase, the wider the gap, a point we make in "Why AI tools fail on legacy codebases."

Reading legacy code requires knowing what runs

The thing that separates a useful map of a legacy system from a misleading one is execution awareness — knowing which code actually reaches a live entry point and which is archaeology:

Equal-weight scan vs. execution-aware index

What reading a legacy codebase actually requires

  Static scan (/init):
    -> equal weight to every file that exists
    -> no signal for what runs, what's dead, what's deprecated

  Execution-aware semantic index:
    -> understands call graphs and what reaches entry points
    -> surfaces the path that actually serves the request
    -> distinguishes "present in the repo" from "live in prod"

An index that understands call graphs and reachability can distinguish the path that serves the request from the forty files behind a dead feature flag. That is the difference between a map of the codebase and a map of the parts of the codebase that matter.
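Reachability of this kind can be sketched as a breadth-first walk over a call graph that starts at the live entry points and refuses to cross edges gated by a disabled feature flag. This is an illustrative sketch, not a description of any particular tool: the graph, the function names, and the flag name `use_legacy_orm` are all hypothetical, and a real call graph would also have to cope with dynamic dispatch, reflection, and scheduled jobs.

```python
from collections import deque

# Hypothetical call graph: caller -> list of (callee, gating flag or None).
CALL_GRAPH = {
    "http.handle_request": [("orders.create", None)],
    "orders.create": [
        ("QueryService.save", None),
        ("LegacyOrmRepository.save", "use_legacy_orm"),
    ],
    "QueryService.save": [],
    "LegacyOrmRepository.save": [("legacy_orm.flush", None)],
    "legacy_orm.flush": [],
    "utils.old_export": [("LegacyOrmRepository.save", None)],
}

# Hypothetical flag state, assumed to mirror what production runs with.
FLAGS = {"use_legacy_orm": False}


def reachable(graph, entry_points, flags):
    """Return every node reachable from the entry points via enabled edges."""
    live = set(entry_points)
    queue = deque(entry_points)
    while queue:
        caller = queue.popleft()
        for callee, flag in graph.get(caller, []):
            # Skip edges guarded by a flag that is off (or unknown).
            if flag is not None and not flags.get(flag, False):
                continue
            if callee not in live:
                live.add(callee)
                queue.append(callee)
    return live


live = reachable(CALL_GRAPH, ["http.handle_request"], FLAGS)
dead = set(CALL_GRAPH) - live
print("live:", sorted(live))
print("dead:", sorted(dead))
```

Under these assumptions the legacy repository, its helper, and the old export utility all land in the dead set. They still exist and are still called from somewhere in the repo, but no enabled path from the request handler reaches them.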

Where Kognita fits

Kognita builds a semantic, execution-aware index of the repository rather than a one-time prose summary. It captures the relationships and call structure that reveal which patterns are live and which are vestigial, and it re-derives that picture from the current source instead of freezing a first impression into a file. On a legacy codebase, that is the difference between an agent that confidently edits the dead half and one that finds the path actually serving production. The grounding reflects what runs, not merely what is checked in.

Final take

/init on a legacy codebase does not fail by omission — it fails by confidence. It hands you a clean, specific, plausible description that gives equal weight to live code and dead code, then freezes it into a file the agent trusts every session.

A static scan can only describe what exists, not what runs. On a codebase where half of what exists is dead, that is not a map — it is a confident wrong turn.