AI Will Explain Every Line. It Won't Tell You Why the Architecture Works That Way.

Paste a function into ChatGPT or Claude and ask what it does. You will get a clean, accurate explanation: the variable names, the control flow, the edge cases, what the function returns under what conditions. For a function you have never seen before, this is genuinely useful. It removes the friction of reading unfamiliar code. The problem surfaces when you ask the next question — the one that actually matters for onboarding into a real system: why is it structured this way?

That question has a different answer. Not because AI is wrong about the code — it is right about the code. But because the answer to "why" is not in the lines. It is in the architectural decisions that predated the lines, the constraints that shaped them, and the system-level invariants they implement. An AI that has only the text of the code produces explanations that are locally correct and architecturally uninformed. For an onboarding engineer trying to understand not just what a service does but why it is safe to change — that distinction is the whole problem.

What AI explains well

Line-level narration is a genuine capability. Given a function, AI will accurately describe what each statement does, what the variables hold, what the conditional branches produce, and what the return value means. For well-named code, this is approximately what a senior engineer would explain to a junior one reading the file for the first time. For poorly named code, AI can infer intent from structure and common patterns well enough to be useful.

Control flow tracing is also solid. AI can follow branching logic, identify the paths through a function, and explain what conditions lead to which outcomes. For recursive logic, nested conditionals, or async chains, this is legitimately useful — it removes the manual effort of tracing execution paths by hand.

Local logic — a single file, a module, a class — is where text-based AI performs best. The information required to explain local logic is contained in the local context. What the function does, what methods are available, what data structures are used — these are all visible in the text of the code itself. AI explanations at this level are reliable precisely because everything needed is in the context window.

What AI cannot explain from code alone

Architecture is made of decisions, not just implementations. When you ask why a service is structured a certain way, you are asking about a decision: who made it, under what constraints, which alternatives were rejected, and what consequences were intended. None of that is in the code. The code is the outcome of the decision. The decision itself lives in architecture decision records, postmortem documents, Jira ticket comment threads, Slack conversations that were never captured in documentation, and the memories of the engineers who were in the room.

This means that any AI explanation derived solely from code text is structurally incomplete for architectural questions. It is not a capability gap that will be closed by better models or larger context windows — it is a data availability problem. The architectural context simply does not exist in the code text, so no model can retrieve it from there. An AI that has indexed only the code will give you a description of the implementation. It cannot give you the decision that produced it.

The three types of architectural context that live nowhere in the text

Historical decisions

Codebases are time-stratified. Code written three years ago implements decisions that were correct at the time and may be technical debt today, or may be immovable constraints whose removal would require significant coordinated effort across teams. An engineer looking at a custom JWT validation implementation today has no way to know from the code whether it is a refactoring opportunity or a compliance-required artifact of an acquisition. Both look identical in the text. The difference is entirely in the history that produced the code, not the code itself.

Constraint-driven patterns

Many architectural patterns exist not because they are technically optimal but because they satisfy external constraints: regulatory requirements, vendor limitations, contractual obligations, or operational agreements reached after incidents. A sequential message processor that looks like it should be parallelized for performance may be sequential because a downstream service requires ordered state transitions. The constraint is real. The code communicates nothing about it. An AI that reads the code and explains the mechanism has told you something accurate that can still lead you toward the wrong change.
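A minimal sketch of why message order can be a hard constraint, assuming a hypothetical downstream consumer that keeps only the latest status per order. The OrderEvent shape and the applyEvents function are invented for illustration; they are not taken from any real service.

```typescript
// Hypothetical message shape; the field names are assumptions for illustration.
interface OrderEvent {
  orderId: string;
  status: "created" | "paid" | "shipped";
}

// Downstream state is last-write-wins, so the result depends on arrival order.
function applyEvents(events: OrderEvent[]): Map<string, string> {
  const latestStatus = new Map<string, string>();
  for (const e of events) {
    latestStatus.set(e.orderId, e.status);
  }
  return latestStatus;
}

const created: OrderEvent = { orderId: "o-1", status: "created" };
const paid: OrderEvent = { orderId: "o-1", status: "paid" };
const shipped: OrderEvent = { orderId: "o-1", status: "shipped" };

// Sequential processing preserves the order in which events were produced.
const sequentialResult = applyEvents([created, paid, shipped]).get("o-1");

// One possible completion order if the same events were handled concurrently.
const reorderedResult = applyEvents([created, shipped, paid]).get("o-1");

console.log(sequentialResult, reorderedResult);
```

Nothing in applyEvents states that order matters. The dependency only becomes visible when the same events arrive in a different order, which is the kind of fact a reading of the producer's code alone would not surface.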

Cross-service invariants

Many architectural decisions are not visible within a single service at all — they are properties of the system as a whole. The rule "UserService is the only service that writes to the users table" is not enforced in the code of any individual service. It is an invariant maintained by convention and ownership. An onboarding engineer reading UserService's code will not encounter this invariant. They will encounter only UserService's implementation, which neither states nor enforces the cross-service rule. Understanding that the rule exists requires a graph-level view of the system that transcends any individual file or service.
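One way to picture a graph-level check for this kind of rule is sketched below. It assumes a hypothetical list of observed table writes (for example, gathered from query logs) and a hand-maintained ownership map; TableWrite, tableOwners, and findOwnershipViolations are illustrative names, not part of any real tool.

```typescript
// Hypothetical record of an observed write: which service wrote to which table.
interface TableWrite {
  service: string;
  table: string;
}

// The invariant, written down explicitly: each listed table has one permitted writer.
const tableOwners: Record<string, string> = {
  users: "UserService",
};

// Returns every observed write made by a service other than the table's owner.
// Tables with no declared owner are ignored.
function findOwnershipViolations(
  writes: TableWrite[],
  owners: Record<string, string>,
): TableWrite[] {
  return writes.filter((w) => {
    const owner = owners[w.table];
    return owner !== undefined && owner !== w.service;
  });
}

const observedWrites: TableWrite[] = [
  { service: "UserService", table: "users" },
  { service: "BillingService", table: "invoices" },
  { service: "AdminService", table: "users" }, // breaks the single-writer convention
];

const ownershipViolations = findOwnershipViolations(observedWrites, tableOwners);
console.log(ownershipViolations);
```

The check needs two inputs that no single service's source contains: the ownership rule itself and a view of what every other service does.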

What AI correctly explains vs. what it misses — the AuthService custom JWT implementation

  THE CODE
  -------------------------------------------------------------------
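  import jwt from 'jsonwebtoken'

  // DecodedToken, AuthError, and fetchSigningKey are defined elsewhere in AuthService.
  async function validateToken(token: string): Promise<DecodedToken> {
    // Decode without verifying the signature, only to inspect the token's fields.
    const decoded = jwt.decode(token, { complete: true })
    if (!decoded || typeof decoded === 'string') throw new AuthError('INVALID_TOKEN')

    const { header, payload } = decoded
    if (header.alg !== 'RS256') throw new AuthError('UNSUPPORTED_ALGORITHM')
    if (!header.kid) throw new AuthError('INVALID_TOKEN')

    // Custom key fetch: does not use jwks-rsa or well-known endpoint.
    // The key ID (kid) is a JWT header parameter, so it is read from the header.
    const key = await fetchSigningKey(header.kid)
    const verified = jwt.verify(token, key, { algorithms: ['RS256'] })
    return verified as DecodedToken
  }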
  -------------------------------------------------------------------

  WHAT AI CORRECTLY EXPLAINS
  -> Decodes JWT without verifying, extracts header and payload
  -> Rejects tokens not signed with RS256
  -> Fetches signing key using a custom key retrieval function
  -> Verifies the token against the fetched key
  -> Returns the verified decoded payload
  Accuracy: high. This is what the code does.

  WHAT AI CANNOT EXPLAIN FROM THE CODE TEXT
  -> Why is a custom key fetch used instead of the standard jwks-rsa library?
     (Answer: the key management infrastructure predates JWKS standards by 4
     years; migrating would require a coordinated key rotation across 12 services
     on a defined maintenance window — a compliance engineering decision, not
     a technical preference)
  -> Why is the JWKS well-known endpoint not used?
     (Answer: the identity provider used in the regulated environment does not
     expose a standard discovery endpoint; this is a vendor constraint, not
     an oversight)
  -> Why does this implementation exist separately from the shared auth library?
     (Answer: this service was acquired; migrating its auth stack requires
     contractual sign-off from the acquiring entity's legal team, not just an
     engineering decision)

  The AI explanation is locally correct. It is architecturally uninformed.
  An onboarding engineer who reads the AI explanation and concludes the custom
  key fetch is a candidate for refactoring has been misled — not by wrong
  facts, but by the absence of context the code cannot contain.

Why architecturally uninformed explanations mislead onboarding engineers

The problem with a locally correct, architecturally uninformed explanation is not that it is wrong — it is that it is confidently incomplete in ways that are not detectable without independent verification. An onboarding engineer who asks AI why a service uses a custom implementation instead of a standard library and receives no answer will know they have an open question. An onboarding engineer who receives a plausible-sounding explanation that omits the compliance constraint will not know they have a misunderstanding. They will act on it.

The failure mode this produces is predictable: engineers attempt to refactor code that cannot safely be changed. They propose changes that violate cross-service invariants they did not know existed. They model system behavior incorrectly and design features against a mental model that reflects the local implementation but not the architectural constraints that govern it. These are not failures of intelligence or diligence — they are failures of information. The engineer did not have what they needed to reason correctly, and the AI explanation they received did not tell them they were missing it.

This problem scales with codebase age and complexity. A new service with no historical constraints and obvious intent is well-served by text-only AI explanation. An acquired service with a four-year history of constraint-driven architectural decisions, two major migrations, and three ownership changes is not. The services that most need good architectural explanation — legacy systems, core platforms, acquired codebases — are exactly the services where text-only AI explanation is most likely to mislead. Architectural knowledge loss compounds over time, and AI that cannot access that knowledge cannot prevent engineers from repeating the decisions that already went wrong.

The three questions architectural context must answer that code text cannot

  1. WHY WAS THIS DESIGNED THIS WAY?
     Example: "Why does OrderService compute shipping cost instead of ShippingService?"
     What code shows: the computation lives in OrderService.
     What code cannot show: it was put there 3 years ago to preserve a transactional
     boundary — shipping cost must be calculated in the same transaction as order
     total to satisfy an accounting requirement. Moving it would break financial
     reporting consistency across a quarterly audit trail.

  2. WHAT WOULD BREAK IF THIS CHANGED?
     Example: "Can I refactor this queue consumer to process messages concurrently?"
     What code shows: it processes messages one at a time in a for-await loop.
     What code cannot show: the sequential processing is a hard invariant. A
     downstream service builds state from the message order. Concurrent processing
     would produce non-deterministic state transitions that would only manifest as
     data corruption under load, not in unit tests.

  3. WHY DOES THIS NOT USE THE STANDARD APPROACH?
     Example: "Why is there a custom retry implementation instead of using the
     platform retry middleware?"
     What code shows: a hand-rolled exponential backoff with jitter.
     What code cannot show: the platform retry middleware was introduced 18 months
     after this service was built. It was never migrated because the custom
     implementation has specific backoff tuning validated against the downstream
     service's rate limits — tuning that is not representable in the platform
     middleware's configuration API.

  In all three cases, the AI answer to "what does this do?" is correct.
  The AI answer to "why does it work this way?" is either absent or wrong.
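For the third question, here is a minimal sketch of what a hand-rolled exponential backoff with jitter might look like. The "full jitter" variant, the BackoffConfig shape, and the numeric values are assumptions chosen for illustration; they are not the tuning of any real service.

```typescript
// Hypothetical tuning parameters; the values used below are invented.
interface BackoffConfig {
  baseMs: number; // delay ceiling for the first retry
  capMs: number;  // upper bound on the delay ceiling
}

// "Full jitter" backoff: pick a delay uniformly in [0, min(cap, base * 2^attempt)).
// The random source is injectable so the function can be tested deterministically.
function backoffDelayMs(
  attempt: number,
  config: BackoffConfig,
  random: () => number = Math.random,
): number {
  const ceiling = Math.min(config.capMs, config.baseMs * Math.pow(2, attempt));
  return Math.floor(random() * ceiling);
}

const exampleConfig: BackoffConfig = { baseMs: 100, capMs: 10000 };
const fixedRandom = () => 0.5; // stand-in for Math.random in this example

const firstRetryDelay = backoffDelayMs(0, exampleConfig, fixedRandom);
const fourthRetryDelay = backoffDelayMs(3, exampleConfig, fixedRandom);
const cappedDelay = backoffDelayMs(10, exampleConfig, fixedRandom);

console.log(firstRetryDelay, fourthRetryDelay, cappedDelay);
```

A reader, human or AI, can describe exactly how the delay is computed. Whether 100 and 10000 are arbitrary defaults or values validated against a downstream rate limit is not recoverable from the function.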

What execution-aware, semantic indexing makes available

The gap between what code text provides and what architectural understanding requires is a data availability problem with a tractable solution: index not just the code text but the execution context, the service graph, the work management history, and the architectural decision records that produced the codebase's current state. When those sources are indexed together and made queryable, the answers to architectural questions become retrievable rather than requiring an expert who was present when the decisions were made.

Execution-aware indexing adds a layer that text-only tools miss: how does the code actually behave at runtime? Which services call which other services, with what frequency, and under what conditions? Where are the transactional boundaries? What are the actual call contracts between services, not just the declared interfaces? This runtime context is what makes it possible to answer questions like "what would break if this changed" with accuracy rather than inference.
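As a rough sketch of the kind of query a runtime call graph enables, the code below walks observed call edges backwards to list every service that depends, directly or transitively, on a given service. The CallEdge shape, the sample edges, and the call rates are hypothetical; a real index would derive them from traces or telemetry.

```typescript
// Hypothetical edge in an observed service call graph.
interface CallEdge {
  caller: string;
  callee: string;
  callsPerMinute: number;
}

// Breadth-first walk over reversed edges: who calls the target, who calls them, and so on.
function findDependents(edges: CallEdge[], target: string): string[] {
  const seen = new Set<string>();
  const queue: string[] = [target];
  while (queue.length > 0) {
    const current = queue.shift() as string;
    for (const edge of edges) {
      if (edge.callee === current && edge.caller !== target && !seen.has(edge.caller)) {
        seen.add(edge.caller);
        queue.push(edge.caller);
      }
    }
  }
  return Array.from(seen).sort();
}

const observedEdges: CallEdge[] = [
  { caller: "OrderService", callee: "UserService", callsPerMinute: 1200 },
  { caller: "AdminService", callee: "UserService", callsPerMinute: 15 },
  { caller: "ReportingService", callee: "OrderService", callsPerMinute: 40 },
  { caller: "UserService", callee: "AuthService", callsPerMinute: 900 },
];

const userServiceDependents = findDependents(observedEdges, "UserService");
console.log(userServiceDependents);
```

The traversal itself is simple. The hard part is having the edges at all, since they describe runtime behavior that the source of any one service does not record.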

Semantic indexing connects the code to the decisions that produced it. When a Jira epic, its associated pull requests, and the architectural decision records it referenced are all part of the same queryable index, the question "why does this implementation exist?" becomes answerable in a way it cannot be from code text alone. The custom JWT implementation maps to the acquisition project. The sequential message processor maps to the incident postmortem that mandated sequential processing. The cross-service data ownership rule maps to the architectural decision record that established it.
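A minimal sketch of that linkage, assuming a hypothetical index in which code symbols are joined to decision records by ID. The DecisionRecord and DecisionLink shapes and the explainWhy function are invented for illustration and do not describe any product's actual data model; the record IDs are borrowed from the UserService caching example later in this post.

```typescript
// Hypothetical index shapes; a real index would carry far more metadata.
type RecordKind = "adr" | "jira" | "postmortem";

interface DecisionRecord {
  id: string;
  kind: RecordKind;
  summary: string;
}

interface DecisionLink {
  symbol: string;   // a code location, e.g. "UserService.getUser"
  recordId: string; // the decision record that explains it
}

// Returns the decision records linked to a code symbol, in link order.
function explainWhy(
  symbol: string,
  links: DecisionLink[],
  records: DecisionRecord[],
): DecisionRecord[] {
  const byId = new Map<string, DecisionRecord>();
  for (const r of records) byId.set(r.id, r);

  const matches: DecisionRecord[] = [];
  for (const link of links) {
    if (link.symbol !== symbol) continue;
    const record = byId.get(link.recordId);
    if (record !== undefined) matches.push(record);
  }
  return matches;
}

const decisionRecords: DecisionRecord[] = [
  { id: "ADR-041", kind: "adr", summary: "No write-path caching in UserService" },
  { id: "INFRA-2891", kind: "jira", summary: "Cache TTL changes need Database Reliability review" },
];

const decisionLinks: DecisionLink[] = [
  { symbol: "UserService.getUser", recordId: "ADR-041" },
  { symbol: "UserService.getUser", recordId: "INFRA-2891" },
];

const getUserReasons = explainWhy("UserService.getUser", decisionLinks, decisionRecords);
console.log(getUserReasons.map((r) => r.id));
```

The lookup is trivial once the links exist. The value lies in building and maintaining those links, which is the part code text cannot supply.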

Final take

AI is a reliable narrator of code. It is not a reliable interpreter of architecture. The distinction matters most at the moment it is hardest to perceive: when an onboarding engineer receives a confident, locally accurate explanation and proceeds to act on it without knowing what was left out. The cost of acting on an architecturally uninformed explanation is proportional to the age and complexity of the system — and in legacy systems, that cost is measured in incidents and failed migrations, not just wasted hours.

The answer is not to distrust AI code explanation — it is to recognize the category boundary between what text-only explanation can reliably provide and what it cannot. Line-level narration and control flow tracing are genuine capabilities. Architectural context — why the system is structured the way it is, what constraints govern it, what invariants span service boundaries — requires sources that are not in the code text. Without those sources in the index, AI sessions give onboarding engineers a map that accurately describes the roads but omits the terrain.

Text-only AI explanation vs. execution-aware semantic explanation — UserService caching example

  QUESTION: "Why does UserService.getUser() sometimes return a cached response
  and sometimes fetch from the database?"

  TEXT-ONLY AI EXPLANATION
  -> The function checks a Redis cache before querying the database.
  -> If the cache key exists and is not expired, it returns the cached value.
  -> Otherwise it queries the database and stores the result in cache.
  Completeness: describes the mechanism. Does not explain the design intent.

  EXECUTION-AWARE SEMANTIC EXPLANATION (Kognita)
  -> The cache was introduced in 2023-Q2 as a targeted response to a specific
     incident: UserService was responsible for 67% of database load during peak
     traffic events. The cache TTL of 90 seconds was selected based on the measured
     staleness tolerance of the three highest-traffic callers at the time.
  -> The cache is intentionally not used for write paths (createUser, updateUser)
     because an earlier implementation that cached writes caused a 4-hour data
     inconsistency incident in 2022-Q4. The architectural decision record ADR-041
     explicitly prohibits write-path caching in UserService.
  -> Three services — ReportingService, AnalyticsService, and AdminService — bypass
     the cache by calling the database directly through a read replica. This is
     documented as an exception in the service contract: reporting workloads require
     consistent reads that the 90-second TTL cannot guarantee.
  -> Changing the TTL requires review by the Database Reliability team because
     the current value was negotiated with the on-call rotation as part of the
     incident postmortem follow-up (Jira INFRA-2891).

  The text-only explanation describes what the code does.
  The semantic explanation explains why the code works the way it does.
  Only the second one is useful for making safe changes.

Kognita indexes the sources that produce architectural understanding, not just the code text. The service graph, the execution context, the Jira history, and the architectural decision records are part of the same queryable index. An engineer asking why a service is structured a particular way gets an answer grounded in the actual decision that produced the structure — not just a narration of the implementation that resulted from it.