Kognita

When an Engineer Leaves, the Context They Carried Leaves With Them

The senior backend engineer gave three weeks' notice. The engineering manager made a list of things to document before the last day: the quirks of the payment webhook handler, the reason the search index rebuild is rate-limited to Sundays, the undocumented behavior of the legacy auth middleware. Three weeks later, only 40% of the list was documented. The engineer left. Within two months, three incidents traced back to the knowledge gaps they left behind.

This is not a failure of process or goodwill. The engineer tried to document what they knew. The manager tried to capture what mattered. Neither had a reliable way to inventory the implicit knowledge — the kind that feels obvious to the person who has it, invisible to everyone else, and critical only when something breaks.

The codebase stayed. The understanding of the codebase did not.

What engineers carry that is not written down

The most dangerous knowledge is the kind that never gets created as a document because it does not feel like a thing to document. It feels like context — background understanding that any reasonably experienced engineer on the team would just have. Except that most of it came from a specific incident, a specific architectural discussion, a specific moment when something broke and the post-mortem produced an unwritten rule.

There are four categories of this kind of knowledge, and each disappears in its own way when an engineer leaves.

Architecture decisions with context

The codebase has a structure that reflects decisions. The payments service is isolated. The subscription service is a separate deployment. The notification system uses a queue rather than calling services directly. Each of these has a reason. That reason is often not in the code, not in the PR description, not in any document — it is in the memory of the engineer who made the call or was in the room when it happened.

Without the context, the next engineer sees the architecture but not why it is that way. They make a change that seems reasonable given the visible structure and accidentally violates an invariant that was implicit in the original design. The cascade failure that the isolation was designed to prevent happens again.

Operational constraints

Never run a full table scan on user_events in production between 8am and 10am. The search index rebuild runs on Sundays because running it mid-week took the API down for 40 minutes. Batch-deleting from the sessions table on a production replica causes replication lag that takes 20 minutes to clear. None of these are in the codebase. None of them would be obvious to someone reading the code. They exist only in the memory of the engineers who were on-call when each of them was discovered.

These constraints are not edge cases. They are the actual operational reality of the system. When the engineer who learned them is gone, the next person learns them the same way the first person did: from an incident at 2am.
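
One way to give constraints like these a home outside someone's memory is to write them as executable checks. The following is a minimal sketch, not a description of any existing tooling: the function names and the constant are hypothetical, and it assumes timestamps are naive datetimes in the team's local time zone.

```python
from datetime import datetime, time

# Hypothetical constant. The window itself comes from the constraint above:
# no full table scans on user_events between 8am and 10am.
PEAK_WRITE_WINDOW = (time(8, 0), time(10, 0))


def full_scan_allowed(now: datetime) -> bool:
    """Return False while inside the 8am-10am window, True otherwise."""
    start, end = PEAK_WRITE_WINDOW
    return not (start <= now.time() < end)


def search_rebuild_allowed(now: datetime) -> bool:
    """The search index rebuild is restricted to Sundays.

    datetime.weekday() numbers Monday as 0, so Sunday is 6.
    """
    return now.weekday() == 6
```

A maintenance script could call `full_scan_allowed(datetime.now())` before starting and refuse to run inside the window. The check matters less than the comment beside it, which records why the rule exists.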

Historical failures

Every production system has scar tissue. Behaviors that look odd in isolation make sense when you know what broke the last time someone tried the obvious alternative. The payment webhook handler re-enqueues on any 4xx, not just 429 — because that was the workaround for a Stripe API quirk that caused payment failures in 2022. The auth middleware has a hardcoded 15-minute session extension for legacy mobile clients — because removing it once broke authentication for 12% of the user base for two hours.

The code carries the scar. The reason for the scar lives in the engineer's memory. When the engineer leaves, the next person reads the code, sees something strange, assumes it is wrong, and tries to clean it up. The incident happens again. The scar is re-learned from scratch.
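
A scar is less likely to be "cleaned up" when the reason sits next to the code. As an illustration only, here is a sketch of the webhook retry rule with its history attached. The function name is hypothetical, and only the 4xx rule comes from the example above; how other status classes are handled is not described, so the sketch leaves them out.

```python
def reenqueue_on_client_error(status_code: int) -> bool:
    """Return True when a webhook delivery that failed with a 4xx
    should be put back on the queue.

    This retries on ANY 4xx, not just 429. That is deliberate: it is
    the workaround for a Stripe API quirk that caused payment failures
    in 2022. Narrowing this to 429 would reintroduce that failure.

    Other status classes are outside the scope of this sketch.
    """
    return 400 <= status_code <= 499
```

The docstring does the work here. The next engineer who sees the unusual retry range also sees the incident that produced it.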

Naming conventions and local standards

Most teams have conventions that are enforced informally. We always version background jobs before changing their payload schema. We never write to the main database from the reporting service. We do not add new columns to the users table without a migration review. These are real constraints with real reasoning behind them, maintained by team culture rather than automated tooling. When the engineer who enforced them leaves, the conventions start eroding within one sprint.
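
Some of these conventions can be moved from culture into tooling. As a rough sketch, the rule about versioning background jobs could be checked automatically. The `JobSchema` type and the function below are hypothetical, and the sketch assumes a job's payload can be summarized as a set of field names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JobSchema:
    """Hypothetical description of a background job's payload."""
    name: str
    version: int
    fields: frozenset


def violates_versioning_rule(old: JobSchema, new: JobSchema) -> bool:
    """True when the payload fields changed but the version was not bumped."""
    return old.fields != new.fields and new.version <= old.version
```

Run in code review or CI against the old and new definitions of a job, a check like this keeps the convention alive after the person who used to enforce it has left.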

Why exit interviews and documentation sprints do not work

The standard response to this problem is the exit interview plus documentation sprint. Three weeks before the engineer's last day, the manager schedules a knowledge transfer session. The engineer produces a handoff document. Key workflows get walked through. Critical runbooks get written.

This works at the margin and fails at the core. The documentation sprint captures the knowledge the engineer knows to put on the list. It completely misses the knowledge that is not on the list: the things that feel so obvious that it would not occur to anyone to write them down. Nobody thinks to document "never batch-delete from the sessions table on a production replica", because it seems obvious right up until the person who learned that lesson the hard way is gone.

The exit documentation plan vs. what actually gets captured
The exit documentation plan vs. reality:

The plan (three weeks before last day):
  ✓ Document payment webhook handler quirks
  ✓ Write up search index rebuild constraints
  ✓ Explain auth middleware legacy behavior
  ✓ Record a walkthrough of the billing service
  ✓ Write up on-call runbooks for the five most common incidents
  ✓ Explain the deployment process for the EU region (different from US)
  ✓ Document the Stripe API edge cases
  ✓ Explain the historical context for the subscription service architecture
  ... (22 more items)

What actually got documented (by last day):
  ✓ Payment webhook handler quirks (partial)
  ✓ Search index rebuild constraints
  ✗ Auth middleware legacy behavior ("I'll do it tomorrow" — last day)
  ✗ Most of the rest

The items that were not documented:
  -> The ones the engineer did not think to put on the list
     because they seemed obvious
  -> The ones that only surface when something breaks
  -> The operational knowledge that lived in muscle memory, not in anything written down

The items that hurt most after the engineer leaves are not the ones on the list that did not get written. They are the ones that never made it onto the list, because the engineer did not realize they belonged on it.

AI tools make this worse, not better

There is a version of this problem that used to be manageable and has become significantly harder with AI tooling. Engineers are now productive much faster — weeks, not months, before they are making consequential changes to complex parts of the system. That acceleration is real, but it means context gaps are more dangerous than they used to be. The new hire who would have taken three months to get to the payment webhook handler now reaches it in three weeks, with enough leverage to change it, before they know why it is the way it is.

AI tools also carry context that disappears when the engineer does. A senior engineer with 18 months in the codebase has an AI setup that has accumulated significant operational context: the .clauderules file they wrote, the CLAUDE.md they built up, the session history that shaped how the model assisted them. That setup represents months of careful calibration. Through it, the model is told which queries are dangerous, which conventions are enforced, and which architectural boundaries matter.

When that engineer leaves, their AI setup goes with them. The next engineer starts cold. Not just without the human's knowledge — without the AI context the human had built. The gap is compounded, not just continued.

The bus factor in AI-era teams

Bus factor, the number of engineers who could leave (or be hit by a bus) before critical knowledge becomes unrecoverable, has always been a risk in software teams. AI changes that risk calculation in two directions at once.

On one side: AI tools make individuals dramatically more productive. One engineer with AI assistance can move at the pace that used to require two or three. Teams can operate with smaller headcount. That looks like efficiency. It is also a concentration of knowledge risk. When the productive engineer with the working AI setup leaves, the team does not just lose one person's output — they lose the system understanding that enabled that output, and they may not have redundancy for it.

On the other side: the AI setup itself carries context. A senior engineer's Cursor session, their .clauderules, their local CLAUDE.md, their history with the model — all of that is institutional knowledge stored in a personal tool. When they leave, so does that tooling layer. The replacement engineer is not just starting from the new-hire baseline. They are starting from below it, because they do not have the context that made the previous engineer's AI sessions useful.

What an experienced engineer's AI session knows vs. what a new engineer's session starts with
What an experienced engineer's AI session knows after 18 months:
  -> "Never run a full table scan on user_events in production between 8am–10am.
      Indexes degrade under write pressure during that window."
  -> "The search_index_rebuild job is rate-limited to Sundays because we took down
      the API for 40 minutes the first time we ran it mid-week."
  -> "payment_webhook_handler has retry behavior that looks like a bug: it
      re-enqueues on any 4xx, not just 429. This was intentional, to work around
      a Stripe API quirk in 2022."
  -> "auth_middleware has a hardcoded 15-minute session extension for legacy mobile
      clients that still use the v1 token format. Do not touch this."
  -> Team conventions, service ownership, deployment quirks, unwritten agreements.

What a new engineer's AI session starts with:
  -> Whatever is in the README
  -> Whatever is in the .clauderules file (if one exists and was committed)
  -> Whatever the model knows from training
  -> Nothing about your specific operational history

The gap between an experienced engineer's AI context and a new engineer's starting point is the same gap that has always existed in human knowledge transfer. AI just makes it newly visible — and newly expensive.

What persistent codebase context changes

The problem is not that engineers carry knowledge. It is that the knowledge they carry has no persistent home outside of their memory and their local tooling. If that understanding were managed and maintained as a team-level resource — indexed, queryable, and not belonging to any individual — departure events would not reset institutional knowledge to zero.

This is what Kognita does for engineering teams. The codebase index is not a personal tool. It is managed infrastructure, rebuilt automatically as the repository evolves, accessible to anyone on the team. The service boundaries, the architectural patterns, the naming conventions, the cross-repo dependencies — they live in a shared index rather than in one engineer's mental model or one engineer's AI session.

Knowledge categories: what leaves with the engineer vs. what stays in a managed index
What leaves with the engineer:
  -> Why the payments service is isolated from the rest of the transaction flow
     ("we had a cascade failure in 2023 that took down everything else — this was
      the architectural response")
  -> Which database queries are unsafe on the production replica under load
  -> Which external API integrations have quirks that took incidents to learn
     (Stripe's 4xx retry behavior, Twilio's rate limits on burst sends, the
      third-party KYC provider that silently 503s instead of returning errors)
  -> Which team conventions were never written down
     ("we always version background jobs before changing their payload schema")
  -> The reason specific architectural decisions were made
     ("subscription_service is a separate deployment because it was originally
      a vendor product we migrated away from — the boundary reflects that history")

What stays in the codebase:
  -> The code itself
  -> Commit messages (partial, often context-free)
  -> PR descriptions (if the team writes them)
  -> Comments in the code (often aspirational, not current)

What stays in a managed codebase index:
  -> Service structure, ownership signals, dependency maps
  -> Patterns, conventions, and naming standards as they exist in practice
  -> Cross-repo relationships and data flow
  -> A queryable representation of what the system does —
     available to the next engineer before they make their first mistake

When the senior engineer leaves, the patterns they established and the conventions they enforced are in the index. A new engineer can ask "why is the subscription service a separate deployment?" and get an answer grounded in what the codebase actually shows — the service boundary, the isolation pattern, the dependencies that would explain the separation. They cannot recover the original meeting where the decision was made. But they can understand the structural logic, which is what they need to work within it safely.

Practical steps for reducing knowledge risk

The right time to build system-grounded context is not during the notice period. Three weeks is not enough time to capture 18 months of accumulated understanding. The right time is continuously — so that when the departure happens, the index already reflects the system's current structure, the conventions in practice, the service ownership signals, the architectural decisions as they are actually expressed in the code.

The most valuable immediate action for any engineering manager is understanding where knowledge concentration is highest on the team. Which engineer, if they left tomorrow, would produce the most incidents in the following 90 days? That is the bus factor risk. For that engineer and those systems, the question is not how to document what they know — it is how to make their knowledge queryable by others before they leave.
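
A rough first pass at that question can come from commit history. The sketch below is one possible proxy, not an established metric: it assumes the commit log has already been reduced to (service, author) pairs, and the function name is hypothetical. Commit counts say who touched the code, not who understands it, so the result is a prompt for a conversation rather than an answer.

```python
from collections import Counter, defaultdict


def ownership_concentration(commits):
    """Summarize how concentrated commit authorship is per service.

    commits: iterable of (service, author) pairs, one pair per commit.
    Returns {service: (top_author, share_of_commits_by_top_author)}.
    """
    per_service = defaultdict(Counter)
    for service, author in commits:
        per_service[service][author] += 1

    result = {}
    for service, counts in per_service.items():
        top_author, top_count = counts.most_common(1)[0]
        result[service] = (top_author, top_count / sum(counts.values()))
    return result
```

A service where one author accounts for nearly all commits is a reasonable place to start asking what that person knows that nobody else does.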

Pair programming, code reviews, and architecture discussions all help at the margin. They distribute knowledge through humans, which is slow and partial. A managed codebase index distributes knowledge through a system that does not depend on any one person's availability, does not get stale, and does not require a scheduled meeting to access.

The exit interview that actually works

An exit documentation sprint produces a document. That document is correct on the day it is written and increasingly wrong every day after. Systems change. The document does not. Within six months, the most important parts of the handoff doc are already misleading.
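
That decay can be made visible. As a small sketch, assume the team knows the date a handoff document was last updated and the dates of changes to the code it describes. The function names and the threshold below are hypothetical and would need tuning for any real team.

```python
from datetime import date


def changes_since_doc(doc_updated: date, change_dates: list[date]) -> int:
    """Count code changes made after the document was last updated."""
    return sum(1 for changed in change_dates if changed > doc_updated)


def is_probably_stale(doc_updated: date, change_dates: list[date],
                      threshold: int = 5) -> bool:
    """Flag a document once enough changes have landed since its last update.

    The default threshold is an arbitrary illustration.
    """
    return changes_since_doc(doc_updated, change_dates) >= threshold
```

A count like this does not fix a stale document, but it does show how far the document has drifted from the code.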

A managed codebase index reflects the system as it actually exists today, re-indexed automatically as the code changes. The new engineer does not inherit a document — they inherit a queryable model of a living system. That is the exit interview that actually works at scale.

Final take

The real bus factor risk is not losing the engineer. It is losing the context the engineer accumulated. Every month they are on the team, that context grows — in depth, in specificity, in operational detail that cannot be reconstructed from the code alone. Every day after they leave, it erodes. The people who remain know less about the system than the team did before the departure, and the gap grows with every change made without that context.

A managed codebase intelligence layer is the only way to capture that understanding systematically — not in any one person's head, not in a document that will be outdated by next quarter, but in a continuously maintained index that the whole team can query. When the senior engineer leaves, the context they built does not leave with them. It stays in the system, available to whoever needs it next.