
Why AI Keeps Building What Already Exists in Your Codebase

10 min read

A senior developer opens a pull request for code review and finds a utility function that is almost identical to one that has lived in shared/utils/formatters.ts for a year. She asks the author. He says he used Cursor to write it — he described what he needed, Cursor generated it, it worked, he shipped it. He had no idea the other one existed. He has been on the team for eight months. She has seen this exact scenario three times this sprint alone.

This is the AI coding duplication problem, and it is becoming structural on teams that have adopted AI tools heavily. It is not a code quality failure or a process breakdown. It is a perception failure: the agent does not know what already exists in the system, so it builds what it needs locally, and the codebase grows two retry queues, three notification managers, and four email formatting utilities — each slightly different, each maintained independently, each quietly diverging.

Before you can fix it, you need to understand why it happens, where it shows up most often, and what it actually costs.

Why AI agents build duplicates

The core problem is retrieval scope. An AI coding agent in Cursor, Claude Code, or Copilot pulls context from what is visible: the open file, the nearby files in the editor, and whatever the IDE's local index surfaces when the agent runs a semantic search. That context is genuinely useful for writing code in the current file. It is not enough to answer the question "does something that does this already exist somewhere in the codebase?"

A production codebase with five services, a shared library, a monorepo root, and three years of accumulated utilities contains the answer — it is just not in the agent's context window. The agent cannot search shared/utils/ semantically unless something routes that search explicitly. Most IDE-level AI tools do not do this. They operate on the assumption that what the developer can see is what matters.

The second problem is session memory. An AI coding session has no persistent state between conversations. The retry handler the agent helped write last Tuesday is completely unknown to the agent in today's session. The agent does not accumulate a model of the system across sessions — every session starts cold. This means that five developers on a team each using Cursor independently are not building from a shared understanding of the system. They are each building from their own local window, which overlaps with the others in unpredictable and partial ways.

This is not how human engineers work. A developer who has been on a team for six months has accumulated mental context about what patterns exist. They have seen the retry handler, they remember the email utility, they know about the billing client wrapper in shared. That accumulated knowledge prevents duplication. AI agents have no equivalent of that knowledge unless it is explicitly provided to them.

The duplication anatomy: what AI tools duplicate most often

Not all code gets duplicated equally. Certain patterns get regenerated from scratch repeatedly because they are easy to describe locally without reference to existing implementations. The same description — "I need to retry a failed operation with backoff" — generates a new implementation in every session where the existing one is not in context.

What AI tools duplicate most often, and why

  Utility functions (formatters, parsers, validators)
    → easy to regenerate locally from a description
    → "format a currency amount" generates a new function, every time
    → result: formatCurrency, formatPrice, formatAmount, toCurrencyString

  HTTP client wrappers
    → "I need to call the billing API" → write a client
    → AI does not check if BillingApiClient already exists in shared/
    → result: three billing clients with different timeout configs

  Error handling / retry logic
    → each service builds its own pattern
    → no single retry strategy is propagated consistently
    → result: diverging backoff logic, different max attempt counts

  Notification and event emitter wrappers
    → "send a confirmation email" → build an email sender
    → EmailService, NotificationService, MailerUtil, sendEmail() all appear
    → result: four implementations, different template paths, different queues

  Adapter layers for external APIs
    → normalizing Stripe response? Build a normalizer.
    → normalizing it again in a different service? Build another one.
    → result: two StripeWebhookAdapter implementations with different field mappings

Utility functions are the most common case because they are the easiest to regenerate. A developer needs to format a currency amount. They describe it to the agent. The agent writes formatCurrency(amount, currency). The function works. It goes in. Nobody checks whether shared/utils/currency.ts already exports formatPrice(amount, currency, locale) with slightly richer behavior. Now there are two. A few sprints later, a third developer generates toCurrencyString(). The codebase has three currency formatters, none canonical.
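Here is a minimal sketch of the cleanup, assuming a TypeScript codebase. It keeps one canonical formatter and turns the regenerated variants into thin deprecated wrappers that delegate to it, so behavior and fixes live in one place. The body of formatPrice is an assumption built on the standard Intl.NumberFormat API, and the en-US default locale is illustrative. The article does not show what shared/utils/currency.ts actually contains.

```typescript
// Sketch of a canonical formatter as it might live in shared/utils/currency.ts.
// Only the name and the (amount, currency, locale) parameters come from the
// article; the implementation and the "en-US" default are assumptions.
function formatPrice(amount: number, currency: string, locale = "en-US"): string {
  return new Intl.NumberFormat(locale, { style: "currency", currency }).format(amount);
}

/** @deprecated Use formatPrice from shared/utils/currency.ts instead. */
function formatCurrency(amount: number, currency: string): string {
  // Delegates to the canonical implementation, so a fix there applies here too.
  return formatPrice(amount, currency);
}

/** @deprecated Use formatPrice from shared/utils/currency.ts instead. */
function toCurrencyString(amount: number, currency: string): string {
  return formatPrice(amount, currency);
}

console.log(formatPrice(1234.5, "USD")); // "$1,234.50"
console.log(formatCurrency(1234.5, "USD")); // same output, same code path
```

The wrappers keep existing call sites compiling while the @deprecated tag flags them in the editor, which lets callers migrate gradually instead of in one large change.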

HTTP client wrappers are the highest-stakes duplicate category. When a developer in the subscription service needs to call the billing API, the agent writes a client. When a developer in the webhook service needs the same call, the agent writes another client. The shared infrastructure team may have already built a BillingApiClient in packages/api-clients/ with auth handling, retry logic, and proper timeout configuration. Neither AI session knew about it. Both built something that works in isolation and diverges from the canonical client on the first configuration change.
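For illustration only, here is roughly what one shared client might look like. The point is that timeout and auth configuration get decided once rather than per service. The option names, the /invoices endpoint, the 5000 ms default, and the injected Transport type are all hypothetical, and retry handling is omitted for brevity.

```typescript
// Hypothetical sketch of one shared billing client, in the spirit of the
// BillingApiClient in packages/api-clients/. Option names, the /invoices
// endpoint, and the 5000 ms default are illustrative assumptions.
type Transport = (req: {
  url: string;
  method: string;
  headers: Record<string, string>;
}) => Promise<{ status: number; body: string }>;

interface BillingClientOptions {
  baseUrl: string;
  apiKey: string;
  timeoutMs?: number;
}

class BillingApiClient {
  private readonly baseUrl: string;
  private readonly apiKey: string;
  private readonly timeoutMs: number;
  private readonly transport: Transport;

  constructor(opts: BillingClientOptions, transport: Transport) {
    this.baseUrl = opts.baseUrl;
    this.apiKey = opts.apiKey;
    this.timeoutMs = opts.timeoutMs ?? 5000; // one default shared by every caller
    this.transport = transport; // injected so tests can fake the network
  }

  async getInvoice(invoiceId: string): Promise<unknown> {
    const res = await this.withTimeout(
      this.transport({
        url: `${this.baseUrl}/invoices/${encodeURIComponent(invoiceId)}`,
        method: "GET",
        headers: { Authorization: `Bearer ${this.apiKey}` }, // auth handled in one place
      }),
    );
    if (res.status !== 200) throw new Error(`billing API returned ${res.status}`);
    return JSON.parse(res.body);
  }

  // Rejects if the request does not settle within timeoutMs.
  private withTimeout<T>(pending: Promise<T>): Promise<T> {
    let timer: ReturnType<typeof setTimeout> | undefined;
    const timeout = new Promise<never>((_, reject) => {
      timer = setTimeout(
        () => reject(new Error(`billing API timed out after ${this.timeoutMs} ms`)),
        this.timeoutMs,
      );
    });
    return Promise.race([pending, timeout]).finally(() => clearTimeout(timer));
  }
}
```

The timeout default and the auth header are set in a single constructor. A change to either therefore reaches every service that imports the client, which three hand-rolled clients cannot offer.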

A concrete example: three payment retry handlers

This is the kind of scenario that shows up in code review on teams that have been using AI tools for more than a few months. It is not invented. The service names change but the shape is consistent.

Three payment retry implementations in one codebase

Three payment retry implementations — found during a single code review:

  services/billing/src/handlers/charge.ts
    → retryChargeWithBackoff(customerId, amount, maxAttempts = 3)
    → exponential backoff, logs to billing_audit_log

  services/subscription/src/workers/renewal.ts
    → retrySubscriptionCharge(subscriptionId, attempts = 0)
    → fixed 5s delay, no audit log, stops at 5 attempts

  services/webhook/src/processors/stripe.ts
    → handleFailedPaymentRetry(event, retryCount)
    → immediate retry, writes to stripe_events, different failure logic

  All three were written by AI coding tools in separate sessions.
  All three handle subtly different edge cases.
  A bug fix in one will not be applied to the others.

Each of these was written by an AI agent in a separate session. Each made locally reasonable decisions. The billing handler got written when a developer was working on charge processing. The subscription renewal handler got written when a developer was working on the monthly renewal worker. The webhook handler got written when a developer was instrumenting Stripe event processing. Nobody was cutting corners. Nobody was duplicating intentionally. The agent simply did not know what the other services had already built.

The divergence is already there: different max attempt counts, different delay strategies, different audit logging behavior. Suppose a bug is common to all three, say, retrying a charge without first checking whether the previous attempt actually went through. When it gets fixed in the billing handler, the subscription and webhook handlers are not updated. Three months later, the same bug surfaces in production from the webhook path, and the debugging session has to trace back through the history to understand why two services behave differently under the same failure conditions.
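Consolidating the handlers would mean one retry helper in which the policy (attempt count, base delay, delay cap) is explicit configuration. The sketch below is hypothetical: retryWithBackoff, RetryPolicy, and the default values are illustrative names, not code from these services. The sleep function is injectable so the behavior can be tested without real delays.

```typescript
// Hypothetical canonical retry helper the three handlers could share.
// Names and defaults are illustrative assumptions, not code from the article.
interface RetryPolicy {
  maxAttempts: number; // total attempts including the first; assumed >= 1
  baseDelayMs: number; // delay before the second attempt
  maxDelayMs: number; // cap on any single delay
}

const defaultPolicy: RetryPolicy = { maxAttempts: 3, baseDelayMs: 200, maxDelayMs: 5000 };

// Delay before retry number `retry` (1-based): base * 2^(retry - 1), capped.
function backoffDelay(retry: number, policy: RetryPolicy): number {
  return Math.min(policy.baseDelayMs * 2 ** (retry - 1), policy.maxDelayMs);
}

const realSleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function retryWithBackoff<T>(
  operation: (attempt: number) => Promise<T>,
  policy: RetryPolicy = defaultPolicy,
  sleep: (ms: number) => Promise<void> = realSleep, // injectable for tests
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= policy.maxAttempts; attempt++) {
    try {
      return await operation(attempt);
    } catch (err) {
      lastError = err;
      // Wait only if another attempt remains.
      if (attempt < policy.maxAttempts) await sleep(backoffDelay(attempt, policy));
    }
  }
  throw lastError; // every attempt failed: surface the last error
}
```

With this design, the differences between the handlers become policy values rather than separate code. The billing handler's three attempts fit defaultPolicy. The subscription worker's fixed 5 s delay and five attempts become { maxAttempts: 5, baseDelayMs: 5000, maxDelayMs: 5000 }, since the cap flattens every delay to 5000 ms. Audit logging would still have to be handled by the caller or added to the helper deliberately.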

This is not a discipline problem

The instinctive response is to say "tell developers to search before they build." It does not work at scale, and it is not fair to the developers.

Searching before building requires knowing where to search, knowing what to search for, and having a mental model of what might already exist. On a team of eight developers with three or more repositories, that mental model is distributed unevenly. A developer who works primarily in the subscription service does not have a reliable picture of what the billing service or the shared packages contain. They have not reviewed every PR that touched those services. They have not read every utility file in shared. They know their area well; the edges are fuzzy.

The agent compounds this. When developers are in a flow state, working with an AI coding tool to implement a feature, the natural move is to describe what they need and use what gets generated. Interrupting that flow to manually grep the full repository, check shared packages, and cross-reference with other services is friction that most developers will skip. They skip it not because they are lazy, but because the task is genuinely hard, the existing search tooling is limited, and the signal-to-noise ratio on a large codebase search is often poor.

Large software systems are already too complex for any individual to hold in their head. The solution cannot be "hold more of it in your head before using AI." The solution is giving the AI the system context it currently lacks, so it can answer the "does this already exist?" question before generating a duplicate.

The downstream cost of diverging implementations

Duplicate implementations are not just a matter of code aesthetics. They create operational liability that compounds over time.

The first cost is bug propagation asymmetry. When a bug gets fixed in one implementation, the other implementations do not automatically receive the fix. The team may not know the other implementations exist, or may not connect them to the same underlying pattern. Six months later, the bug resurfaces from a different codepath and nobody recognizes it as the same bug they fixed before.

The second cost is behavioral divergence. Two implementations of the same pattern — say, email notifications — start from similar code but evolve differently over time. One gets updated to handle unsubscribed users. The other does not. One gets updated to log to the audit trail. The other continues writing to the legacy log format. The system now has two different behaviors for what should be the same operation, depending on which codepath triggers it. This kind of divergence is extremely hard to debug because the symptoms look like data inconsistency rather than implementation inconsistency.

The third cost is onboarding friction. New engineers onboarding into a codebase that has three notification patterns, two user models, and four currency formatters do not know which one to use. They ask a senior developer. The senior developer picks the one they are most familiar with. The new engineer adds to the proliferation without understanding that a canonical implementation should exist. The problem is self-reinforcing.

The fourth cost is refactoring cost. When the team eventually decides to standardize on a single implementation, the consolidation requires finding all call sites across all services, understanding the behavioral differences between implementations, migrating each caller without regressions, and deprecating the extras. That is a significant sprint investment for a problem that could have been avoided at the point of creation.
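The first step, finding call sites, can be roughed out with a scan like the one below. It is a hypothetical helper that works on an in-memory map of file contents and matches names with a regular expression. That is only an approximation: it misses aliased imports and can match comments or strings. A real migration would more likely lean on the TypeScript compiler API or an editor's find-references.

```typescript
// Hypothetical inventory step for a consolidation: list call sites of each
// duplicate name as "path:line". Assumes names are plain identifiers (no regex
// metacharacters). A regex scan is a rough approximation, not reference resolution.
function findCallSites(
  sources: Record<string, string>, // file path -> file contents
  names: string[],
): Map<string, string[]> {
  const sites = new Map<string, string[]>();
  for (const name of names) {
    const call = new RegExp(`\\b${name}\\s*\\(`);
    const declaration = new RegExp(`\\bfunction\\s+${name}\\b`);
    const found: string[] = [];
    for (const [path, text] of Object.entries(sources)) {
      text.split("\n").forEach((line, i) => {
        // Count calls, but skip the line that declares the function itself.
        if (call.test(line) && !declaration.test(line)) found.push(`${path}:${i + 1}`);
      });
    }
    sites.set(name, found);
  }
  return sites;
}
```

The per-name list doubles as a migration checklist. The behavioral differences between implementations still have to be read and reconciled by hand.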

The team size multiplier

This problem scales with team size in a non-linear way. With one developer using an AI tool, duplication within a single session is unlikely, because the agent sees the earlier code in the same context window. With two developers using AI tools independently, the duplication rate starts to climb. With five developers each running their own AI sessions across multiple services, the duplication becomes structural. One way to see why: n developers working independently form n(n-1)/2 pairs, each of which can miss the other's work. Two developers form one such pair; five form ten.

The reason is that five independent AI sessions produce five independent views of the codebase. Each view is coherent within itself. The divergence happens in the gaps between views — the parts of the codebase that one developer has worked in but the others have not reviewed recently. Those gaps are where duplicates appear, because each agent builds from what it can see and nothing more.

AI tools give different answers about the same codebase depending on what they have been shown. Five developers each priming their agents with different local context get five different views of what already exists. On any non-trivial feature, at least one of those views will be incomplete enough to generate a duplicate.

What changes with system-wide awareness

The fix is not procedural. Style guides do not help: a style guide tells you how to write code, not what already exists. Code review catches duplicates after the fact, once the time spent generating and integrating the code is already sunk. Mandatory search steps add friction without guaranteeing coverage on large codebases.

The fix is giving the agent the system-wide awareness it currently lacks before it generates code. Specifically: a semantic index of the codebase that the agent can query before building, so it can find existing implementations that match what it is about to write.

This is different from the local search that IDE-level tools perform. A local search finds files that textually match a query. A semantic index understands behavioral intent — "something that retries a failed operation with backoff" matches retryChargeWithBackoff in the billing service even if the query came from the webhook service and the two files have no textual overlap. The agent can ask "does something like this exist?" and get a grounded answer before building.
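To make semantic matching concrete, here is a toy lookup by cosine similarity over embedding vectors. The vectors are made up and only three-dimensional; a real index would store embeddings produced by an embedding model. The 0.8 threshold is an arbitrary assumption. This illustrates embedding search in general, not how any particular product ranks results.

```typescript
// Toy illustration of semantic lookup. The vectors are made-up stand-ins; a
// real index would hold model-generated embeddings with far more dimensions.
interface IndexedSymbol {
  name: string;
  path: string;
  embedding: number[];
}

function cosine(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Symbols whose similarity to the query clears the threshold, best match first.
function findSimilar(query: number[], index: IndexedSymbol[], threshold = 0.8): IndexedSymbol[] {
  return index
    .map((symbol) => ({ symbol, score: cosine(query, symbol.embedding) }))
    .filter((r) => r.score >= threshold)
    .sort((x, y) => y.score - x.score)
    .map((r) => r.symbol);
}

const index: IndexedSymbol[] = [
  { name: "retryChargeWithBackoff", path: "services/billing/src/handlers/charge.ts", embedding: [0.9, 0.1, 0.2] },
  { name: "formatPrice", path: "shared/utils/currency.ts", embedding: [0.1, 0.9, 0.1] },
];

// Pretend embedding of the query "retry a failed operation with backoff".
const query = [0.85, 0.15, 0.25];
console.log(findSimilar(query, index).map((s) => s.name)); // only retryChargeWithBackoff
```

The query shares no words with the function name. The match comes entirely from the vectors being close, which is the property the paragraph above relies on.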

Before and after grounded codebase awareness

Without codebase awareness (AI session with local context only):

  Developer: "Add retry logic for failed Stripe charges in the webhook handler."

  Agent: writes handleFailedPaymentRetry() from scratch
         adds exponential backoff implementation inline
         new function: ~80 lines
         duplicate of billing/src/handlers/charge.ts retryChargeWithBackoff()
         diverges immediately on first bug fix


With codebase awareness (agent grounded in semantic index):

  Developer: "Add retry logic for failed Stripe charges in the webhook handler."

  Agent: finds retryChargeWithBackoff() in services/billing/src/handlers/charge.ts
         notes it already handles exponential backoff and audit logging
         suggests: import and reuse retryChargeWithBackoff, pass webhookSource flag
         result: one canonical implementation extended, not duplicated

Kognita serves this as an MCP endpoint. Developers connect their AI coding tools — Claude Code, Cursor, Windsurf — to a cloud-hosted MCP server that holds the current semantic index of all connected repositories. When the agent needs to build a new pattern, it queries the index first. The query returns existing implementations, their locations, their current behavior, and how they are used across the codebase. The agent can then decide to extend the existing implementation rather than generate a new one.

This happens at the point of generation, not after. The duplicate never gets written. The divergence never starts. The downstream costs — bug propagation asymmetry, behavioral drift, onboarding confusion, consolidation effort — do not accumulate.
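At the protocol level, the lookup an agent performs before building is an MCP tool call: a JSON-RPC 2.0 request with the method tools/call. The tool name search_codebase and its query and limit arguments are hypothetical. A real server advertises its actual tools through tools/list.

```typescript
// Shape of an MCP tool invocation (JSON-RPC 2.0, method "tools/call").
// The tool name "search_codebase" and its arguments are hypothetical.
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params?: Record<string, unknown>;
}

function toolCall(id: number, tool: string, args: Record<string, unknown>): JsonRpcRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name: tool, arguments: args } };
}

const request = toolCall(1, "search_codebase", {
  query: "retry a failed payment charge with exponential backoff",
  limit: 5,
});
console.log(JSON.stringify(request));
```

In practice the coding tool's MCP client builds and sends this request itself. Developers only configure the connection to the server.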

The index keeps pace with the team

One detail that matters for this to work in practice: the semantic index needs to be current. An index that was built three weeks ago does not know about the retry handler that was added last Thursday. If the agent queries a stale index, it will generate a duplicate of something that was built after the last reindex — which is exactly the kind of thing that gets missed.

Kognita re-indexes automatically on push, so the index available to each developer's AI session reflects what was in the repository as of the last push. A canonical implementation pushed on Monday is visible to a colleague's agent on Wednesday, before that colleague can duplicate it. The remaining gap is work that has not been pushed yet, not everything added since the last manual reindex.

This also means the index is a team resource, not a per-developer setup. Every developer on the team queries the same index. Five developers with five AI sessions are all grounded in the same current view of the codebase. The independent context problem does not go away entirely, but the duplication surface shrinks dramatically when every agent shares the same indexed understanding of what already exists.
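Here is a rough sketch of how reindexing on push can stay cheap. The function below turns a push event into an incremental plan: which files to re-embed and which to drop. It reads fields that exist in GitHub's push webhook payload (repository.full_name, after, and each commit's added, modified, and removed lists). The article does not describe how Kognita reindexes, so planReindex and ReindexPlan are hypothetical.

```typescript
// Hypothetical incremental reindex plan built from a GitHub push event payload.
// Only the fields read here are modeled; the real payload has many more.
interface PushCommit {
  added: string[];
  modified: string[];
  removed: string[];
}

interface PushEvent {
  after: string; // SHA of the newest commit after the push
  repository: { full_name: string };
  commits: PushCommit[];
}

interface ReindexPlan {
  repo: string;
  commit: string;
  upsert: string[]; // files to (re)embed
  drop: string[]; // files to remove from the index
}

function planReindex(event: PushEvent): ReindexPlan {
  // Replay commits in order so the final state of each path wins.
  const state = new Map<string, "upsert" | "drop">();
  for (const c of event.commits) {
    for (const p of [...c.added, ...c.modified]) state.set(p, "upsert");
    for (const p of c.removed) state.set(p, "drop");
  }
  const upsert: string[] = [];
  const drop: string[] = [];
  for (const [path, action] of state) {
    if (action === "upsert") upsert.push(path);
    else drop.push(path);
  }
  return { repo: event.repository.full_name, commit: event.after, upsert: upsert.sort(), drop: drop.sort() };
}
```

Working from the push diff instead of rebuilding the whole index is one way to make a per-push cadence affordable, since most pushes touch only a handful of files.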

Final take

Duplicate implementations in AI-era codebases are not a code quality problem you can solve with style guides, code review checklists, or team communication norms. They are a perception problem. The agent does not know what already exists in parts of the codebase it cannot see. Telling humans to compensate for that gap does not scale beyond small teams with limited codebases.

The durable fix is giving agents system-wide awareness before they build. A semantic index that the agent queries at the start of a session — before writing a line of new code — is the difference between a codebase that accumulates one canonical implementation per pattern and one that accumulates four. The cost of the duplicate is not just the extra code. It is every bug fix that does not propagate, every behavior that silently diverges, and every engineer who cannot confidently answer "which one should I use?"