
When AI-Generated Code Breaks in Production, Who Is Accountable?


When AI-generated code ships to production and causes an incident, it is unclear who is accountable. The developer is on the commit but may not have read the AI output carefully. The agent produced the code but has no legal standing. The tool provider disclaims liability in its terms of service. The organization deployed the tool but did not write the code. No party has clear accountability, and none has the evidence to establish it, because per-developer AI tools leave no organizational audit trail. The risk falls to the organization by default, whether or not it has any visibility into what happened.

The accountability chain breaks with AI-generated code

Traditional code governance works because the accountability chain is clear: the developer wrote it, the reviewer approved it, both names are on the PR. When AI generates code, that chain becomes diffuse:

Accountability gap in AI-generated code

AI-generated code: who is accountable when it breaks?

  Human-written code breaks in production:
    → Developer who wrote it is on the PR
    → Reviewer who approved it is on the PR
    → Clear accountability chain

  AI-generated code breaks in production:
    → Developer accepted it (but may not have reviewed it deeply)
    → Agent wrote it (but agents are not employees)
    → Model generated it (but model providers disclaim liability)
    → Org deployed the tool (but did not author the code)

  Accountability: diffuse, contested, and often dropped

The accountability gap is not theoretical. It surfaces in post-incident reviews where AI tooling was involved: nobody can say with certainty what the agent was given as context, what alternatives it considered, or whether the developer who accepted the output actually understood it. This is the same governance failure that "AI coding: controlled use vs. polite chaos" describes from a speed perspective. Speed without proof means accountability without evidence.

Investigating an AI coding incident without audit trail

When an incident occurs and AI tooling is suspected, an investigation that involves per-developer tools runs into evidence gaps immediately:

Incident investigation without managed runtime audit log

Investigating an AI coding incident without audit trail:
  "Our AI agent wrote code that deleted user data"

  Questions you cannot answer:
  → Which agent session generated that code?
  → What codebase context did it read?
  → Which model version was running?
  → Did a developer review it before merge?
  → When did the agent produce this — before or after the DB schema change?
  → Were there other sessions in the same area?

  Result: investigation takes weeks, root cause is "we don't know"

These are not difficult questions — they are the first questions any root cause analysis asks. Without an audit log, they cannot be answered. The investigation becomes a reconstruction from memory: who remembers which session, who thinks they know what context the agent had, who was working on what that week. Memory-based incident investigations take longer, produce less reliable conclusions, and make the accountability question harder to resolve, not easier.

What an audit log changes in an investigation

With a managed runtime audit log, the same investigation starts from records rather than recollection:

The same investigation with managed runtime audit log

Investigating an AI coding incident with managed runtime audit log:
  "Our AI agent wrote code that deleted user data"

  Queries you can run immediately:
  → Session ID: identified in 30 seconds
  → Context read: 4 files, 2 Jira tickets
  → Model version: claude-sonnet-4-6 (not the latest)
  → Developer review: merged without PR comment → likely rubber-stamped
  → Timing: 2 hours after schema change in migrations/
  → Related sessions: 3 other sessions in same service

  Result: root cause identified in 2 hours

The difference is not just speed — it is the ability to establish actual accountability rather than approximate blame. Was the code generated before or after a relevant schema change? Was there a review or was it rubber-stamped? Which model version was running? These answers determine whether the incident was a tooling failure, a process failure, or an honest mistake under time pressure. Without the log, all three possibilities are equally defensible and none can be proven.
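To make the timing and review questions concrete, here is a minimal sketch of how they could be answered from logged session records. The `SessionRecord` fields, the file paths, and the sample timestamps are all hypothetical and chosen for illustration; they are not the schema of any particular product.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Tuple


@dataclass(frozen=True)
class SessionRecord:
    # Hypothetical fields for illustration; not an actual product schema.
    session_id: str
    finished_at: datetime
    model_version: str
    files_written: Tuple[str, ...]
    review_comments: int  # comments left on the PR that merged the output


def sessions_touching(log: List[SessionRecord], path: str) -> List[SessionRecord]:
    """Return every logged session that wrote to the given file path."""
    return [s for s in log if path in s.files_written]


def timing_vs_change(session: SessionRecord, change_at: datetime) -> str:
    """Describe when the session finished relative to a schema change."""
    delta = session.finished_at - change_at
    if delta >= timedelta(0):
        return f"{delta} after schema change"
    return f"{-delta} before schema change"


# Sample data: one schema change and two logged sessions.
schema_change = datetime(2025, 1, 10, 9, 0)
log = [
    SessionRecord("s-101", datetime(2025, 1, 10, 11, 0), "claude-sonnet-4-6",
                  ("services/users/cleanup.py",), 0),
    SessionRecord("s-102", datetime(2025, 1, 9, 15, 0), "claude-sonnet-4-6",
                  ("services/users/api.py",), 3),
]

suspects = sessions_touching(log, "services/users/cleanup.py")
for s in suspects:
    review = "reviewed" if s.review_comments else "no review comments"
    print(s.session_id, s.model_version, timing_vs_change(s, schema_change), review)
```

The sketch only retrieves facts. Deciding whether those facts point to a tooling failure, a process failure, or an honest mistake remains a judgment for the people running the review.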

Risk ownership: per-developer tools vs. managed runtime

The risk ownership model for AI coding tools has a counterintuitive property: per-developer tools create the illusion of individual accountability while concentrating undocumented risk in the organization. A managed runtime makes the organization's ownership explicit, and it also provides the evidence that makes that ownership defensible:

Risk ownership comparison

AI coding risk ownership model:
  Per-developer tools:
    Risk owner:     nominally the developer
    Evidence:       none (private session)
    Accountability: who is on the commit (not who instructed the agent)
    Liability:      the organization regardless

  Managed runtime:
    Risk owner:     the organization (explicitly, by design)
    Evidence:       audit log, session records, context log
    Accountability: developer who approved + org that provisioned
    Liability:      defensible because evidence exists

Kognita's managed runtime logs every agent session — what was read, what was produced, which model was used, who authorized the session. This is not surveillance; it is the audit infrastructure that lets an organization say "we deployed AI tools responsibly, here is the evidence" rather than "we deployed AI tools and we do not know what they did."
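As a rough illustration of what one such log entry might contain, the sketch below models the four things named above (what was read, what was produced, which model was used, who authorized the session) as a record serialized to one JSON line. The class name, field names, and sample values are assumptions made for this example and do not describe Kognita's actual log format.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import List


@dataclass
class AgentSessionEntry:
    # Hypothetical field names; an assumed shape, not a real product schema.
    session_id: str
    authorized_by: str         # who authorized the session
    model_version: str         # which model was used
    context_read: List[str]    # what was read (files, tickets)
    files_produced: List[str]  # what was produced
    recorded_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def to_log_line(entry: AgentSessionEntry) -> str:
    """Serialize one entry as a single JSON line for an append-only log."""
    return json.dumps(asdict(entry), sort_keys=True)


entry = AgentSessionEntry(
    session_id="s-101",
    authorized_by="dev@example.com",
    model_version="claude-sonnet-4-6",
    context_read=["migrations/0042_users.py", "TICKET-123"],
    files_produced=["services/users/cleanup.py"],
)

line = to_log_line(entry)
restored = json.loads(line)  # round-trips back to a plain dict
print(restored["authorized_by"], restored["model_version"])
```

Writing one self-describing line per session keeps the log easy to append to and easy to search later, which is the property an investigation depends on.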

The legal and compliance dimension

For organizations in regulated industries — financial services, healthcare, legal tech — the accountability question is not philosophical. Regulators who ask "what AI systems are you using in production?" expect an answer that includes evidence of governance, not just a list of tool names. The ability to produce an audit trail of AI-generated code decisions is increasingly a compliance requirement, not just a best practice.

Final take

When AI-generated code causes an incident, the organization owns the risk whether or not it has any visibility into what happened. The choice is between owning the risk with evidence — audit log, session records, context log — and owning it without.

Accountability without evidence is just blame. A managed runtime provides the evidence that turns "something went wrong with AI" into "here is what happened, here is who approved it, and here is what we are changing."