Non-Technical Teams Can't Verify What AI Actually Built
"Staging looks fine, but I can't tell if what shipped actually matches what I wrote." This is the quiet frustration of every product manager working at AI-accelerated velocity. The feature is in staging. The PR is merged. The sprint is closed. And there is no practical way to verify that the implementation matches the specification — not without asking an engineer to spend thirty minutes walking through the code, which nobody has time for at the end of a sprint.
This is a requirements traceability problem, and AI-accelerated development has made it significantly worse. The old model had natural checkpoints that created traceability almost as a side effect. The new model removes those checkpoints without replacing the verification they provided.
The old traceability model and why it worked
Before AI coding tools, the path from a Jira ticket to a merged feature typically took days. A PM wrote a spec. An engineer read it, asked clarifying questions in the ticket or in a meeting, built a design, got feedback, then implemented. The implementation took long enough that a PM could check in mid-sprint. The PR included a description of what was built, often written to match against the acceptance criteria. Staging reflected a manageable number of changes per week.
This process was slow — often frustratingly so. But it was legible. The PM could track the feature from spec to code through human conversations that created a paper trail. Misalignments between "what was specified" and "what was built" surfaced during design reviews, sprint reviews, or at least in staging before production. When something was wrong, there was a clear moment where it could be caught.
The traceability was not by design. It was a byproduct of the pace. Slow development meant more touchpoints, and more touchpoints meant more chances for a non-technical team member to verify alignment. Lost velocity was the cost. Legibility was the benefit that came with it.
How AI velocity breaks requirements traceability
AI coding tools compress the implementation timeline without compressing the verification timeline. An engineer who previously spent three days implementing a feature now ships a PR in four hours. The spec-to-code gap that used to accommodate design reviews and stakeholder check-ins has collapsed.
The natural traceability checkpoints disappear along with the time. There is no design discussion because there is no design phase — the AI generates the structure directly from the requirement. There is no mid-sprint check-in because the feature is done before the PM expected to hear about it. The PR description may be auto-generated or minimal, because the engineer was moving fast and the AI already produced the code.
Staging now reflects ten changes at once instead of three. The PM reviewing staging sees the feature and it appears to work — the happy path functions as expected. But the acceptance criteria had four conditions, and staging only exercises two of them. The edge cases — what happens to legacy users, what happens when the operation fails partway through, what happens when two conditions are true simultaneously — are not visible in a staging walkthrough.
The problem is not that engineers are shipping things wrong. It is that the verification infrastructure has not kept up with the velocity. The process still assumes the PM will catch misalignments through the human touchpoints that used to exist. Those touchpoints are gone. The PM is left with staging and a closed Jira ticket — neither of which tells them what the implementation actually does relative to what the spec required.
The non-technical team's verification problem
Non-technical team members have always depended on engineers to translate "what was built" into plain language. This is not a new dependency. It existed before AI coding tools and it is a reasonable division of labor. The engineer understands the implementation; the PM understands the requirement; they talk to verify alignment.
At AI velocity, that translation bottleneck is worse, not better. There is simply more to explain per sprint. Where an engineer previously needed to walk a PM through three features after two weeks of work, they now need to walk them through eight — all of which were generated and iterated quickly, without the PM being in the loop at any point during implementation.
This problem starts upstream. Product managers writing specs without codebase access frequently write requirements that are ambiguous at the implementation level — not because the PM is unclear, but because "send a confirmation email immediately" means something different in a synchronous request handler versus an asynchronous event queue. The PM did not know the system had an event queue. The engineer assumed "immediately" could mean "on the next batch cycle." The AI generated code consistent with the engineer's assumption. The PM's requirement and the implementation diverged without anyone noticing.
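A minimal sketch of why the two readings of "immediately" diverge, assuming a queue that drains at fixed intervals. The function name and the 15-minute interval are illustrative assumptions, not details of any real system:

```python
import math


def batch_delay_seconds(emitted_at: int, batch_interval: int) -> int:
    """Seconds an event waits before the next batch run picks it up.

    Assumes batches fire at exact multiples of batch_interval (in seconds)
    and that an event emitted exactly on a boundary is sent in that run.
    """
    next_run = math.ceil(emitted_at / batch_interval) * batch_interval
    return next_run - emitted_at


# A synchronous handler sends the email inside the request: no queue delay.
SYNC_DELAY = 0

# Hypothetical 15-minute batch cycle, expressed in seconds.
BATCH_INTERVAL = 15 * 60

for emitted_at in (60, 901):
    delay = batch_delay_seconds(emitted_at, BATCH_INTERVAL)
    print(f"emitted at t={emitted_at}s: sync delay {SYNC_DELAY}s, batch delay {delay}s")
```

Both designs can honestly be described as sending the email "after activation", but only one matches what a PM means by "immediately". Writing the requirement as a maximum delay in seconds leaves no room for the second reading.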
The result is a class of bugs and compliance gaps that are not caught until production. Not because anyone was negligent — but because the verification step that would have caught them (the conversation between PM and engineer about the implementation) no longer reliably happens at the pace AI development runs.
Three realistic examples of the traceability gap
The traceability gap is not theoretical. It produces specific categories of failures: permission logic that does not cover all users, timing requirements that the AI interpreted differently than specified, and compliance obligations that were partially implemented but never verified.
The traceability gap — three realistic examples:
Example 1: User permission model
Requirement (Jira ticket PM-441):
"Users with the 'viewer' role should not be able to export data."
Code shipped (PR #1203):
ExportController checks permissions via FeatureFlags.canExport(user)
FeatureFlags reads from a separate config table, not the roles table
Viewer-role users with a legacy 'export_beta' flag still export data
PM discovers: in production, when a customer reports it three weeks later.
Example 2: Notification timing
Requirement (Jira ticket PM-512):
"Send a confirmation email immediately after subscription activation."
Code shipped (PR #1347):
SubscriptionService emits an event to the notification queue
Notification queue processes on a 15-minute batch cycle
"Immediately" was never defined; AI inferred "eventually"
PM discovers: in production, when sales reports customer confusion.
Example 3: Data retention
Requirement (Jira ticket PM-588):
"Deleted accounts should have their data removed within 24 hours."
(Legal requirement; compliance-critical)
Code shipped (PR #1401):
AccountDeletionJob marks records as deleted (soft delete)
Hard deletion job runs weekly, not daily
24-hour requirement is not met
PM discovers: during a compliance audit, six months after the sprint.
Each of these failures has the same structure: the engineer shipped what they understood the requirement to mean; the AI generated code that implemented that understanding; the PM never had a chance to verify that the understanding was correct. The failure mode is not a bug in the traditional sense. It is a specification mismatch that the process did not surface.
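Example 1 can be restated as an executable check. The sketch below is a hypothetical Python model of the situation, not the code from PR #1203: the `User` class, the flag names other than `export_beta`, and both check functions are assumptions made for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical flags that grant export in the config table.
EXPORT_FLAGS = {"export", "export_beta"}


@dataclass
class User:
    name: str
    role: str
    flags: set[str] = field(default_factory=set)


def can_export_shipped(user: User) -> bool:
    """Stand-in for a flag-only check: the role is never consulted."""
    return bool(user.flags & EXPORT_FLAGS)


def can_export_role_aware(user: User) -> bool:
    """One possible fix: the role check runs first, so no flag overrides it."""
    return user.role != "viewer" and can_export_shipped(user)


def viewers_who_can_export(check, users):
    """The requirement as a query: this list must be empty."""
    return [u.name for u in users if u.role == "viewer" and check(u)]


USERS = [
    User("editor", "editor", {"export"}),
    User("new_viewer", "viewer"),
    User("legacy_viewer", "viewer", {"export_beta"}),  # the case the spec missed
]

print(viewers_who_can_export(can_export_shipped, USERS))
print(viewers_who_can_export(can_export_role_aware, USERS))
```

The happy path (a new viewer cannot export) passes under both checks, which is why a staging walkthrough would not reveal the difference. Only the legacy user exposes it.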
What product teams actually need to verify what was built
Verification does not require reading code. What it requires is the ability to ask questions about the implementation in plain language and get accurate answers. "Does this implementation handle the legacy user case that was called out in the acceptance criteria?" is a reasonable question. So is "What latency does 'immediately' translate to in this implementation?" and "Was the compliance requirement in PM-588 fully addressed, or only partially?"
These questions currently require an engineer to trace through the codebase manually — a 20-to-40-minute task per ticket that nobody has budget for at sprint review. The result is that most of these questions do not get asked, and the misalignments go undiscovered until they surface in production or in a compliance audit.
What product teams need is a layer that sits between the Jira tickets and the codebase and answers these questions automatically. Not access to the code — giving PMs direct codebase access does not solve the problem because the problem is not access, it is translation. What they need is a system that has already read both the ticket and the implementation and can report on the gap between them in plain language.
Questions a PM needs answered after sprint completion — that engineering cannot quickly answer:
Behavioral questions (does it work as specified?):
-> Does the new export restriction apply to all viewer-role users, including those
created before the role system was introduced?
-> When the spec said "immediately," what latency did the implementation deliver?
-> Which edge cases in the acceptance criteria were explicitly handled vs. implicitly assumed?
Coverage questions (was everything built?):
-> The ticket had three acceptance criteria. Were all three addressed in the PR?
-> The original ticket was split into two PRs. Do both together fulfill the spec?
-> A related ticket (PM-443) had a dependency on this one. Was that dependency resolved?
Regression questions (did this break anything?):
-> The notification change in PR #1347 — does it affect the existing email digest flow?
-> The permission check refactor — are there other places in the codebase that bypass it?
-> The data retention job — what other jobs run on the same schedule that might conflict?
These questions are reasonable. Engineering cannot answer them quickly because
the answers require cross-referencing the spec, the PR, and the live codebase —
a task that takes 20–40 minutes per ticket when done manually.
None of these questions requires a PM to understand the code. They require someone, or something, to have read both the spec and the code and to be able to report on their relationship. This is not a human task that can be done at AI velocity. It is an infrastructure problem.
How Kognita + Jira gives non-technical teams plain-language verification
Kognita connects Jira tickets to the live codebase and answers the verification questions that product teams currently cannot ask at velocity. The integration maintains a semantic understanding of what each ticket specified and what the codebase implementation delivers — and surfaces the gap between them in plain language, without requiring a PM to read any code or an engineer to spend thirty minutes translating.
The gap identification is the core value. Not just "here is what the code does" — but "here is what the ticket specified, here is what the code does, and here is where they diverge." That delta is what enables verification. A PM who can see that PM-588's 24-hour hard deletion requirement is not met by the current implementation can act before the compliance audit. A PM who sees that the export restriction in PM-441 has a legacy-user exception can raise it before a customer reports it.
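The PM-588 delta is ultimately arithmetic: a deadline compared against a job schedule. The sketch below shows that comparison under stated assumptions. It is not Kognita's implementation, and the function names are hypothetical.

```python
from datetime import timedelta


def worst_case_removal(soft_delete_delay: timedelta,
                       hard_delete_interval: timedelta) -> timedelta:
    """Upper bound on time from account deletion to physical removal.

    Assumes the hard-deletion job removes every soft-deleted record on each
    run, so a record waits at most one full interval after the soft delete.
    """
    return soft_delete_delay + hard_delete_interval


def meets_deadline(worst_case: timedelta, deadline: timedelta) -> bool:
    """True when even the slowest record is removed within the deadline."""
    return worst_case <= deadline


DEADLINE = timedelta(hours=24)  # the requirement stated in PM-588

# Weekly hard deletion, as described in Example 3.
weekly = worst_case_removal(timedelta(0), timedelta(weeks=1))
print("weekly job:", weekly, "meets deadline:", meets_deadline(weekly, DEADLINE))

# A hypothetical hourly schedule, for comparison.
hourly = worst_case_removal(timedelta(0), timedelta(hours=1))
print("hourly job:", hourly, "meets deadline:", meets_deadline(hourly, DEADLINE))
```

Note that a daily job would sit exactly on the 24-hour bound in this model, leaving no margin for the job's own run time or a failed run. That is the kind of detail a spec author needs surfaced before the audit, not after.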
What Kognita's Jira + codebase integration surfaces for a PM:
Query: "Did PR #1203 implement the export restriction from PM-441?"
Kognita response:
Ticket PM-441 specified: viewer-role users cannot export data.
PR #1203 modified: ExportController, FeatureFlags service.
Implementation: ExportController.canExport() delegates to FeatureFlags.canExport(user).
FeatureFlags reads from: feature_flags config table (not roles table).
Gap identified: Users with legacy 'export_beta' feature flag bypass role check.
Coverage: Acceptance criterion 1 partially met. Criteria 2 and 3 not addressed in this PR.
Query: "What did we actually ship in the data retention sprint?"
Kognita response:
PM-588 required: hard deletion within 24 hours.
AccountDeletionJob: marks records deleted_at (soft delete only).
HardDeletionJob: runs on weekly cron (not daily).
Delta: 24-hour requirement is not currently met by the implementation.
Tickets touching deletion logic this sprint: PM-588, PM-591, PM-603.
PM-603 (hard deletion scheduler) status: not yet merged.
Plain language. No code reading required. Linked to the tickets that specified it.
The Jira MCP integration means this happens without changing how engineers work. The engineering team continues to ship at AI velocity. Tickets get linked to implementations automatically as PRs are merged. The semantic index re-indexes on each change, so the answers are always based on what is currently in the codebase, not on what was in it last week. Product teams get verification capability without creating a new review gate that would negate the velocity gains.
Final take
AI-accelerated development is not going to slow down to accommodate the old verification model. The old model got its traceability as a side effect of pace: slow development created the touchpoints that created traceability. That pace is gone. The traceability infrastructure needs to be rebuilt deliberately, not recovered by asking engineers to slow down.
Non-technical team members are not asking for access to the codebase. They are asking for a reliable answer to a simple question: did what shipped match what we specified? At AI velocity, answering that question requires infrastructure, not conversation. The teams that solve requirements traceability at AI development speed will ship faster and let fewer misalignments reach production. The teams that rely on staging walkthroughs and sprint review conversations will continue to discover gaps three weeks after the sprint closed.