The AI Tools Your Dev Team Uses Every Day — And Why You Can't See What They Build

Your engineering team runs AI tools every single day. GitHub Copilot is completing code as they type. Cursor's agent mode is executing multi-file implementations autonomously. Claude Code is running in terminal sessions, planning and executing changes across multiple services. Some teams have gone further — autonomous agents like Cline or Devin are opening pull requests and making architectural decisions without a human in the loop at all. Every one of these tools is running between sprint demos. None of them surface what they produced to the product side of the organization.

This is not a communication problem. It is not a process problem that a better standup format will solve. It is a structural visibility gap: the tools your engineering team uses are designed for engineers, produce output that lives in the codebase and in pull requests, and have no mechanism for translating what they built into a form that a Product Owner or Scrum Master can evaluate without reading code. You find out what they built at the sprint demo, three minutes per feature, two weeks after the work happened.

GitHub Copilot: the invisible layer

GitHub Copilot is the tool your engineering team adopted first and uses most often, and it is the least visible from the product side. Copilot operates at the keystroke level, suggesting completions as engineers type. It shapes implementations in ways that are individually small and collectively significant. An engineer accepts a data model suggestion. A function signature gets completed in a way that diverges slightly from the acceptance criteria. A test structure is suggested that covers some paths but not others. None of this is logged. None of it is surfaced. It accumulates in the codebase as choices that felt like the engineer's choices but were shaped by the model's suggestions.

For a Product Owner trying to verify that an implementation matches a spec, Copilot is effectively invisible. The output is just code — code that, from the outside, looks identical to code written without AI assistance. The decisions Copilot contributed to are not annotated. The places where the model's suggestion diverged from the spec are not flagged. The acceptance criteria are not checked at completion time. The Product Owner finds out at the sprint demo.

Cursor: the AI editor with an agent mode

Cursor is the next generation: an AI-first editor that understands the full codebase context, not just the current file. Engineers use it for chat-driven edits, multi-file refactors, and — increasingly — agent mode, where they give Cursor a goal and let it plan and execute a sequence of changes across files, directories, and sometimes multiple services. For a senior engineer, Cursor agent mode is how you implement a well-scoped feature in hours instead of days. For a Product Owner, Cursor agent mode is a black box that ran while you were in a roadmap meeting.

Cursor does show the engineer what it did — there's a log of the steps the agent took, the files it modified, the decisions it made. That observability exists inside the tool, for the person running it. It does not flow anywhere else. There is no Cursor dashboard for Product Owners. There is no notification that goes to the Scrum Master when an agent modifies three services outside the ticket scope. The output is a merged PR, and the PR description says what the engineer chose to write, not what the agent decided.

Claude Code: the terminal agent

Claude Code operates in the terminal and takes a more explicitly agentic approach than Cursor. Given a goal, it will plan a multi-step sequence, read relevant files, write code, run tests, handle errors, and iterate until the task is done or it gets stuck. Engineering leads use it for substantial work: feature implementations, test suite generation, service-to-service integrations, refactors of non-trivial complexity. Engineering teams that have adopted these tools are shipping at rates that product planning processes were not designed for.

From the product side, Claude Code is completely invisible during the work. The agent's planning is in the terminal session. Its decision-making — which files to touch, how to interpret the spec, what edge cases to handle or ignore — is not surfaced in any product-facing format. What appears in the codebase is the result. The reasoning is gone. The Product Owner's first exposure to what Claude Code built is the sprint demo, after the code is already deployed.

Autonomous agents: the output your team might not fully see either

Some engineering teams have gone further. Autonomous agents — Cline, aider, Devin, and others — can operate without a human in the loop for extended periods. They open pull requests, modify multiple services, and make decisions that would previously have required an engineer's judgment. The pull request is the first artifact that surfaces outside the agent's context. For a Scrum Master reviewing sprint progress, a PR from an autonomous agent looks the same as a PR from an engineer — except the agent may have touched files well outside the ticket scope, made architectural decisions nobody explicitly authorized, and created dependencies that will affect the next three sprints.

This is not an argument against autonomous agents. It is an observation about what they produce and where that output lands. Engineering has tool-level observability into what agents do — or at least, they have access to the logs if they choose to read them. Product has demo-level observability: every two weeks, three minutes per feature, after everything is already in production. The verification layer has not kept up with the generation layer.

The AI tool stack your dev team is running — and who can see what each tool produces

  GitHub Copilot
  -> What it does: inline code suggestions as the engineer types, tab-to-complete
  -> Who uses it: individual engineers, inside their editor
  -> What product can see: nothing — completions are invisible outside the editor session

  Cursor
  -> What it does: AI-first editor with chat, inline edits, and agent mode for multi-step tasks
  -> Who uses it: engineers, often their primary editor
  -> What product can see: nothing — Cursor's output is the code in the repo, not a log of what it did

  Claude Code
  -> What it does: terminal-based agentic AI — plans and executes multi-file changes, runs tests, iterates
  -> Who uses it: senior engineers and tech leads for complex, multi-service work
  -> What product can see: nothing — the agent's decisions are not surfaced outside the terminal

  Cline / aider / Devin (autonomous agents)
  -> What they do: open PRs, modify multiple services, make architectural decisions autonomously
  -> Who uses them: engineering teams running unattended background agents
  -> What product can see: a PR appears — no context on what the agent decided or why

The structural gap: why better communication doesn't fix this

The natural response when a Product Owner says "I don't know what the team is building" is a process intervention: more standups, a mid-sprint check-in, a Slack channel where engineers post updates. These interventions have a ceiling. They require engineers to translate what AI tools produced into product-legible language — which takes time, introduces interpretation, and still arrives filtered through the engineer's judgment about what matters.

The real problem is that product-facing observability into AI tool output does not exist at the tool level. Copilot, Cursor, Claude Code, and autonomous agents are all designed with the engineer as the audience. Their outputs — completions, diffs, agent logs, PR descriptions — are all in a format that requires technical literacy to interpret. A Product Owner who cannot read a Git diff cannot evaluate what Cursor's agent mode produced. A Scrum Master who doesn't know what a service boundary is cannot assess whether Claude Code stayed in scope. The information is in the codebase. It is not accessible.
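To make "stayed in scope" concrete, here is a minimal sketch of the check a technical reader does by eye when scanning a diff. It assumes a monorepo where each service lives under `services/<name>/`, and it assumes someone has recorded which services a ticket was expected to touch. The layout, the function names, and the sample paths are all hypothetical; this is an illustration of the idea, not a description of how any of the tools above work.

```python
from pathlib import PurePosixPath
from typing import Iterable, List, Set


def services_touched(changed_files: Iterable[str]) -> Set[str]:
    """Collect service names from paths shaped like services/<name>/<file>."""
    touched = set()
    for path in changed_files:
        parts = PurePosixPath(path).parts
        # Assumed layout: only paths under services/<name>/ belong to a service.
        if len(parts) >= 3 and parts[0] == "services":
            touched.add(parts[1])
    return touched


def out_of_scope_services(changed_files: Iterable[str],
                          ticket_scope: Iterable[str]) -> List[str]:
    """Return services that were modified but not listed in the ticket's scope."""
    return sorted(services_touched(changed_files) - set(ticket_scope))


# Hypothetical change set for a ticket that was scoped to the checkout service.
changed = [
    "services/checkout/models.py",
    "services/checkout/api.py",
    "services/billing/invoice.py",
    "services/auth/tokens.py",
    "README.md",
]
flagged = out_of_scope_services(changed, ticket_scope={"checkout"})
print(flagged)
```

The point of the sketch is that the raw information is simple. What is missing is anything that runs a check like this and reports the answer to someone who does not read diffs.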

What happens between sprint start and sprint demo — invisible to the product side

  Day 1 — Sprint kickoff
  -> Engineer opens PROJ-512 in Cursor, starts implementation
  -> Copilot autocompletes database schema — slightly different from what was specified
  -> Engineer accepts it; it's close enough and will "work fine"

  Day 2-3 — Agent work
  -> Claude Code is given PROJ-514 (related feature)
  -> Agent reads PROJ-512 implementation and builds on it, matching the schema Copilot suggested
  -> Three files modified, one new service created, tests written and passing

  Day 4 — Scope expansion
  -> Cline opens a PR for a refactor that wasn't ticketed
  -> Engineering lead approves; it's clearly the right call technically
  -> Product Owner is not notified because the PR isn't in Jira

  Day 5-9 — More tickets close
  -> Sprint burns down faster than expected
  -> Two more tickets build on the Day 2-3 implementation
  -> A behavior difference from the original acceptance criteria propagates to four files

  Day 10 — Sprint demo
  -> Product Owner sees the feature for the first time
  -> It mostly works but the data model is subtly different from what was specified
  -> Rework estimate: 3 days, touching 6 files across 2 services

The scenario in that breakdown is not a failure of communication. Nobody is hiding anything. The engineer made reasonable decisions. The agent operated within its instructions. The Scrum Master ran the ceremonies correctly. The Product Owner got the sprint demo they were supposed to get. The visibility gap is structural — it is built into the tools, the formats, and the cadence of how engineering teams operate with AI.
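The data-model drift in that scenario is the kind of difference that can be stated in plain language once the specified and implemented schemas sit side by side. The sketch below assumes both are available as simple field-to-type mappings, which is a large simplification of a real codebase. The field names and the `describe_schema_drift` helper are hypothetical and chosen only for illustration.

```python
from typing import Dict, List


def describe_schema_drift(spec: Dict[str, str],
                          implemented: Dict[str, str]) -> List[str]:
    """Describe, in plain sentences, how an implemented schema differs from the spec."""
    findings = []
    # Fields the spec asked for that never made it into the implementation.
    for field in sorted(spec.keys() - implemented.keys()):
        findings.append(f"'{field}' was specified but is missing from the implementation.")
    # Fields that appeared in the implementation without being specified.
    for field in sorted(implemented.keys() - spec.keys()):
        findings.append(f"'{field}' was added but is not in the spec.")
    # Fields present in both, but with a different type.
    for field in sorted(spec.keys() & implemented.keys()):
        if spec[field] != implemented[field]:
            findings.append(
                f"'{field}' was specified as {spec[field]} "
                f"but implemented as {implemented[field]}."
            )
    return findings


# Hypothetical schemas: what the ticket specified vs. what was accepted on Day 1.
spec = {"order_id": "uuid", "total": "decimal", "currency": "string"}
implemented = {"order_id": "uuid", "total": "float", "currency_code": "string"}

for line in describe_schema_drift(spec, implemented):
    print(line)
```

Run on Day 1, a report like this is three sentences a Product Owner can read. Found on Day 10, the same three differences are the rework estimate.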

What product-side visibility into AI tool output actually looks like

Closing the structural visibility gap does not require engineering to file more tickets, write better PR descriptions, or run more demos. It requires giving Product Owners and Scrum Masters a direct query layer over what the AI tools actually produced — in plain language, before the sprint demo, connected to the Jira tickets that were supposed to guide the work.

This is what Kognita provides. A Product Owner who can ask "what changed in the checkout service this sprint?" on a Thursday afternoon — without a GitHub login, without asking a developer, without waiting for the demo — is operating with a fundamentally different relationship to what the engineering team's AI tools are building. They are not waiting to find out. They are checking, on their own cadence, in their own language, against the acceptance criteria they wrote.

What a Product Owner can ask Kognita about what the dev team's AI tools built this sprint

  "What changed in PROJ-512's implementation compared to the acceptance criteria I wrote?"
  -> Plain-language diff between spec and implementation — no GitHub required

  "Which files were modified this sprint that weren't mentioned in any Jira ticket?"
  -> Surfaces agent-driven scope expansion — visible before the sprint demo

  "What does the new authentication service added this sprint actually do?"
  -> Plain-language summary of what an agent built from scratch

  "Were there any PRs merged this sprint that don't map to an open Jira ticket?"
  -> The audit trail product needs but can't get from engineering's tools

  "What decisions did the team make about the checkout data model this sprint?"
  -> Reconstructs technical decisions in product language — no developer needed

The questions in that list are not technical questions. They are product ownership questions. "What changed compared to the acceptance criteria I wrote?" is the question every Product Owner should be able to answer before clicking accept in Jira. "Were there any PRs merged this sprint that don't map to an open Jira ticket?" is the audit that catches autonomous agent scope expansion before it becomes a retrospective argument. These questions are answerable; they just require a query layer that translates codebase state into product language. Product teams that work alongside AI-assisted engineering without that layer are rubber-stamping work they can't evaluate.
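As a rough illustration of the unticketed-PR audit, the sketch below scans PR titles and branch names for Jira-style keys such as PROJ-512 and flags any merged PR that references none of the sprint's tickets. It assumes the team's convention is to put the ticket key in the title or branch name, which not every team follows. The PR numbers, titles, and the `find_unticketed_prs` helper are hypothetical, and this is not a description of how Kognita answers the question.

```python
import re
from typing import Dict, Iterable, List, Set

# Jira-style issue key: a project prefix, a hyphen, and a number (e.g. PROJ-512).
# Matched case-insensitively because branch names are often lowercase.
TICKET_KEY = re.compile(r"\b[A-Za-z][A-Za-z0-9]+-\d+\b")


def find_unticketed_prs(prs: Iterable[Dict], sprint_tickets: Set[str]) -> List[int]:
    """Return the numbers of PRs whose title and branch mention no sprint ticket."""
    unticketed = []
    for pr in prs:
        text = f"{pr['title']} {pr.get('branch', '')}"
        keys = {match.upper() for match in TICKET_KEY.findall(text)}
        # Only keys that belong to this sprint count as a valid mapping.
        if not keys & sprint_tickets:
            unticketed.append(pr["number"])
    return unticketed


# Hypothetical merged PRs for the sprint described above.
merged_prs = [
    {"number": 101, "title": "PROJ-512 Add checkout schema", "branch": "feature/proj-512-checkout"},
    {"number": 102, "title": "Refactor retry helpers", "branch": "agent/refactor-retries"},
    {"number": 103, "title": "Build on checkout schema", "branch": "proj-514"},
]
sprint_tickets = {"PROJ-512", "PROJ-514"}

print(find_unticketed_prs(merged_prs, sprint_tickets))
```

A check like this only finds PRs with no ticket reference at all. It says nothing about whether a ticketed PR stayed within what the ticket asked for, which is the harder question.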

Final take

GitHub Copilot, Cursor, Claude Code, and autonomous agents are running on your engineering team's machines right now. They are producing output at rates that human-paced verification processes were not designed for. That output lives in the codebase. It surfaces to the product side at the sprint demo, two weeks after it was built, three minutes at a time.

This is not a failure of your engineering team. It is not a process failure your Scrum Master can fix by scheduling another touchpoint. It is a structural gap between where AI tool output lands — the codebase, the PR — and where product-side stakeholders live — Jira, the demo, the roadmap. The gap is real, and it grows every sprint that the tools get more capable without the visibility layer keeping pace.

Your engineering team has tool-level observability into what their AI tools do. Your product team has demo-level observability — every two weeks, three minutes per feature, after the code is already deployed. That asymmetry is what rubber-stamp acceptance looks like from the inside, and it will not fix itself.