Kognita

Blog

How Do You Plan a Sprint When Half Your Team Is AI Agents?

Scrum.org published a guide this year that would have sounded like satire two years ago: during sprint planning, the Product Owner and Developers must slice the Product Backlog items into two distinct categories — work suitable for humans and work suitable for AI agents. That sentence is now in a published Scrum framework document. Most scrum masters are still running planning exactly the way they did in 2022.

The problem is not that teams are ignoring this guidance because they disagree with it. The problem is that there is no practical process for doing it. How does a scrum master, in a ninety-minute planning session, look at fourteen backlog tickets and confidently split them between humans and agents? What information would they need? Where does it come from? The guidance says to split. It does not say how to know which tickets are safe for agents and which ones will end in a three-service regression if an agent touches them without system context.

Sprint planning was built for a homogeneous team

Scrum's capacity planning model has always assumed a pool of workers who are functionally interchangeable for planning purposes — roughly the same throughput, the same working hours, the same judgment about when to ask questions. Story points worked because human effort was relatively consistent across people and sprints. You could look at six engineers averaging forty points of capacity and commit to forty points with reasonable confidence.

That assumption breaks when half the team is AI agents. Agents have effectively unlimited parallel capacity when the work is amenable and near-zero useful capacity when the work requires judgment about the existing system. A sprint that commits forty points of agent work on coordination-heavy tickets can end up with ten points of merged code and thirty points of open PRs waiting for human review. The math doesn't hold because story points were designed to predict human capacity, and agents don't operate within that model.

What sprint planning assumed about the workers — and what it gets now:

  SPRINT PLANNING 2022
  Team composition: 6 engineers, all human
  Capacity unit: engineer-days, approximated via story points
  Parallel work: bounded by headcount
  Throughput ceiling: ~40 points / sprint (6 engineers × ~7 pts each)
  Dependency risk: linear — one person blocks one other person
  Scope drift: slow — human velocity limits how far off-script you go

  SPRINT PLANNING 2026 (same team on paper)
  Team composition: 6 engineers + 4 AI coding agents running in parallel
  Capacity unit: still story points (unchanged — this is the problem)
  Parallel work: unbounded — agents don't queue behind each other
  Throughput ceiling: undefined — agents can do 10x points if the
    work is amenable, 0.5x if the work requires judgment
  Dependency risk: nonlinear — one agent touches 6 services at once
  Scope drift: fast — agents complete their assigned tickets and
    continue into adjacent work without explicit permission to stop

  THE STRUCTURAL MISMATCH
  Sprint planning was designed for a homogeneous team.
  It was never designed for a mixed team where half the workers
  have unlimited parallel capacity, no off-hours, and no judgment
  about when to stop and ask a question.

Scrum.org has started naming this directly: "Token budget planning is the new capacity planning for AI-augmented teams." The unit of scarcity is no longer engineer-hours — it is human review bandwidth. Agents can generate code faster than humans can safely review it. A team that commits twenty agent tasks but only has bandwidth to review eight PRs per week is not doing capacity planning. It is queuing rework.
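To make the queuing effect concrete, here is a minimal sketch. The twenty tasks and the eight reviews per week come from the example above. The one-PR-per-task ratio, the two-week sprint, and the assumption that every PR opens in the first week are illustrative guesses, not figures from Scrum.org.

```python
def unreviewed_prs(opened_per_week, reviews_per_week):
    """Return the review backlog at the end of each week.

    opened_per_week: PRs that agents open in each week of the sprint.
    reviews_per_week: PRs the human team can safely review per week.
    """
    backlog = 0
    history = []
    for opened in opened_per_week:
        backlog += opened
        # Humans clear at most reviews_per_week PRs from the queue.
        backlog -= min(backlog, reviews_per_week)
        history.append(backlog)
    return history


# Assumption: 20 agent tasks, one PR each, all opened in week 1 of a 2-week sprint.
print(unreviewed_prs([20, 0], reviews_per_week=8))  # [12, 4]
```

Under those assumptions the sprint still closes with four PRs that nobody has reviewed, and that is with the agents producing nothing at all in week two.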

Story points were never designed for agents

The calibration failure runs deeper than throughput. Story points encode human effort: the cognitive work of reading unfamiliar code, making implementation decisions, writing tests, and handling edge cases. When an agent handles the implementation, the point estimate fails twice over: it stops predicting time, and it stops predicting risk. A five-point agent ticket that touches three shared services creates more review and coordination overhead than a five-point ticket that is fully self-contained. The points don't distinguish between them.

Scrum.org put it bluntly: "Story points were not designed for AI agents. Teams must transition to measuring agentic capacity: how much parallel compute AI bots can utilize safely, measured against the human team's ability to review and merge generated code." And then the warning that every engineering manager with a quarterly velocity chart should read: "If a team's velocity jumps from 50 to 5,000 in one sprint after adopting AI tools, they haven't delivered 100x value — they've broken the metric."

The metric was already a proxy. AI-assisted development made velocity irrelevant as a capacity signal long before the planning ceremony caught up with the problem. What scrum masters need is not a new points system — it is a way to understand what the backlog actually contains before the sprint starts.

What the human/agent split actually requires

To split a backlog between human and agent work in planning, the scrum master and product owner need to answer questions that have nothing to do with story points. Which tickets are fully self-contained — touching a single service, with no open PRs in that area, no cross-team dependencies? Which tickets require understanding production failure modes, existing system behavior, or ongoing work from another team? Which tickets look simple but land in areas currently in flux?

These are system-level questions. The answers are in the codebase and in Jira. They are not in the heads of the engineers in the planning room, at least not reliably, for every ticket, for every service, across all fourteen proposed items in a single session. The engineering lead knows the services they touched last sprint. They have an opinion about the services their teammates touched. What they cannot reliably know is which of the proposed tickets will collide with work in another team's sprint, or which of the "simple" tickets hides a dependency that will stall the agent mid-task.

How to split a backlog between human work and agent work:

  AGENT-SUITABLE WORK (assign to agents)
  Characteristics:
  -> Bounded scope with clear acceptance criteria
  -> Operates within a single service or module
  -> Has no dependency on work not yet merged
  -> Can be verified without deep system knowledge
  -> Failure mode is a test failure, not a data corruption

  Examples from a real sprint backlog:
  -> FEAT-441: Add pagination to /api/users endpoint
  -> FEAT-447: Write unit tests for PaymentService.calculateTax()
  -> FEAT-452: Update OpenAPI spec to match current response shapes
  -> FEAT-459: Migrate deprecated logger calls to new LogService API

  HUMAN-SUITABLE WORK (assign to engineers)
  Characteristics:
  -> Requires understanding cross-service behavior
  -> Has ambiguous acceptance criteria needing interpretation
  -> Involves a database migration or schema change
  -> Depends on another team's in-flight work
  -> Affects shared infrastructure others will modify this sprint

  Examples from the same backlog:
  -> FEAT-438: Refactor authentication middleware (3 services depend on it)
  -> FEAT-443: Add retry logic to payment webhook handler
      (requires understanding current failure modes in production)
  -> FEAT-449: Migrate user sessions from Redis to PostgreSQL
  -> FEAT-456: Coordinate API contract change with Team B before shipping

  HIDDEN DEPENDENCY RISK
  Without system grounding, this ticket looks agent-suitable:
  -> FEAT-445: Add currency field to checkout form
  Kognita surfaces: CheckoutService and PaymentService both need
  updating; PaymentService is being modified in PLAT-339 (another
  team's sprint, not yet merged). Assign to human, not agent.

The split only works if it is grounded in real system state. A product owner who is still sizing tickets as "three points" based on historical human effort cannot make the agent/human call from that data. Sprint planning built without codebase grounding creates exactly the risk that the human/agent split is meant to prevent: agents assigned to work that requires system judgment they don't have.
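The characteristics listed above can be read as a checklist. The sketch below encodes them as a simple rule set, assuming the system facts for each ticket have already been gathered somewhere. The `TicketFacts` fields and the `blockers` function are hypothetical names for illustration and do not describe Kognita's actual data model or API.

```python
from dataclasses import dataclass


@dataclass
class TicketFacts:
    """System facts about one ticket. All field names are hypothetical."""
    key: str
    services_touched: int
    clear_acceptance_criteria: bool
    depends_on_unmerged_work: bool
    involves_schema_change: bool
    touches_area_in_flux: bool


def blockers(t: TicketFacts) -> list[str]:
    """Return reasons a ticket needs a human; an empty list means agent-suitable."""
    reasons = []
    if t.services_touched > 1:
        reasons.append("touches more than one service")
    if not t.clear_acceptance_criteria:
        reasons.append("ambiguous acceptance criteria")
    if t.depends_on_unmerged_work:
        reasons.append("depends on unmerged work")
    if t.involves_schema_change:
        reasons.append("schema change or migration")
    if t.touches_area_in_flux:
        reasons.append("shared area being modified this sprint")
    return reasons


# Facts taken from the examples above: FEAT-441 is self-contained,
# FEAT-445 spans two services and depends on the unmerged PLAT-339.
feat_441 = TicketFacts("FEAT-441", 1, True, False, False, False)
feat_445 = TicketFacts("FEAT-445", 2, True, True, False, True)

for ticket in (feat_441, feat_445):
    reasons = blockers(ticket)
    owner = "human" if reasons else "agent"
    print(ticket.key, owner, reasons)
```

The rules themselves are trivial. The hard part is filling in the fields accurately, which is why the title of FEAT-445 alone would have sent it to an agent.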

The agent goes off-script when the backlog runs dry

There is a second problem that the human/agent split addresses, and it is less discussed than the collision risk. Agents don't stop when their tickets are done. They continue. They find the next obvious thing in the area they were working in and start building it. Or they open follow-on PRs for edge cases they discovered. Or they refactor adjacent code that looked wrong to them. By sprint close, a team with agents has often shipped things that were never in the sprint plan — not because anyone made a deliberate scope decision, but because the agents had capacity and used it.

Scrum.org also published a guide titled "How to Run the Sprint Retrospective When Half of Your Team is AI Agents", which exists because this is a current problem for teams running AI-augmented sprints. The agents generate output that the retrospective has to account for. Some of that output is valuable work the team would have planned if they'd thought of it. Some of it is scope drift that creates downstream problems. The retrospective cannot meaningfully process what it doesn't know happened, and stale or incomplete sprint data turns into rework in later sprints.
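One way to give the retrospective that visibility is to compare what merged against what was planned. The sketch below assumes each merged PR carries an optional link to a ticket key. The PR titles and the `unplanned_prs` helper are made up for illustration.

```python
def unplanned_prs(planned_keys, merged_prs):
    """Return merged PRs whose linked ticket is missing or not in the sprint plan.

    merged_prs: list of (pr_title, ticket_key_or_None) tuples.
    """
    planned = set(planned_keys)
    return [(title, key) for title, key in merged_prs if key not in planned]


sprint_plan = ["FEAT-441", "FEAT-447"]
merged = [
    ("Add pagination to /api/users", "FEAT-441"),
    ("Refactor adjacent query helper", None),  # hypothetical agent follow-on work
    ("Unit tests for calculateTax()", "FEAT-447"),
]

print(unplanned_prs(sprint_plan, merged))
```

Anything this returns is a scope decision nobody made explicitly, which makes it a natural agenda item for the retrospective.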

How Kognita makes the split workable

The information needed to split a backlog is not mysterious; it is structural. Which services does each ticket touch? Are there open PRs in those services? Is the service being modified by another team this sprint? Does this ticket depend on work that hasn't merged yet? Are the acceptance criteria specific enough that an agent can verify its own output? These questions have deterministic answers in the codebase and in Jira. What's missing is the ability to surface them during planning, in the room, without a twenty-minute engineering deep-dive per ticket.

Kognita connects the sprint backlog to actual codebase state so scrum masters and product owners can get those answers before the commitment is made. "Which of these fourteen tickets are safe to assign to agents?" becomes a question with a grounded answer — not a guess based on ticket title and the engineering lead's intuition.

What Kognita surfaces before the sprint planning session:

  Scrum Master asks: "Which of these 14 backlog tickets are
  safe to assign to agents this sprint?"

  Kognita returns:

  Agent-safe (6 tickets):
  -> FEAT-441: single-service, no open PRs in target module,
     clear acceptance criteria, test coverage exists
  -> FEAT-447: pure test addition, no production code changes
  -> FEAT-452: spec-only change, auto-verifiable against live API
  -> FEAT-459: mechanical refactor, deterministic transformation
  -> FEAT-463: add field to internal admin UI, no API surface change
  -> FEAT-466: update error messages in AuthService (no logic change)

  Requires human judgment (8 tickets):
  -> FEAT-438: AuthMiddleware shared by 3 services (collision risk)
  -> FEAT-443: production failure mode knowledge required
  -> FEAT-445: depends on PLAT-339 (external team, not yet merged)
  -> FEAT-449: schema migration — rollback complexity is high
  -> FEAT-451: acceptance criteria ambiguous — needs PO clarification
  -> FEAT-453: cross-team API contract — requires coordination
  -> FEAT-460: touches NotificationService currently in flux (Team B)
  -> FEAT-464: agent ran this last sprint and introduced a regression

  Review bandwidth check:
  6 agent tasks × avg 3 PRs each = ~18 PRs to review
  Team review capacity this sprint: ~20 PRs
  Recommendation: agent allocation is within safe review bandwidth

The review bandwidth check matters as much as the dependency analysis. Knowing that six agent tasks will produce approximately eighteen PRs, and that the team has capacity to review twenty, is the kind of calibration that prevents the sprint from collapsing into review backlog by day eight. That is the token budget planning Scrum.org is describing, and it requires knowing what the agent work will actually produce, not what six story points imply about human effort.
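The arithmetic behind that check is small enough to write down. This is a sketch using the numbers from the example above. The function name and the returned fields are assumptions for illustration, not Kognita's output format.

```python
def review_bandwidth_check(agent_tasks, prs_per_task, review_capacity):
    """Compare expected agent PR volume with human review capacity."""
    expected_prs = agent_tasks * prs_per_task
    return {
        "expected_prs": expected_prs,
        "within_capacity": expected_prs <= review_capacity,
        # Largest number of agent tasks the reviewers can absorb.
        "max_agent_tasks": review_capacity // prs_per_task,
    }


print(review_bandwidth_check(agent_tasks=6, prs_per_task=3, review_capacity=20))
# {'expected_prs': 18, 'within_capacity': True, 'max_agent_tasks': 6}
```

A seventh agent task would push the expected volume to 21 PRs, one past what the team can review, so the six-ticket allocation is already at the ceiling.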

What product owners need to change about ticket writing

The human/agent split also creates a new requirement on the product owner side: agent-assigned tickets need more explicit acceptance criteria than human-assigned tickets. A human engineer can read an ambiguous requirement, make a judgment call, and verify their own interpretation with the product owner before shipping. An agent will interpret ambiguity in whatever direction the surrounding context suggests and generate a complete implementation before anyone reviews it. If the criteria said "improve checkout performance" without specifying what to measure or what "improved" means, the agent will pick metrics and targets that may have nothing to do with what the product owner had in mind.

This is a change in ticket-writing practice, not a criticism of how product owners currently work. The practice was calibrated for human engineers who ask clarifying questions. Agents don't ask — they infer. The product owner who wants to use agent capacity effectively needs to write acceptance criteria that are specific enough to be machine-verifiable. That is a skill that product teams are learning now, often by reading agent-generated PRs that passed all the tests and shipped the wrong thing.
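As a rough illustration of "machine-verifiable", the sketch below turns the vague checkout requirement into a named metric with a threshold. The metric name and the 400 ms target are invented for this example. The real values would have to come from the product owner.

```python
from dataclasses import dataclass


@dataclass
class Criterion:
    """One measurable acceptance criterion. Metric and threshold are hypothetical."""
    metric: str
    threshold: float
    unit: str

    def is_met(self, measured: float) -> bool:
        # Lower is better for this metric, so the measurement must not exceed the threshold.
        return measured <= self.threshold


# Vague: "improve checkout performance"
# Specific (assumed target): p95 checkout latency at or below 400 ms
criterion = Criterion(metric="checkout_p95_latency", threshold=400.0, unit="ms")

print(criterion.is_met(measured=350.0))  # True
print(criterion.is_met(measured=520.0))  # False
```

With a criterion in this shape, the agent and the reviewer are checking the same number, and the agent has nothing left to infer about what "improved" means.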

Final take

Sprint planning was built for a homogeneous team. It has never been updated to handle a team where half the workers have unlimited parallel capacity, no concept of when to stop and ask, and no intrinsic knowledge of system state. Scrum.org is saying this out loud now. The framework is being updated in real time. Most teams are still running planning in 2022 mode while their agents are running at 2026 speed.

The human/agent split is the right framing. Making it workable in a planning session requires system-grounded answers to questions that don't live in story points: which tickets are truly independent, which services each touches, and where the hidden dependencies are. That is the information gap between Scrum.org's guidance and what most teams can actually execute today.