Scrum Master: Why Velocity Means Less When AI Is Writing Half the Code

The sprint board shows 42 points completed. Last quarter the team averaged 28. Leadership is pleased. The scrum master is trying to figure out what to commit to next sprint, and the only number they have to work with — velocity — no longer tells them anything useful. Forty-two points of AI-assisted delivery does not predict forty-two points next sprint, because AI does not accelerate every ticket uniformly, and because velocity was never measuring what people thought it was measuring.

Story points were calibrated for human effort. A 3-point ticket meant approximately a day and a half of focused engineering work. That translation — points to time — was the whole reason velocity was useful for capacity planning. When AI tools started generating implementations in twenty-five minutes that previously took two days, the translation broke. The point count stayed the same. The time evaporated. And the scrum master is now trying to forecast sprint capacity with a metric that no longer maps to anything real.

What a 3-point ticket actually costs now

The naive reading of AI-assisted development is that a 3-point ticket takes less time, so the team can fit more 3-point tickets into a sprint, and velocity goes up, and everyone wins. The reality is more complicated. AI compresses the obvious implementation work dramatically, but it introduces different work that doesn't show up in story points: reviewing AI output, fixing incorrect assumptions the model made about the codebase, handling the edge cases the AI surfaced that weren't in the original scope, and debugging test failures from AI-generated code that imported a library not in the stack.

A 3-point ticket completed with AI assistance often results in four GitHub PRs: the original feature, a follow-on for edge cases, a test fix, and a correction to an API call the model got wrong. Three of those four PRs have no associated Jira ticket. They do not appear in velocity counting. But they consumed real engineering hours. The sprint board shows one 3-point story completed. The actual effort was closer to six or seven hours spread across four separate pieces of work. That is a far smaller saving than the twenty-five-minute implementation suggests, and the scrum master has no visibility into any of it except the single closed ticket.
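One rough way to make that invisible work countable is to look for merged PRs whose titles carry no ticket key. The sketch below assumes PR titles are available as plain strings and that tickets follow the usual Jira key shape (uppercase project prefix, hyphen, number). The titles and the key FEAT-101 are hypothetical, made up for illustration.

```python
import re

# Jira-style issue key, e.g. "FEAT-881" or "PLAT-441".
TICKET_KEY = re.compile(r"\b[A-Z][A-Z0-9]+-\d+\b")


def untracked_prs(pr_titles):
    """Return the PR titles that do not mention any ticket key."""
    return [title for title in pr_titles if not TICKET_KEY.search(title)]


# Hypothetical PR titles for one 3-point story and its follow-on work.
prs = [
    "FEAT-101: add address validation to checkout form",
    "Handle postal format edge cases",
    "Fix failing validation test",
    "Correct address API call signature",
]

missing = untracked_prs(prs)
print(f"{len(missing)} of {len(prs)} PRs have no ticket key")
for title in missing:
    print(" -", title)
```

A title match is only a heuristic. Teams that link PRs to tickets through branch names or commit trailers would need to check those fields instead.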

What a 3-point ticket used to predict versus what it predicts now

  BEFORE AI-ASSISTED DEVELOPMENT
  Story: "Add address validation to checkout form"
  Points: 3
  Historical meaning: ~1.5 engineer-days of focused work
  What it included: reading relevant code, writing the feature,
    writing tests, code review, fixing review comments
  What sprint planning used it for: capacity forecasting
    → 40 points per sprint = ~20 engineer-days of delivery
  Accuracy: reasonably predictive over time

  AFTER AI-ASSISTED DEVELOPMENT (same team, same codebase)
  Story: "Add address validation to checkout form"
  Points: 3 (unchanged — same ticket, same rubric)
  Actual time: 25 minutes of AI-generated implementation
    + 40 minutes reviewing and adjusting output
    + 2 hours on edge case PRs the AI opened for postal
      format variations across 12 country codes
    + 90 minutes on a failing test the AI introduced
      because it assumed a library that isn't in the stack
  Total elapsed: ~4.5 hours for a "3-point" story
  What sprint planning uses it for: still capacity forecasting
  Accuracy: broken — the point count predicts nothing reliably

  THE MATH PROBLEM
  Old sprint: 5 engineers × 8 points avg = 40 points
  New sprint: 5 engineers × 20 points avg = 100 points
  Leadership reads: "productivity up 2.5×"
  Reality: throughput increased, but so did the variance,
    the rework rate, and the number of follow-on PRs.
    The number tells you nothing about what next sprint holds.
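The time breakdown for the AI-assisted story can be added up directly. This is a minimal sketch using the minute figures from the comparison above. The 8-hour engineer-day is an assumption, since the article does not say how long a day is.

```python
# Minutes per activity, copied from the AI-assisted breakdown above.
actual_minutes = {
    "AI-generated implementation": 25,
    "reviewing and adjusting output": 40,
    "edge case PRs": 120,
    "failing test from a missing library": 90,
}

HOURS_PER_ENGINEER_DAY = 8  # assumption: not stated in the article

total_minutes = sum(actual_minutes.values())
total_hours = total_minutes / 60
historical_hours = 1.5 * HOURS_PER_ENGINEER_DAY  # "~1.5 engineer-days"

# Share of the elapsed time spent on everything except the generated code.
non_implementation_share = 1 - actual_minutes["AI-generated implementation"] / total_minutes

print(f"actual effort:     {total_hours:.1f} h")
print(f"historical effort: {historical_hours:.1f} h")
print(f"time outside the generated implementation: {non_implementation_share:.0%}")
```

Under these assumptions, most of the elapsed time goes to review and rework rather than to the implementation itself, and the point estimate does not capture that.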

The sprint math becomes meaningless quickly. A team that was averaging 40 points posts 100. Leadership recalibrates expectations upward. The next sprint, the tickets are harder — more cross-service coordination, more architectural work that AI doesn't help with — and the team posts 38. Nobody can explain the variance because the underlying signal is gone. Velocity stopped being informative the moment AI adoption became significant, and most teams haven't found a replacement.

Why velocity was always a proxy for something else

Velocity was never the real metric. It was a proxy for team capacity — specifically, for how much work a team could complete in a fixed time window. The proxy worked because human effort-to-time ratios were relatively stable across sprints. If the team averaged 40 points over the last six sprints, you could reasonably plan 40 points for the next one. The consistency was the point.

AI broke that consistency by making the effort-to-time ratio wildly asymmetric across work types. Greenfield implementation, boilerplate, scaffolding, CRUD operations, and test generation are heavily AI-amenable, and AI can cut the time they take by roughly 60 to 80%. Cross-service coordination, database migrations, debugging async workflows, architectural decisions, and anything that requires understanding the existing system's behavior are much less AI-amenable. The coordination overhead is fundamentally human, regardless of which AI tool the engineer is using.

A sprint that mixes both types sees enormous variance within the same point values. Two 5-point stories can take forty minutes and two days respectively depending on how AI-amenable they are. That variance destroys the consistency that made velocity predictive. Story point estimation broke down before it reached the planning session — the calibration never adapted to AI-assisted effort, and now the numbers are noise.
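To put a number on that spread, the two 5-point stories can be compared by hours per point. This is an illustrative sketch, not a recommended metric. It assumes an 8-hour engineer-day, which the article does not state.

```python
from statistics import mean, pstdev

HOURS_PER_DAY = 8  # assumption: not stated in the article

# The two 5-point stories from the paragraph above: (label, points, hours).
stories = [
    ("AI-amenable story", 5, 40 / 60),
    ("coordination-heavy story", 5, 2 * HOURS_PER_DAY),
]


def hours_per_point(stories):
    """Return the effort rate (hours per story point) for each story."""
    return [hours / points for _, points, hours in stories]


rates = hours_per_point(stories)
spread_ratio = max(rates) / min(rates)
coefficient_of_variation = pstdev(rates) / mean(rates)

print("hours per point:", [round(r, 2) for r in rates])
print(f"slowest vs. fastest: {spread_ratio:.0f}x")
print(f"coefficient of variation: {coefficient_of_variation:.2f}")
```

When the same point value covers effort rates that differ by more than an order of magnitude, averaging points across sprints says little about the next one.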

The information velocity actually hid

Even when velocity was a reliable capacity signal, it was always hiding information that mattered for sprint planning. Velocity told you how many points got done. It told you nothing about what those points touched, how the work was distributed across the codebase, whether tickets were independent or blocking each other, or whether the sprint was setting up a rework crisis for the next one.

AI-accelerated development makes this hidden layer more consequential, not less. When engineers move faster, more work lands in the same services in the same sprint. Six tickets all modifying the payment service, written quickly with AI assistance and merged independently, create a conflict cluster that surfaces in QA a week later. The velocity number showed 42 points. It showed nothing about the six PRs that modified shared infrastructure without coordination.

What a high-velocity sprint hides from the scrum master

  Hidden: scope of shared infrastructure touched
  → 6 tickets all modified the payment service this sprint
  → none were coordinated; three introduce conflicting assumptions
    about the payment gateway retry logic
  → the conflict will surface in QA, not in the sprint board

  Hidden: independence vs. blocking relationships
  → 9 tickets marked "in progress" simultaneously
  → 4 of the 9 cannot merge until PLAT-441 merges
  → PLAT-441 has been in review for 5 days with no activity
  → the engineers on those 4 tickets are effectively blocked
    and no standup raised it

  Hidden: AI-introduced rework volume
  → 14 PRs merged this sprint
  → 6 additional PRs were opened to fix AI-generated output:
    3 test failures, 2 incorrect API call signatures,
    1 wrong data model assumption
  → the rework PRs don't have Jira tickets; they're invisible
    to velocity counting but consumed real engineering time

  Hidden: carry-over risk
  → 3 tickets sitting at "in review" since day 4
  → all 3 touch the notification service
  → a team B refactor of the notification service
    starts next sprint; these tickets will need rework
    if they don't merge before sprint close

The hidden dependencies are the expensive problem. Sprint planning built on stale or absent data creates exactly this kind of invisible risk: tickets committed as independent that are actually blocking each other, work scheduled in parallel that touches the same shared service, follow-on rework that eats the next sprint before it starts. Velocity doesn't surface any of it. The sprint board looks clean until it isn't.
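Two of those hidden risks can be checked mechanically once the data is in hand: several tickets touching the same service, and tickets waiting on a blocker that has stalled in review. The sketch below assumes someone has already mapped tickets to services and blockers. The TKT-* keys, the service names, and the 3-day idle threshold are hypothetical. Only PLAT-441 and its five idle days come from the example above.

```python
from collections import defaultdict


def service_clusters(ticket_services, min_tickets=2):
    """Map each service to the tickets touching it, keeping only shared ones."""
    by_service = defaultdict(list)
    for ticket, services in ticket_services.items():
        for service in services:
            by_service[service].append(ticket)
    return {s: sorted(t) for s, t in by_service.items() if len(t) >= min_tickets}


def blocked_by_stalled(blocked_by, idle_days, max_idle_days=3):
    """Return tickets whose blocker has sat idle in review past the threshold."""
    return sorted(
        ticket
        for ticket, blocker in blocked_by.items()
        if idle_days.get(blocker, 0) > max_idle_days
    )


# Hypothetical sprint data.
ticket_services = {
    "TKT-1": ["payment"],
    "TKT-2": ["payment"],
    "TKT-3": ["payment", "notification"],
    "TKT-4": ["notification"],
    "TKT-5": ["search"],
}
blocked_by = {"TKT-4": "PLAT-441", "TKT-5": "PLAT-441"}
idle_days = {"PLAT-441": 5}

clusters = service_clusters(ticket_services)
stalled = blocked_by_stalled(blocked_by, idle_days)
print("shared services:", clusters)
print("blocked on a stalled review:", stalled)
```

The hard part in practice is producing the ticket-to-service mapping, which is exactly the information the sprint board does not hold.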

What scrum masters actually need to plan a sprint

The question a scrum master is actually trying to answer in planning is not "how many points did we do last sprint?" It is: what can this team safely commit to delivering in the next two weeks, given what we know about the work and the system? Those are different questions. The second requires knowing things that velocity doesn't capture.

Which tickets are genuinely independent versus which ones block each other? Which tickets touch shared infrastructure — services, database schemas, APIs — that multiple engineers will be modifying simultaneously? Are there any dependencies on work in another team's sprint that would stall delivery if that work slips? What parts of the codebase are currently in active flux from previous sprints, and does any of this sprint's work land in those same areas?

These questions have answers. The answers exist in the codebase and in Jira. What's missing is the ability to access them during planning without asking an engineer to spend thirty minutes doing manual cross-referencing. The scrum master cannot read the codebase. The product owner cannot run a dependency query. The engineering lead's mental model of what is currently in flux is accurate for the areas they touched recently and unreliable for everything else.

Replacing the velocity question with system reality

Kognita connects sprint planning to actual codebase state. Instead of "we have 42 points of capacity," the question becomes: which of these tickets touch shared infrastructure, which are blocking each other, and what's currently in active flux in the areas they cover? Those questions get answered from the real system, not from historical point averages that predate AI-assisted development.

This changes what a scrum master can do in planning. Before a sprint is committed, they can ask which of the proposed tickets actually modify the same services. They can surface dependency chains, such as ticket B being unable to start until ticket A merges, that aren't on anyone's radar. They can identify tickets that land in areas currently being modified by another team, creating the kind of coordination risk that kills sprint close. None of this requires a point estimate. All of it requires system context.

What a pre-sprint complexity query surfaces from Kognita + Jira

Scrum master asks Kognita before sprint planning:
"Which of these 12 tickets touch shared infrastructure,
and how many are actually independent?"

Kognita returns:

  Tickets touching shared infrastructure:
  → FEAT-881, FEAT-894, FEAT-901 all modify PaymentService
    → PaymentService last modified in Sprint 14 (3 weeks ago)
    → 2 open PRs currently modifying PaymentService
      not yet merged — potential merge conflicts
  → FEAT-877, FEAT-903 both modify the notification schema
    → Schema currently being refactored by Team B (ongoing)
    → coordination recommended before committing either ticket

  Dependency chain:
  → FEAT-891 cannot start until FEAT-884 merges
    (FEAT-884 creates the API endpoint FEAT-891 calls)
  → FEAT-897 depends on database migration in PLAT-441
    (PLAT-441 is in a separate team's sprint — confirm timing)
  → FEAT-878 is fully independent

  Truly independent tickets: 4 of 12
  Tickets with cross-service coupling: 7 of 12
  Tickets with external dependencies: 3 of 12

  Recommendation: sequence PaymentService tickets to avoid
  conflict; confirm PLAT-441 timeline before committing FEAT-897

The output of that query is more useful for capacity planning than velocity. Knowing that only four of twelve proposed tickets are truly independent tells you something concrete about what the sprint will actually look like — coordination overhead, sequencing constraints, risk of collisions. A velocity number tells you how many points the team completed last sprint, which is now a function of how AI-amenable last sprint's work happened to be.
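The classification in that output can be approximated with a small dependency graph. To be clear, this is not Kognita's API or method. It is a minimal sketch that assumes the ticket-to-service and ticket-to-dependency mappings are already known. Only the tickets named in the example are included, and the empty service sets for FEAT-884, FEAT-891, FEAT-897, and FEAT-878 are placeholders because the example does not say what they touch. It uses `graphlib.TopologicalSorter` from the Python standard library (3.9+) for sequencing.

```python
from collections import defaultdict
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Services each proposed ticket touches (empty set = not stated in the example).
touches = {
    "FEAT-881": {"PaymentService"},
    "FEAT-894": {"PaymentService"},
    "FEAT-901": {"PaymentService"},
    "FEAT-877": {"notification schema"},
    "FEAT-903": {"notification schema"},
    "FEAT-884": set(),
    "FEAT-891": set(),
    "FEAT-897": set(),
    "FEAT-878": set(),
}
# ticket -> tickets that must merge first
depends_on = {"FEAT-891": {"FEAT-884"}, "FEAT-897": {"PLAT-441"}}


def analyze(touches, depends_on):
    """Classify tickets by shared services, external blockers, and independence."""
    in_sprint = set(touches)

    by_service = defaultdict(set)
    for ticket, services in touches.items():
        for service in services:
            by_service[service].add(ticket)
    shared = {s: sorted(t) for s, t in by_service.items() if len(t) > 1}
    contended = {t for tickets in shared.values() for t in tickets}

    # Dependencies on work that is not part of this sprint's proposal.
    external = {
        t: sorted(deps - in_sprint)
        for t, deps in depends_on.items()
        if deps - in_sprint
    }

    # A ticket is independent if it neither shares a service nor sits in a chain.
    linked = set(depends_on) | {d for deps in depends_on.values() for d in deps}
    independent = sorted(in_sprint - contended - linked)

    # One valid merge order for the tickets that take part in a dependency chain.
    order = list(TopologicalSorter(depends_on).static_order())
    return shared, external, independent, order


shared, external, independent, order = analyze(touches, depends_on)
print("shared services:", shared)
print("external dependencies:", external)
print("independent:", independent)
print("merge order for chained tickets:", order)
```

With only the named tickets as input, FEAT-878 is the single independent ticket, which matches the example. The remaining counts in the example depend on tickets it does not list.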

For engineering managers, this shift in framing also helps with the stakeholder conversation. Instead of defending a velocity number that swings wildly depending on work type, the answer to "why did velocity drop?" becomes grounded in system reality: this sprint's work was coordination-heavy, touched shared infrastructure, and required sequencing that AI tools don't help with. That is an accurate and defensible answer. "Velocity went down" with no supporting context is just a number that went in the wrong direction.

What to actually track now

Abandoning velocity entirely creates its own problem — leadership needs a capacity signal, and flow metrics like cycle time and throughput don't communicate well in a planning context with stakeholders who are used to sprint commitments. The answer is not to throw out measurement, but to add context to what gets measured.

Tracking velocity separately for AI-amenable work versus coordination-heavy work gives a more honest capacity picture. A sprint that is mostly scaffolding and implementation will produce higher throughput than a sprint heavy in cross-service coordination and architectural work. Treating them as the same capacity pool — as velocity does — is what creates the unpredictable swings.
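A split like that needs only one extra label per ticket. In the sketch below the per-sprint totals (40, 100, 38) echo the swing described earlier, but the division between work types is hypothetical, as are the labels `ai_amenable` and `coordination`.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical completed work: (sprint, work_type, points).
completed = [
    (1, "ai_amenable", 28), (1, "coordination", 12),
    (2, "ai_amenable", 86), (2, "coordination", 14),
    (3, "ai_amenable", 25), (3, "coordination", 13),
]


def points_by_type(completed):
    """Return {work_type: [points per sprint, in sprint order]}."""
    totals = defaultdict(lambda: defaultdict(int))
    for sprint, work_type, points in completed:
        totals[work_type][sprint] += points
    return {
        work_type: [per_sprint[s] for s in sorted(per_sprint)]
        for work_type, per_sprint in totals.items()
    }


by_type = points_by_type(completed)
sprint_totals = [sum(pair) for pair in zip(*by_type.values())]

print("blended velocity per sprint:", sprint_totals)
for work_type, series in by_type.items():
    swing = max(series) - min(series)
    print(f"{work_type}: {series} (mean {mean(series):.1f}, swing {swing})")
```

In this made-up data the blended number swings by 62 points while the coordination-heavy track moves by 2, which is the kind of difference a single velocity figure would hide.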

Tracking mid-sprint scope changes per sprint gives a direct signal of how well planning-time system knowledge matched implementation-time reality. If tickets regularly gain scope after sprint start, the problem is not estimation skill — it is that planning is running on insufficient information about what the work actually involves. That is an information problem, and it closes when planning has access to live codebase context, not when the team gets better at guessing.
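One simple proxy for that signal is the share of committed tickets that grew after sprint start. The sketch below assumes scope growth shows up as added subtasks. The field names and ticket data are hypothetical, and a team might just as reasonably count re-estimates or linked follow-on PRs.

```python
def scope_change_rate(tickets):
    """Fraction of committed tickets that gained subtasks after sprint start."""
    if not tickets:
        return 0.0
    grew = sum(
        1 for t in tickets if t["subtasks_at_close"] > t["subtasks_at_start"]
    )
    return grew / len(tickets)


# Hypothetical sprint snapshot.
sprint = [
    {"key": "TKT-1", "subtasks_at_start": 2, "subtasks_at_close": 2},
    {"key": "TKT-2", "subtasks_at_start": 1, "subtasks_at_close": 3},
    {"key": "TKT-3", "subtasks_at_start": 3, "subtasks_at_close": 4},
    {"key": "TKT-4", "subtasks_at_start": 2, "subtasks_at_close": 2},
]

rate = scope_change_rate(sprint)
print(f"tickets that gained scope mid-sprint: {rate:.0%}")
```

Watching this rate across sprints gives a rough read on whether planning-time knowledge is getting closer to what implementation actually finds.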

Final take

Velocity was calibrated for a world where implementation effort was the primary variable in sprint capacity. AI changed the primary variable to system complexity and coordination overhead. Those things were always present — they just weren't the limiting factor when implementation was slow. Now that implementation is fast, coordination and scope visibility are what determine whether a sprint succeeds. Velocity doesn't measure either of them.

Scrum masters trying to run AI-era sprints with velocity-based planning are measuring the wrong thing. The metric they need isn't a better version of points — it's visibility into what the sprint actually contains: which services it touches, which tickets are independent, and what the codebase currently looks like in the areas where the work lands. That information exists. It is just not in the velocity chart.