Product Specs Written With ChatGPT Are Still Missing the One Input That Matters: What Your System Can Do
Product managers who write specs with ChatGPT get better specs. That is not in dispute. The acceptance criteria are clearer. The user stories are better structured. The edge cases are more thoroughly considered. The output of AI-assisted spec writing is measurably higher quality than spec writing without it — when you measure quality by structure and completeness.
What does not improve is alignment with the system the spec describes. ChatGPT does not know your codebase. It does not know that your fulfillment model uses different status names than the ones it suggested. It does not know that your customer portal is stateless by design, or that the "real-time filtering" it recommended requires a WebSocket your platform does not have. The spec is well-written and technically wrong in ways that only become visible when engineering reads it.
This is a new category of specification problem. Before AI-assisted spec writing, the gaps in a spec reflected what the product manager did not think to address. Now the gaps reflect what ChatGPT did not know about the system — and ChatGPT writes with the same confidence whether it knows the system or not. The spec looks authoritative. The alignment issues are invisible until sprint planning.
What a well-written but system-blind spec looks like
The failure mode is specific. ChatGPT produces specs that describe how the feature should work in a generic, well-designed system. It draws from patterns common to SaaS products, e-commerce platforms, or enterprise software depending on the domain. Those patterns are often correct for general software. They frequently conflict with how the specific system under discussion actually works.
Product spec written with ChatGPT — without codebase context:
Feature: "Allow customers to filter orders by fulfillment status"
Acceptance criteria (AI-generated):
-> Customer can filter by: Pending, Processing, Shipped, Delivered, Cancelled
-> Filters apply in real time as the customer selects them
-> Filter state persists across page refreshes
-> Works on mobile and desktop
What engineering discovers:
-> The fulfillment model uses different status names: QUEUED, FULFILLING,
DISPATCHED, COMPLETE, VOIDED — the spec names don't match the schema
-> "Real time" filtering requires a WebSocket the platform doesn't have —
the current approach is full-page refresh on filter selection
-> "Persist across refreshes" conflicts with session handling that was
intentionally stateless for compliance reasons
The acceptance criteria are reasonable and well-written. Every one of them has a hidden conflict with the actual system. The status names do not match the data model. The real-time requirement assumes infrastructure that does not exist. The persistence requirement conflicts with an intentional architectural decision made for compliance. None of these conflicts are obvious from the spec. All of them are obvious to anyone who has read the codebase.
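The first conflict, mismatched status names, is the kind of thing a simple check could surface before the spec reaches engineering. Here is a minimal sketch in Python, assuming the order model exposes its statuses as an enum. `FulfillmentStatus` and `find_unknown_statuses` are hypothetical names used for illustration, not taken from any real codebase.

```python
from enum import Enum


class FulfillmentStatus(Enum):
    """Hypothetical stand-in for the status enum in the order model."""
    QUEUED = "QUEUED"
    FULFILLING = "FULFILLING"
    DISPATCHED = "DISPATCHED"
    COMPLETE = "COMPLETE"
    VOIDED = "VOIDED"


def find_unknown_statuses(spec_statuses):
    """Return the status names from the spec that do not exist in the schema."""
    known = {status.name for status in FulfillmentStatus}
    return [name for name in spec_statuses if name.upper() not in known]


# Status names taken from the AI-generated acceptance criteria above.
spec_statuses = ["Pending", "Processing", "Shipped", "Delivered", "Cancelled"]
unknown = find_unknown_statuses(spec_statuses)
print(unknown)
```

Under these assumptions, every name in the generated spec is flagged, because none of them appears in the schema. The check covers only naming; the WebSocket and session conflicts still need someone, or something, that knows the architecture.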
How misalignment compounds into sprint failures
The story from this point is familiar. Engineering receives the spec and begins estimation. Someone notes that the status names do not match. A meeting is scheduled to align on terminology. During the meeting, the real-time requirement surfaces as a problem. A second meeting is scheduled to discuss whether to add WebSocket infrastructure or revise the spec. By the time the spec is revised to match the system, two weeks of a four-week sprint have been spent on alignment that could have been done before the spec was written.
This is not a communication failure. Both the product manager and the engineers were trying to do good work. The product manager wrote a thorough, well-structured spec. The engineers identified the conflicts correctly. The failure is that the tool used to write the spec had no access to the information needed to catch the conflicts before they became meetings.
How misaligned specs affect engineering estimates:
Spec that matches system reality:
Engineering reads it → implementation path is clear
Estimate: 3 days, one product review checkpoint
Risk: low
Spec that doesn't match system reality:
Engineering reads it → three hidden assumption conflicts
Estimate: "3 days if X works the way we think, but we need to check Y and Z first"
Actual: 2 days of alignment meetings, revised spec, 1 week of implementation
Risk: scope creep, sprint blowout, relationship friction
Engineering taking longer than expected is often attributed to estimation error or scope creep. Frequently it is alignment work that was never scoped because it was not visible at estimation time. Specs that look complete and reasonable produce estimates that assume they are complete and reasonable. When they are not, the gap lands in the sprint as unplanned work.
What the spec looks like when written with codebase context
The difference when a product manager has access to the codebase before writing a spec is not in the quality of the writing. It is in the assumptions embedded in the spec. When the PM can ask the system "what fulfillment statuses do we use?" before writing the acceptance criteria, the spec uses the right names. When the PM can ask "how do we handle filter state today?", the spec describes a behavior consistent with how the system actually works.
The same spec, written with codebase context:
Before writing the spec, the PM asks the codebase:
"What fulfillment statuses does our order model use?"
→ QUEUED, FULFILLING, DISPATCHED, COMPLETE, VOIDED
"How do we currently handle filter state in the orders list?"
→ Full-page refresh, no persistent state, stateless session by design
"Do we have real-time capabilities in the customer portal?"
→ No WebSocket infrastructure; real-time would require a new service
Spec written after codebase context:
-> Filter options: Queued, Fulfilling, Dispatched, Complete, Voided
-> Filters apply via page refresh (consistent with current behavior)
-> Filter state is not persisted (stateless session, compliance requirement)
→ No surprises. Engineering estimates immediately. Sprint holds.
The spec still uses ChatGPT or Claude to structure, write, and refine. That value is preserved. What changes is the input: instead of drawing on generic software patterns, the AI writes from specific knowledge of this system. The acceptance criteria match the data model. The behavior described matches the current architecture. Engineering reads it and can estimate without a preliminary alignment meeting.
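One way to keep the filter options in the revised spec tied to the data model is to derive the customer-facing labels from the schema values instead of typing them by hand. This is a rough sketch, assuming the statuses live in an enum. `FulfillmentStatus` and `filter_options` are hypothetical names for illustration.

```python
from enum import Enum


class FulfillmentStatus(Enum):
    """Hypothetical stand-in for the status enum in the order model."""
    QUEUED = "QUEUED"
    FULFILLING = "FULFILLING"
    DISPATCHED = "DISPATCHED"
    COMPLETE = "COMPLETE"
    VOIDED = "VOIDED"


def filter_options():
    """Build display labels from the schema, in the order the enum defines them."""
    return [status.name.capitalize() for status in FulfillmentStatus]


options = filter_options()
print(options)
```

If a status is later added or renamed in the model, the labels follow the schema rather than the spec, so the two cannot drift apart silently.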
The irony of AI-assisted spec writing without system context
There is a specific irony in the current state of AI-assisted product management. Product managers are using AI to write better specs. Engineers are using AI to implement those specs faster. But the AI available to the product manager knows only generic software patterns, while the AI available to the engineer knows the specific codebase. The faster both move, the faster they move in directions that diverge.
The product manager produces a polished, AI-assisted spec in two hours instead of four. The engineer begins implementation immediately, AI-assisted, and makes fast progress on a feature the spec describes incorrectly. The gap between what was specified and what the system can do is discovered later than it would have been if both sides had been moving more slowly.
Product and engineering scope misalignment has always been a problem. AI tools on both sides of the divide can make it worse before they make it better, if the AI available to product does not have access to the same system knowledge the AI available to engineering has.
What system-aware spec writing requires
The fix is not for product managers to learn to read code. It is for the spec writing workflow to include a step where the product manager queries the system before drafting. "What data model do we use for this entity?" "What are the current status values?" "What infrastructure assumptions does the current implementation make?" These questions have specific answers in the codebase — answers that should inform the spec, not be discovered by engineering after the spec is written.
When product managers have access to a plain-language codebase query tool — one that does not require reading code or operating a CLI — these questions become part of the normal spec workflow. The PM asks, gets an answer in plain language, and uses that answer to write the spec correctly the first time. ChatGPT or Claude still writes the spec. The inputs it draws from include the actual system, not just general software knowledge.
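The query-then-draft workflow can be sketched as a small helper that places the system's answers ahead of the drafting request and refuses to build a prompt while any question is unanswered. This is an illustrative sketch only: `SystemFact` and `build_spec_prompt` are hypothetical names, the answers are entered by hand here, and no particular query tool or AI API is assumed.

```python
from dataclasses import dataclass


@dataclass
class SystemFact:
    """One question asked of the codebase, with the answer that came back."""
    question: str
    answer: str


def build_spec_prompt(feature, facts):
    """Assemble a drafting prompt that states system facts before the request."""
    unanswered = [fact.question for fact in facts if not fact.answer.strip()]
    if unanswered:
        # Drafting without these answers is how system-blind specs get written.
        raise ValueError(f"Unanswered system questions: {unanswered}")
    lines = ["System facts (from the codebase):"]
    for fact in facts:
        lines.append(f"- {fact.question} {fact.answer}")
    lines.append("")
    lines.append(f"Write acceptance criteria for: {feature}")
    lines.append("Do not assume capabilities beyond the facts listed above.")
    return "\n".join(lines)


facts = [
    SystemFact("What fulfillment statuses does our order model use?",
               "QUEUED, FULFILLING, DISPATCHED, COMPLETE, VOIDED"),
    SystemFact("How do we currently handle filter state in the orders list?",
               "Full-page refresh, no persistent state, stateless session by design"),
    SystemFact("Do we have real-time capabilities in the customer portal?",
               "No WebSocket infrastructure; real-time would require a new service"),
]
prompt = build_spec_prompt(
    "Allow customers to filter orders by fulfillment status", facts
)
print(prompt)
```

The resulting text can be pasted into whichever assistant the PM already uses. The design choice worth noting is the hard failure on an unanswered question: it makes the missing input visible before the draft exists, not after.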
Final take
AI-assisted spec writing is a real improvement. The specs that come out of it are clearer, better structured, and more complete. The missing input is system context: what the codebase actually does, what names it uses, what architectural assumptions are already in place. Without that input, the AI writes from generic patterns — and generic patterns frequently conflict with specific systems.
The product manager using ChatGPT to write specs is not making a mistake. They are using the best tool they have access to, with the information they have access to. The gap is that they do not have access to the codebase before the spec goes to engineering. Closing that gap does not require product managers to become technical — it requires giving them a plain-language interface to the system truth before the spec is written.