The Rollback No One Told the Stakeholder About

The Head of Operations announced the payment retry feature to three enterprise customers two days before engineering rolled it back. The rollback happened at 11 PM on a Wednesday. The on-call engineer made the right call: a race condition was hitting checkout, the revert restored service within half an hour, and by midnight everything was stable. Nobody in engineering did anything wrong. They posted a note in the incidents channel, resolved the page, and went to sleep.

Friday morning, one of those enterprise customers emailed asking why the feature had disappeared. They had built their month-end billing run around it. The Head of Operations opened Slack to find the two-line incident note from Wednesday night. She had no idea a rollback had happened. She had no idea what a rollback meant in terms of customer impact. She had to respond to a customer she had personally promised a feature to, explain something she did not understand herself, and do it before she had spoken to anyone from engineering or product.

The rollback was technically correct. The communication failed completely, not because anyone was careless, but because most organizations have no process for "tell the business stakeholder when this feature gets reverted."

How rollbacks happen and why stakeholders find out last

Rollbacks are not planned events. They happen under pressure, at off-hours, when an on-call engineer is triaging a production incident and needs to restore service as quickly as possible. The decision is usually correct and usually fast. The entire workflow — detect, diagnose, decide, execute, verify — is optimized for speed. Stakeholder notification is not part of that workflow because it cannot be. Nobody has a list of "who is the business owner of FEAT-304?" pinned to the incident runbook.

The aftermath is worse than the event itself. The engineer who executed the rollback is not the one who knows what customer commitments were made against the feature. The product manager who owns the feature may not see the incident note until the next morning. Engineering considers the situation resolved — the service is stable, the incident is closed. From the business side, the situation is not resolved at all. A feature that was publicly announced to customers has silently disappeared, and the people responsible for those customers have no information about it.

What happens from Wednesday 11 PM to Friday 9 AM when a feature gets rolled back
Wednesday 11:04 PM
  On-call engineer: pages fire on checkout-service
  Root cause: new payment retry logic introduced a race condition
  Decision: roll back FEAT-304 (payment retry v2)
  Action taken: revert merged, deployment complete by 11:31 PM
  Engineering Slack: brief note in #incidents, thread archived

Thursday (no stakeholder-facing activity)
  Jira ticket FEAT-304: still marked Done
  No stakeholder notification sent
  No customer communication triggered

Friday 9:17 AM
  Email from enterprise customer:
    "Hi — the new retry feature seems to have disappeared.
     We were relying on it for our month-end billing run.
     Can you let us know what happened?"

Friday 9:22 AM
  Head of Operations opens Slack
  Searches for any mention of a rollback
  Finds a two-line message in #incidents from 11 PM Wednesday
  Has no context for what was reverted or why
  Must now respond to the customer before understanding what happened herself

Why this is a structural problem, not a communication failure

The instinct is to call this a communication problem and propose a solution: the on-call engineer should send a notification, the incident management tool should have a stakeholder field, the post-mortem template should include a business impact section. These solutions are not wrong. They are also not reliable. They depend on individuals remembering to do things while they are in the middle of an incident — which is exactly when process adherence is lowest.

Incident response context lives nowhere that business stakeholders can access it, and adding one more manual step to the on-call workflow does not change the structural reality. The on-call engineer does not know who the business owner of each feature is, does not know what customer commitments have been made, and is not in a position to draft a stakeholder notification at midnight during a production incident. These are not failures of character. They are failures of infrastructure.

The same structural gap shows up in Jira. When a feature gets rolled back, the Jira ticket that marked the feature as Done does not automatically revert to In Progress. It stays Done. The stakeholder who checks Jira on Thursday morning sees no signal that anything has changed. The feature is marked complete. The system has already reverted it. Jira and reality have diverged, and nothing triggers an alert about the divergence.

Why rollback notifications do not reach business stakeholders — the structural reasons
Why rollback notifications do not reach business stakeholders:

  The on-call workflow is optimized for speed, not communication
    -> Engineer's job at 11 PM: restore service, not send status emails
    -> Incident channels exist for engineering; no equivalent for business owners
    -> No documented process for "notify the feature stakeholder" during a revert

  The Jira ticket does not update automatically
    -> Rollback is a code-level action; Jira ticket stays Done
    -> Engineer may add a comment days later, or never
    -> Stakeholder checking Jira sees no signal that anything changed

  The incident debrief happens after the fact — if at all
    -> Post-mortems are scheduled when incidents are severe enough
    -> A clean rollback that restores service is often not treated as incident-worthy
    -> The business impact (customer commitment made on now-reverted feature) is not
       visible to the engineering team at 11 PM

  No one knows who the feature's business owner is
    -> Engineering knows who wrote the code
    -> Nobody has a mapping from "FEAT-304" to "the Head of Operations
       who personally announced this to three enterprise accounts"

The customer-commitment problem is the real cost

Engineering rolls back features to protect system stability. That is the right call. The problem is that by the time the rollback happens, the feature has usually been in production long enough for business stakeholders to have made commitments around it. A sales rep has told a customer the feature is live. A Head of Operations has built a workflow assumption around it. A VP has referenced it in a client QBR. These commitments are made in good faith against a feature that is currently in production. They become liabilities the moment the rollback executes.

The lag between the rollback and the stakeholder finding out is where the damage accumulates. If the Head of Operations had known about the rollback on Thursday morning, she could have reached the three enterprise customers before they noticed, explained the situation proactively, and set expectations for when the feature would return. Instead, she found out on Friday morning from one of those customers, and spent the next hour learning what happened before she could respond. The rollback cost engineering twenty minutes to execute. The communication gap cost the business an indeterminate amount of customer trust.

Status updates assembled from memory and Slack scans do not catch reversions that happen at 11 PM and leave no communication artifacts behind. The gap is structural: there is no automated channel from "system state changed" to "the business person responsible for this feature's customer commitments." Engineering has monitoring and alerting. Business stakeholders have email and Jira. Those two channels do not touch.

What a self-serve visibility channel looks like for stakeholders

The fix is not adding a mandatory notification field to the incident runbook. The fix is giving business stakeholders a way to check for reversions and system changes themselves — before a customer emails them about it.

A product manager or operations lead who can ask "what deployments or reversions happened in the last 48 hours that touch features with active Jira epics?" is in a fundamentally different position than one who is waiting for engineering to notify them. The question is self-serve. It does not require a technical background to ask or to understand the answer. And it can be asked proactively — on Monday morning before the week starts, before a customer call, after seeing an incident note that did not include enough business context to act on.

What Kognita, connected to Jira, surfaces about a reversion event and its business impact
Stakeholder asks Kognita (connected to Jira + codebase):
"What deployments or reversions happened in the last 48 hours that
touch features with active Jira epics?"

Kognita returns:

  Reversion detected: Wednesday 11:31 PM
    Feature: Payment retry logic v2 (FEAT-304)
    Action: Reverted (revert commit abc4f91)
    Jira epic: PAYMENTS-Q3 — Payment reliability improvements
    Jira ticket status: Still marked Done (not updated post-revert)
    Original ship date: Monday this week

  Context from Jira:
    -> PAYMENTS-Q3 epic has 3 active sub-tickets still open
    -> BUG-901 opened Thursday AM: related to retry behavior post-revert
    -> No incident ticket created for the reversion event

  What this means:
    -> FEAT-304 is no longer active in production as of Wednesday night
    -> Any customer commitments made against this feature are now undeliverable
    -> Engineering has an open bug (BUG-901) that may indicate ongoing work
    -> Jira does not reflect the current system state for this feature

The answer in that example changes everything about the Friday morning situation. The Head of Operations who checks Kognita on Thursday morning — or even Wednesday night after seeing the incident note — knows immediately that FEAT-304 was reverted, that the Jira ticket has not been updated, that there is an open bug related to the reversion, and that she needs to contact her three enterprise accounts before they notice. She is not reacting to a customer email. She is getting ahead of it.
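
To make the shape of that question concrete, here is a hedged Python sketch of the filtering logic behind "last 48 hours, touching active epics." It is not Kognita's implementation. The ChangeEvent record, the recent_changes function, the calendar dates, the Monday deploy time, and the FEAT-120 and ONBOARDING-Q1 entries are assumptions invented for illustration; only FEAT-304, PAYMENTS-Q3, and the 11:31 PM revert come from the example above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class ChangeEvent:
    when: datetime
    kind: str    # "deploy" or "revert"
    ticket: str  # Jira ticket key the change belongs to


def recent_changes(events, ticket_to_epic, active_epics, now, window_hours=48):
    """Return (event, epic) pairs inside the window whose epic is active, newest first."""
    cutoff = now - timedelta(hours=window_hours)
    hits = [
        (event, ticket_to_epic[event.ticket])
        for event in events
        if event.when >= cutoff and ticket_to_epic.get(event.ticket) in active_epics
    ]
    return sorted(hits, key=lambda pair: pair[0].when, reverse=True)


# Placeholder dates: a Monday ship, a Wednesday-night revert, a Thursday-morning check.
events = [
    ChangeEvent(datetime(2024, 6, 3, 10, 0), "deploy", "FEAT-304"),
    ChangeEvent(datetime(2024, 6, 5, 23, 31), "revert", "FEAT-304"),
    ChangeEvent(datetime(2024, 6, 5, 15, 0), "deploy", "FEAT-120"),
]
ticket_to_epic = {"FEAT-304": "PAYMENTS-Q3", "FEAT-120": "ONBOARDING-Q1"}
active_epics = {"PAYMENTS-Q3"}

hits = recent_changes(events, ticket_to_epic, active_epics, now=datetime(2024, 6, 6, 9, 0))
for event, epic in hits:
    print(f"{event.when:%Y-%m-%d %H:%M}  {event.kind:<7} {event.ticket}  ({epic})")
```

Run as a Thursday-morning check, the query returns only the Wednesday-night revert: the Monday deploy falls outside the 48-hour window, and the other deploy belongs to an epic that is not active.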

The scope of invisible system changes is wider than rollbacks

Rollbacks are the most dramatic example of a feature disappearing without stakeholder notification, but they are not the only one. Feature flags get turned off due to performance issues. Emergency patches modify feature behavior without generating corresponding Jira tickets. Edge cases get silently disabled to unblock a release. Configuration changes roll back without anyone filing an incident. In each case, the system state changes, and the business stakeholder who owns the customer relationship around that feature has no automatic channel for knowing about it.

The full scope of system changes that routinely happen without stakeholder notification
Changes that routinely happen without stakeholder notification:

  Rollbacks
    -> Feature reverted due to production bug
    -> Jira ticket remains Done

  Emergency patches
    -> Hotfix deployed to address critical behavior
    -> Modifies feature without corresponding ticket update

  Feature flag state changes
    -> Flag turned off due to performance issue
    -> Feature invisible to customers, no announcement

  Silent scope reduction
    -> Edge case disabled to unblock release
    -> Behavior differs from what was announced to customers

  Configuration rollbacks
    -> Environment config reverted; feature degrades silently
    -> No code change, no deploy event in standard tooling

What these events have in common is that they all change system reality without changing Jira. The gap between Jira ticket state and actual system state is the same gap that makes "engineering says it shipped" conversations so frustrating — and it runs in both directions. A feature can be in Jira as Done when it has been reverted. It can be marked Done when the flag is off. It can be Done when a critical edge case has been silently disabled. Jira does not know. The only source that knows is the system itself.
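
For the rollback case specifically, the divergence can be checked mechanically. The sketch below is a minimal Python illustration under stated assumptions, not a description of how any particular product works. It assumes commit subjects contain a Jira-style ticket key and that reverts were created with git revert, whose default subject line is Revert "<original subject>". The function names, the commit subjects, and FEAT-310 are hypothetical.

```python
import re

# Jira-style key such as FEAT-304 (assumed to appear in the commit subject).
TICKET_KEY = re.compile(r"\b[A-Z][A-Z0-9]+-\d+\b")
REVERT_PREFIX = 'Revert "'


def revert_depth(subject: str) -> int:
    """Count nested 'Revert "' prefixes; a revert of a revert has depth 2."""
    depth = 0
    while subject.startswith(REVERT_PREFIX):
        depth += 1
        subject = subject[len(REVERT_PREFIX):]
    return depth


def live_state(commit_subjects):
    """Map each ticket to True (live) or False (reverted). Subjects are oldest first."""
    state = {}
    for subject in commit_subjects:
        is_reverted = revert_depth(subject) % 2 == 1
        for ticket in TICKET_KEY.findall(subject):
            state[ticket] = not is_reverted  # the latest commit for a ticket wins
    return state


def diverged(jira_status, commit_subjects):
    """Tickets that Jira calls Done but whose most recent commit is a revert."""
    state = live_state(commit_subjects)
    return sorted(
        ticket
        for ticket, status in jira_status.items()
        if status == "Done" and state.get(ticket) is False
    )


subjects = [
    "FEAT-304 Add payment retry v2",
    "FEAT-310 Tidy checkout logging",
    'Revert "FEAT-304 Add payment retry v2"',
]
jira_status = {"FEAT-304": "Done", "FEAT-310": "Done"}

print(diverged(jira_status, subjects))  # ['FEAT-304']
```

This covers only reverts that leave a commit behind. Flag flips and configuration rollbacks, as listed above, may produce no commit at all, so they would need a different signal.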

How Kognita surfaces reversion and deployment events connected to Jira epics

Kognita connects the codebase and Jira through a managed MCP integration. It indexes your repositories continuously, so it knows what the system state actually is — what is deployed, what has been reverted, what has changed in the last 24, 48, or 72 hours. When connected to Jira, it can answer questions that bridge both: which recent deployments or reversions touch Jira epics that are marked as active or Done? What changed in the last week that might affect features you are currently running customer conversations around?

This does not require any engineering involvement once the connection is set up. The product manager checks in the morning. The operations lead queries before a customer call. The query runs against actual system state, not against what someone remembered to post in an incident channel. Nothing is assembled from memory. Nothing is filtered through someone else's summary. The system knows what happened, and the stakeholder can ask the system directly.

The angle is visibility, not blame. The on-call engineer who executed the rollback made the right call. The goal is not to add accountability overhead to a stressful incident workflow. The goal is to give the people who own customer relationships a self-serve channel to know when the system changes under features they are actively managing — so they can respond proactively instead of reactively.

Final take

Finding out from a customer that a feature was rolled back is not a rare edge case. It is the predictable result of a structural gap: engineering has the incident workflow, business stakeholders have Jira and Slack, and there is no automated channel connecting system state changes to the people who have made commitments around those features. The on-call engineer cannot fix this. Adding a field to the incident template cannot fix this. The process fails at the point of contact between a midnight rollback decision and a customer commitment that was made two days earlier.

The right fix is not a better notification process. It is self-serve visibility into system state for the business stakeholders who need it, so that they can check proactively for reversions and deployment changes affecting the features they own, without waiting for engineering to tell them and without depending on incident notes written at midnight for a technical audience.

A rollback at 11 PM is an engineering event. A customer finding out through their own observation that a feature disappeared is a business failure. The gap between those two things is exactly what a self-serve visibility channel closes.