Blog

Same CLAUDE.md, Different Results: Why Rule-Following Is Non-Deterministic

9 min read

One of the most disorienting things about CLAUDE.md is that the same file produces different behavior on different days. Monday it follows your rules to the letter. Tuesday, same file, same prompt shape, it ignores half of them. You change nothing, rerun, and it works again. Developers report it constantly: "the same instruction has worked correctly in other sessions." It is not your file degrading. It is that rule-following from an advisory file was never deterministic to begin with.

It is not the file — it is everything around it

When the output changes and the file did not, the variable is the rest of the context:

The file is constant; the context around it isn't

Same file, different runs

  Monday   "add an endpoint" -> follows every CLAUDE.md rule
  Tuesday  same file, same rule -> ignores half of them
  Wednesday works again -> you change nothing, you just rerun

  You didn't edit the file. The variable that changed was
  everything ELSE: the task, the open files, the thread length.

Each session has a different task, different files open, a different conversation length by the time the rule matters. The rule is competing against a different field every time, and as we cover in why Claude ignores your CLAUDE.md, a general advisory line loses to specific, recent context. Change the field and you change the odds.

Probabilistic by nature

The deeper reason is that a language model is not a rules engine. It does not execute your CLAUDE.md; it samples a response in which your rules are one influence among many:

Following a rule is an outcome with a probability

Why rule-following is probabilistic

  CLAUDE.md is advisory text weighed against the whole window.
  Its odds of "winning" on any given turn depend on:
    -> how much other context is competing
    -> how recent/specific the conflicting signal is
    -> how long the session has run (rules fade with length)
    -> sampling: the model is probabilistic, not a rules engine

  "Followed" is an outcome with a probability, not a setting.

Because the file is advisory, not enforced, "did it follow the rule" has a probability attached, and that probability shifts with everything else in the window. The same input distribution can yield compliance one run and a miss the next. That is not malfunction; it is what sampling from a probabilistic model means.

Why inconsistency is worse than consistent failure

A rule that never works, you route around. A rule that works most of the time is more dangerous, because it earns trust it cannot keep — you stop checking, and the one run in five where it lapses ships. Intermittent compliance is the hardest kind to defend against, and it is exactly what an advisory file delivers. It also poisons debugging: you cannot reproduce the failure on demand, so you cannot tell whether your fix worked or you just got a lucky sample.

Buy back determinism where it counts

You cannot make a probabilistic model deterministic, but you can move the things that must be reliable off the probabilistic path:

Move hard requirements off the advisory path

Where you can buy back determinism

  Advisory rule (CLAUDE.md):
    -> probabilistic; varies run to run; no guarantee

  Hard guardrail (hook):
    -> deterministic on ACTIONS: "never write here", every time

  Current fact (index):
    -> the same true answer about the code on every query
    -> variance removed from the GROUNDING, if not the prose

Hard constraints belong in hooks, which run every time regardless of sampling. And the facts about your code belong in an index, so that at least the grounding the model reasons from is the same correct answer on every query, even if the prose it generates varies. You will not get deterministic phrasing — but you can stop the model from reasoning off a different, possibly-stale picture of the codebase each run, the variance behind why AI gives different answers about your codebase.

Where Kognita fits

Kognita removes one whole source of run-to-run variance: the grounding. Every session, every agent, queries the same continuously-updated semantic index and gets the same current answer about how the code actually works. The model is still probabilistic, but it is no longer also working from a different, drifting snapshot each time. Pair that with hooks for the hard rules, and you have pushed the unreliable, advisory part of the stack down to where it can only affect phrasing — not whether the agent knows the truth.

Final take

The same CLAUDE.md gives different results because following it was always a probability, not a guarantee — and the probability moves with every change in the surrounding context. Editing the file harder will not fix a property the mechanism never had.

Advisory rules are non-deterministic by nature. Put hard requirements in hooks and codebase truth in an index, so the parts that must be reliable no longer depend on a lucky sample.