Same CLAUDE.md, Different Results: Why Rule-Following Is Non-Deterministic
One of the most disorienting things about CLAUDE.md is that the same file produces different behavior on different days. Monday it follows your rules to the letter. Tuesday, same file, same prompt shape, it ignores half of them. You change nothing, rerun, and it works again. Developers report it constantly: "the same instruction has worked correctly in other sessions." It is not your file degrading. It is that rule-following from an advisory file was never deterministic to begin with.
It is not the file — it is everything around it
When the output changes and the file did not, the variable is the rest of the context:
Same file, different runs
Monday "add an endpoint" -> follows every CLAUDE.md rule
Tuesday same file, same rule -> ignores half of them
Wednesday works again -> you change nothing, you just rerun
You didn't edit the file. The variable that changed was
everything ELSE: the task, the open files, the thread length.
Each session has a different task, different files open, a different conversation length by the time the rule matters. The rule is competing against a different field every time, and as we cover in why Claude ignores your CLAUDE.md, a general advisory line loses to specific, recent context. Change the field and you change the odds.
Probabilistic by nature
The deeper reason is that a language model is not a rules engine. It does not execute your CLAUDE.md; it samples a response in which your rules are one influence among many:
Why rule-following is probabilistic
CLAUDE.md is advisory text weighed against the whole window.
Its odds of "winning" on any given turn depend on:
-> how much other context is competing
-> how recent/specific the conflicting signal is
-> how long the session has run (rules fade with length)
-> sampling: the model is probabilistic, not a rules engine
"Followed" is an outcome with a probability, not a setting.
Because the file is advisory, not enforced, "did it follow the rule" has a probability attached, and that probability shifts with everything else in the window. Even an identical file and prompt can yield compliance on one run and a miss on the next. That is not a malfunction; it is what sampling from a probabilistic model means.
Why inconsistency is worse than consistent failure
A rule that never works, you route around. A rule that works most of the time is more dangerous, because it earns trust it cannot keep — you stop checking, and the one run in five where it lapses ships. Intermittent compliance is the hardest kind to defend against, and it is exactly what an advisory file delivers. It also poisons debugging: you cannot reproduce the failure on demand, so you cannot tell whether your fix worked or you just got a lucky sample.
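To put rough numbers on both problems, here is a minimal sketch in Python. It assumes every run is independent and uses a hypothetical 20% lapse rate (the "one run in five" above); real lapse rates move with context and are rarely known this precisely. The function names are illustrative, not part of any tool.

```python
def chance_of_shipping_a_lapse(lapse_rate: float, unchecked_runs: int) -> float:
    """Probability that at least one of `unchecked_runs` independent runs lapses."""
    return 1.0 - (1.0 - lapse_rate) ** unchecked_runs


def clean_reruns_needed(lapse_rate: float, max_luck: float = 0.05) -> int:
    """Smallest number of consecutive clean reruns such that, if the fix did
    nothing and the rule still lapsed at `lapse_rate`, the chance of seeing
    that many clean runs by luck alone is at most `max_luck`."""
    if not 0.0 < lapse_rate <= 1.0:
        raise ValueError("lapse_rate must be in (0, 1]")
    if not 0.0 < max_luck < 1.0:
        raise ValueError("max_luck must be in (0, 1)")
    p_all_clean = 1.0
    reruns = 0
    while p_all_clean > max_luck:
        p_all_clean *= 1.0 - lapse_rate  # one more clean run by luck
        reruns += 1
    return reruns


# Hypothetical figure: the rule lapses on one run in five.
LAPSE_RATE = 0.2

print(f"P(at least one lapse in 10 unchecked runs) = "
      f"{chance_of_shipping_a_lapse(LAPSE_RATE, 10):.2f}")
print(f"Clean reruns needed before trusting a fix = "
      f"{clean_reruns_needed(LAPSE_RATE)}")
```

Under these assumptions, ten unchecked runs have about an 89% chance of containing at least one lapse, and it takes 14 clean reruns in a row before luck alone drops below a 5% explanation. One successful rerun tells you very little.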
Buy back determinism where it counts
You cannot make a probabilistic model deterministic, but you can move the things that must be reliable off the probabilistic path:
Where you can buy back determinism
Advisory rule (CLAUDE.md):
-> probabilistic; varies run to run; no guarantee
Hard guardrail (hook):
-> deterministic on ACTIONS: "never write here", every time
Current fact (index):
-> the same true answer about the code on every query
-> variance removed from the GROUNDING, if not the prose
Hard constraints belong in hooks, which run every time regardless of sampling. The facts about your code belong in an index, so that the grounding the model reasons from is the same correct answer on every query, even if the prose it generates varies. You will not get deterministic phrasing, but you can stop the model from reasoning off a different, possibly stale picture of the codebase each run, which is the variance behind why AI gives different answers about your codebase.
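As an illustration of a "never write here" guardrail, here is a minimal sketch of a pre-write hook check in Python. Hook interfaces differ between tools, so treat the payload shape (a JSON object with `tool_input.file_path`) and the blocking exit code (2) as assumptions to verify against your tool's hook documentation. The protected directory names are hypothetical.

```python
import json
from pathlib import PurePosixPath

# Hypothetical directories the agent must never write to.
PROTECTED_DIRS = ("migrations", "infra/prod")


def is_protected(file_path: str, protected_dirs=PROTECTED_DIRS) -> bool:
    """True if any protected directory appears as a run of whole path segments."""
    parts = PurePosixPath(file_path).parts
    for protected in protected_dirs:
        target = PurePosixPath(protected).parts
        n = len(target)
        if any(parts[i:i + n] == target for i in range(len(parts) - n + 1)):
            return True
    return False


def decide(payload: dict) -> tuple[int, str]:
    """Return (exit_code, message). Exit code 2 is assumed to mean 'block'."""
    file_path = payload.get("tool_input", {}).get("file_path", "")
    if file_path and is_protected(file_path):
        return 2, f"Blocked: {file_path} is in a protected directory."
    return 0, ""


def run_hook(stream) -> int:
    """Read one JSON payload from `stream` and return the exit code to use."""
    code, message = decide(json.load(stream))
    if message:
        print(message)  # a real hook would likely report this on stderr
    return code
```

A script wired up as a hook would end by calling `sys.exit(run_hook(sys.stdin))`. The point of the design is that the check is ordinary code: the same path gets the same verdict on every run, no matter what the model sampled. This sketch does not normalize `..` segments or symlinks, which a production guardrail would need to handle.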
Where Kognita fits
Kognita removes one whole source of run-to-run variance: the grounding. Every session, every agent, queries the same continuously-updated semantic index and gets the same current answer about how the code actually works. The model is still probabilistic, but it is no longer also working from a different, drifting snapshot each time. Pair that with hooks for the hard rules, and you have pushed the unreliable, advisory part of the stack down to where it can only affect phrasing — not whether the agent knows the truth.
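To make "the same current answer on every query" concrete, here is a toy sketch in Python. It is not Kognita's API or implementation, only an illustration of the idea: facts about the code are derived from the source itself with the standard `ast` module, so a lookup is deterministic. The sample source and function names are hypothetical.

```python
import ast

# Hypothetical source file contents, standing in for a real codebase.
SOURCE = '''
def create_user(name, email):
    return {"name": name, "email": email}

def delete_user(user_id):
    pass
'''


def index_functions(source: str) -> dict[str, list[str]]:
    """Map each function name in `source` to its positional parameter names."""
    index = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            index[node.name] = [arg.arg for arg in node.args.args]
    return index


def lookup(index: dict[str, list[str]], name: str):
    """Return the recorded parameters for `name`, or None if it is unknown."""
    return index.get(name)


index = index_functions(SOURCE)
print(lookup(index, "create_user"))
print(lookup(index, "no_such_function"))
```

Rebuilding the index whenever the source changes keeps the answer current, and asking twice returns the same result both times. The model's phrasing can still vary, but the fact it starts from does not.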
Final take
The same CLAUDE.md gives different results because following it was always a probability, not a guarantee — and the probability moves with every change in the surrounding context. Editing the file harder will not fix a property the mechanism never had.
Advisory rules are non-deterministic by nature. Put hard requirements in hooks and codebase truth in an index, so the parts that must be reliable no longer depend on a lucky sample.