Context+ Uses Ollama to Run Local Models. That's the Quality Ceiling.

Context+ is genuinely ambitious in its architecture. It uses spectral clustering, tree-sitter AST parsing, and Obsidian-style linking to build a hierarchical feature graph of a codebase. The technique is sound. But there is a constraint that no amount of clever architecture can fully escape: the embedding and inference models that power Context+'s semantic analysis have to run on Ollama, on your laptop.

That constraint matters more than the documentation suggests. The quality of semantic codebase retrieval is not just a function of the retrieval algorithm — it is significantly a function of the embedding model. And the embedding model you can run on a developer laptop has a hard ceiling set by available local hardware.

Why embedding model size matters for retrieval quality

Embedding models convert text — code, function names, comments, behavioral descriptions — into vectors. Retrieval works by finding the vectors in the index that are most similar to the query vector. The precision of that similarity search is bounded by the richness of the embedding space: how well the model captures semantic relationships between concepts.
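
To make the similarity search concrete, here is a minimal sketch of nearest-neighbor retrieval over embedding vectors. The three-dimensional vectors and file names are invented for illustration; a real embedding model emits hundreds or thousands of dimensions, and this is not how Context+ itself is implemented.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, index, k=2):
    """Return the k chunk ids whose vectors are most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), chunk_id)
              for chunk_id, vec in index.items()]
    scored.sort(reverse=True)
    return [chunk_id for _, chunk_id in scored[:k]]

# Hypothetical toy embeddings (assumption: not produced by any real model).
index = {
    "billing/retry.py": [0.9, 0.1, 0.2],
    "http/retry.py": [0.2, 0.9, 0.1],
    "billing/dunning.py": [0.8, 0.0, 0.5],
}
query = [1.0, 0.0, 0.3]  # stands in for "payment retry logic on card expiry"

print(top_k(query, index))
```

The ranking is only as good as the vectors: if the model places the billing retry code and the HTTP retry code close together, no amount of tuning in `top_k` can separate them.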

Larger models tend to produce richer embedding spaces. A query about "payment retry logic on card expiry" will typically map to a more precise vector in a 3-billion-parameter model than in a 300-million-parameter one. The larger model has more capacity to capture the distinction between "retry logic" in a billing context versus "retry logic" in a network request context, and to recognize that "card expiry" semantically overlaps with "subscription dunning."

Context+ recommends nomic-embed-text as the embedding layer. nomic-embed-text is a competent model and runs efficiently on consumer hardware. It is not in the same embedding quality tier as the large proprietary embedding models that run on GPU clusters. The quality ceiling is set by what local hardware can serve in real time.

What bounds the quality of local model-based codebase context:
  -> Model size: laptop hardware limits you to 7B–27B parameter models
  -> Inference speed: large models run slowly on consumer hardware
  -> Context length: local models often have shorter effective context windows
  -> Embedding quality: smaller models typically produce lower-dimensional, less precise embeddings
  -> Batch throughput: local inference can't match GPU cluster throughput for indexing
  -> Consistency: model behavior can vary across Ollama versions and hardware

The inference model is also constrained. Context+'s semantic analysis layer uses a generative model through Ollama — the documentation recommends gemma2:27b for analysis tasks. Running gemma2:27b on a MacBook Pro with 24 GB unified memory is feasible. Running it on a Windows laptop with 16 GB RAM is not. Teams with mixed hardware get inconsistent analysis quality as a side effect of hardware variation, not model variation.
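
The hardware split can be approximated with back-of-the-envelope arithmetic. The sketch below estimates memory for the model weights alone and checks it against available RAM. The 4-bits-per-weight quantization and the 4 GB of headroom for the operating system and runtime are assumptions for illustration, not figures from the Context+ or Ollama documentation.

```python
def weight_memory_gb(params_billions, bits_per_weight):
    """Approximate memory for the weights alone, in gigabytes (10^9 bytes)."""
    bytes_total = params_billions * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9

def fits(params_billions, bits_per_weight, ram_gb, headroom_gb=4.0):
    """True if weights plus an assumed headroom fit in the given RAM.

    headroom_gb is a rough allowance for the OS, runtime, and context cache;
    the right value depends on the machine and workload.
    """
    return weight_memory_gb(params_billions, bits_per_weight) + headroom_gb <= ram_gb

for ram in (16, 24):
    print(f"27B at 4-bit on {ram} GB: fits={fits(27, 4, ram)}")
print(f"27B at 16-bit needs about {weight_memory_gb(27, 16):.0f} GB for weights")
```

Under these assumptions a 27B model needs roughly 13.5 GB for weights at 4-bit quantization, which leaves room on a 24 GB machine and none on a 16 GB one.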

The retrieval pipeline beyond embeddings

Good codebase retrieval is not just a nearest-neighbor search on embedding vectors. The most accurate semantic retrieval of code behavior requires several layers: AST-based structural analysis to understand how the code is organized, not just what symbols appear in it; call graph traversal to understand which code is invoked from where; dependency analysis to understand which services or modules a piece of logic depends on; and behavioral abstraction to describe what the code does in domain terms rather than syntactic terms.

Context+ implements AST parsing through tree-sitter and builds a call graph from it. This is the right approach. The limitation is throughput and scope: running tree-sitter analysis and call graph traversal on a full production codebase locally is slow enough that it is done once at index time, not continuously. That means the index reflects whatever the codebase looked like when indexing last ran — which may be days or weeks old if the developer has not explicitly re-indexed.
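
As a rough illustration of what "build a call graph from the AST" means, the sketch below uses Python's built-in `ast` module as a stand-in for tree-sitter. It handles only top-level functions and plain-name calls, and the sample source is invented; Context+'s actual implementation is not shown here and will differ.

```python
import ast
from collections import defaultdict

def build_call_graph(source):
    """Map each top-level function name to the set of plain names it calls."""
    tree = ast.parse(source)
    graph = defaultdict(set)
    for node in tree.body:
        if isinstance(node, ast.FunctionDef):
            graph[node.name]  # touch the key so functions with no calls still appear
            for child in ast.walk(node):
                # Only direct calls like foo(...); method calls are ignored here.
                if isinstance(child, ast.Call) and isinstance(child.func, ast.Name):
                    graph[node.name].add(child.func.id)
    return dict(graph)

# Hypothetical billing code used only to exercise the function.
SAMPLE = '''
def charge(card):
    return gateway_call(card)

def retry_charge(card, attempts=3):
    for _ in range(attempts):
        if charge(card):
            return True
    return send_dunning_email(card)

def gateway_call(card):
    return False

def send_dunning_email(card):
    return False
'''

graph = build_call_graph(SAMPLE)
for caller in sorted(graph):
    print(caller, "->", sorted(graph[caller]))
```

The graph is a snapshot of the source text it was given. If `retry_charge` changes after indexing, the stored edges stay as they were until the analysis runs again.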

What server-side indexing unlocks for semantic retrieval:
  -> Large embedding models (e.g. text-embedding-3-large) for higher precision
  -> Tree-sitter AST parsing across the full repository, not just open files
  -> Call graph traversal across all connected repositories simultaneously
  -> Semantic enrichment with large generative models, not just embeddings
  -> Cross-language understanding: Python service calling a Go service via gRPC
  -> Batch re-indexing on every merge to main — no stale context
  -> No inference latency from laptop hardware — retrieval is fast regardless of machine

Server-side indexing changes the throughput constraint. When indexing runs on cluster infrastructure rather than a developer laptop, it can run after every merge to main, not just when a developer remembers to re-index. The index is always current. The call graph reflects this week's architecture, not last month's. When a service dependency was added in Tuesday's commit, Wednesday's AI session knows about it.
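
Index freshness can be stated as a number: how many commits on main have landed since the index was built. The sketch below compares a manually refreshed index with one rebuilt by a merge hook. The commit ids, the `IndexState` class, and the `on_merge` hook are hypothetical and do not correspond to any real Context+ or Kognita API.

```python
from dataclasses import dataclass

@dataclass
class IndexState:
    indexed_commit: str  # commit the index was last built from

def commits_behind(history, state):
    """Number of commits on main that landed after the indexed commit.

    history lists commit ids on main, oldest first. An unknown commit is
    treated as fully stale.
    """
    if state.indexed_commit not in history:
        return len(history)
    return len(history) - 1 - history.index(state.indexed_commit)

def on_merge(history, state, new_commit):
    """Hypothetical merge hook: record the commit, then rebuild the index from it."""
    history.append(new_commit)
    state.indexed_commit = new_commit

history = ["a1", "b2"]
local = IndexState(indexed_commit="b2")   # refreshed by hand, then left alone
server = IndexState(indexed_commit="b2")  # refreshed by the merge hook

for commit in ("c3", "d4"):
    on_merge(history, server, commit)

print("local index is", commits_behind(history, local), "commits behind")
print("server index is", commits_behind(history, server), "commits behind")
```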

The cross-repository problem

Context+ indexes the repository on your machine. In a microservices environment, that is one repository out of many. When a developer asks about the subscription retry logic, the relevant code may span a billing service, a payment gateway client, a notification service, and a shared library. Each of those is a separate repository. Context+ can only provide context on the repository it has indexed locally — typically the one the developer is currently working in.

Server-side indexing can index all connected repositories simultaneously and build a unified semantic graph that understands cross-service relationships. When the billing service calls the notification service, the index knows both sides of that call — the invocation in billing and the handler in notification — because both repositories are indexed in the same pipeline.
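
A unified graph can be pictured as per-repository call graphs merged under qualified names, then walked as one structure. In the sketch below, the repository names, function names, and the assumption that cross-repo callees are already resolved to `repo:function` form are all illustrative; resolving those edges (for example from gRPC service definitions) is the hard part and is not shown.

```python
from collections import deque

def merge_graphs(per_repo):
    """Combine per-repo call graphs into one graph keyed by 'repo:function'."""
    unified = {}
    for repo, graph in per_repo.items():
        for caller, callees in graph.items():
            unified[f"{repo}:{caller}"] = list(callees)
    return unified

def trace(unified, start):
    """Breadth-first walk returning every node reachable from start, in visit order."""
    seen, order, queue = {start}, [], deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        for nxt in unified.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return order

# Hypothetical call graphs, one per repository.
per_repo = {
    "billing": {
        "retry_charge": ["payment-gateway:charge", "notification:send_dunning_email"],
    },
    "payment-gateway": {"charge": []},
    "notification": {
        "send_dunning_email": ["notification:render_template"],
        "render_template": [],
    },
}

all_repos = trace(merge_graphs(per_repo), "billing:retry_charge")
billing_only = trace(merge_graphs({"billing": per_repo["billing"]}), "billing:retry_charge")

print("all repos:   ", all_repos)
print("billing only:", billing_only)
```

With only the billing repository indexed, the walk sees that a dunning email is requested but stops at the boundary. What the notification service does with that request is visible only once its graph is part of the same index.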

The retrieval quality difference is most visible in exactly the questions that matter most: "what happens when X fails," "which services need to be updated if we change Y," "where does Z behavior originate." These questions require cross-service understanding that a single-repo local index fundamentally cannot provide.

A concrete example: subscription retry behavior

The same codebase question answered with local vs server-side context
Example: asking "how does the subscription retry logic behave on card expiry?"

  With local model context (Context+):
  -> nomic-embed-text or similar finds files with "subscription" and "retry"
  -> Returns 3-5 most semantically similar code chunks
  -> Works well if everything is in one repository and well-named
  -> Struggles with behavior split across services or abstracted behind interfaces

  With server-side semantic context (Kognita):
  -> Full AST parsing identifies the retry state machine
  -> Call graph traces the path from billing service into payment gateway
  -> Cross-repo links include the notification service that fires on expiry
  -> Behavioral description: "Retries 3x at 24h intervals, then sends dunning email via notification-service"
  -> Returns behavior description plus the specific functions involved, across repos

The difference in that example is not just completeness — it is accuracy. The local model answer is not wrong given its constraints, but it is incomplete in a way that can mislead. An AI coding assistant told that "retry logic handles card expiry" without knowing about the notification service may suggest changes to the billing service retry logic without understanding that the dunning email sequence also needs updating. That omission gets caught in code review at best, in production at worst.

When local model quality is good enough

Local model-based codebase context is meaningfully better than no codebase context. For a developer working alone on a single-service repository, Context+ with Ollama provides real retrieval improvement over relying on the model's in-context inference from open files. The embedding quality is sufficient for common-case retrieval. The AST analysis adds structure that keyword search misses.

The quality ceiling matters when the questions get complex: when the behavior spans multiple services, when the architecture has evolved faster than the local index refreshed, when the retrieval needs to understand domain semantics rather than surface-level keyword matching. That is the category of questions where AI coding tools still most commonly fail, and it is the category where the embedding model quality gap between local and server-side is largest.

Final take

Context+ is built on good retrieval primitives. Spectral clustering, tree-sitter parsing, and graph-based linking are the right ingredients for semantic codebase understanding. The constraint is not the approach — it is the platform. Local hardware sets a ceiling on embedding model size, inference throughput, index freshness, and cross-repository scope that no algorithmic improvement can fully override.

Server-side indexing removes that ceiling. Not because it uses different algorithms, but because it removes the hardware constraint and enables the kind of continuous, cross-repository, large-model semantic indexing that produces materially better retrieval on exactly the hard questions developers most need AI to answer correctly.

The difference between local model codebase context and server-side semantic context is not a preference. It is a quality ceiling set by hardware. When the questions get complex — cross-service traces, recent architectural changes, behavioral abstractions — the hardware ceiling is where AI coding accuracy hits a wall.