Kognita

Context+ Needs Ollama Running on Every Developer's Laptop. That's Your Team's New Dependency.

"I keep forgetting to start Ollama before my coding session and then wondering why my AI has no codebase context." That sentence shows up in developer forums more than it should. It is not a user error. It is a consequence of how local context MCPs are architected: they depend on a local LLM runtime that has to be running, loaded, and warm before the tool can do anything.

Context+ is a semantic codebase MCP: it uses tree-sitter parsing, spectral clustering, and Obsidian-style linking to build a hierarchical feature graph of your repository. When it works, it works well. But the "when it works" part has a prerequisite that is easy to skim past: Ollama has to be running on your machine, with specific models downloaded, before any of that intelligence is available to your AI coding session.
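
Context+'s own parsing code is not reproduced here, but the idea behind AST-based chunking is easy to show. The sketch below swaps tree-sitter for Python's standard-library `ast` module, which only parses Python, and splits a source file into one chunk per top-level function or class instead of arbitrary line windows. The `chunk_by_definition` helper and `Chunk` type are hypothetical names for illustration.

```python
import ast
from dataclasses import dataclass


@dataclass
class Chunk:
    name: str        # function or class name
    kind: str        # "function" or "class"
    start_line: int  # 1-based, inclusive
    end_line: int    # 1-based, inclusive
    source: str      # exact source text of the definition


def chunk_by_definition(source: str) -> list[Chunk]:
    """Split Python source into one chunk per top-level def or class.

    Hypothetical helper: uses the stdlib `ast` module in place of tree-sitter.
    """
    tree = ast.parse(source)
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            kind = "function"
        elif isinstance(node, ast.ClassDef):
            kind = "class"
        else:
            continue  # skip imports, assignments and other module-level statements
        chunks.append(
            Chunk(
                name=node.name,
                kind=kind,
                start_line=node.lineno,
                end_line=node.end_lineno,
                source=ast.get_source_segment(source, node),
            )
        )
    return chunks
```

Chunking along definitions means a retrieved chunk is a whole function, not half of one plus the top of the next, which is what makes later embedding and linking steps meaningful.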

For an individual developer who has made Ollama a permanent fixture of their laptop setup, that prerequisite is manageable. For a team trying to roll out consistent AI context to a dozen engineers across different machines, operating systems, and working styles, it is the first place the rollout gets stuck.

What Context+ actually requires

The Context+ documentation is transparent about this: it requires Ollama to be running and the appropriate models to be installed before the MCP server can function. Specifically, it needs nomic-embed-text for the embedding layer and a larger generative model — the documentation recommends gemma2:27b — for semantic analysis.
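
What the embedding layer contributes is similarity search: code chunks and questions become vectors, and the closest vectors are treated as the relevant code. A minimal sketch of that ranking step, assuming the vectors were already produced (in Context+'s setup, by nomic-embed-text running in Ollama), ranks chunks by cosine similarity. The function names and toy vectors are illustrative, not Context+ internals.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as unrelated to everything
    return dot / (norm_a * norm_b)


def top_k(
    query_vec: list[float], chunk_vecs: dict[str, list[float]], k: int = 3
) -> list[tuple[str, float]]:
    """Return the k chunk ids most similar to the query, best first."""
    scored = [(chunk_id, cosine(query_vec, vec)) for chunk_id, vec in chunk_vecs.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]
```

If Ollama is down, there are no vectors to rank, and a retrieval step like this has nothing to return.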

What has to be running before Context+ can give your AI any context:
  -> Ollama installed and running as a background process
  -> nomic-embed-text model downloaded (embedding layer)
  -> gemma2:27b or equivalent model downloaded (analysis layer)
  -> Context+ MCP server configured in your editor's config file
  -> Repository indexed — which requires Ollama to be up when indexing runs
  -> All of the above repeated on every developer's machine
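
The first three items on that checklist can at least be verified before a session starts. This sketch assumes Ollama's default local address, `http://localhost:11434`, and its `/api/tags` endpoint, which lists the models pulled on the machine. The model names mirror the documentation's recommendations; the helper functions are hypothetical.

```python
import json
import urllib.request
from typing import Iterable

OLLAMA_URL = "http://localhost:11434"  # Ollama's default local address
REQUIRED_MODELS = ("nomic-embed-text", "gemma2:27b")  # as recommended in the Context+ docs


def normalize(name: str) -> str:
    """Ollama treats a model name without a tag as ':latest'."""
    return name if ":" in name else f"{name}:latest"


def missing_models(tags_response: dict, required: Iterable[str]) -> list[str]:
    """Return the required models that do not appear in an /api/tags response."""
    installed = {normalize(m["name"]) for m in tags_response.get("models", [])}
    return [r for r in required if normalize(r) not in installed]


def preflight(base_url: str = OLLAMA_URL, required: Iterable[str] = REQUIRED_MODELS) -> list[str]:
    """Return human-readable problems; an empty list means the checks passed."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=3) as resp:
            tags = json.load(resp)
    except (OSError, ValueError) as exc:
        # OSError covers connection refused, timeouts and HTTP errors;
        # ValueError covers a response that is not valid JSON.
        return [f"could not read the model list from {base_url}: {exc}"]
    return [
        f"model not pulled: {m} (fix with `ollama pull {m}`)"
        for m in missing_models(tags, required)
    ]
```

Calling `preflight()` from a shell profile or an editor startup task turns a missing dependency into a visible message, which is the signal Context+ itself does not give you. It cannot check the last three items: editor configuration, index freshness, and whether every other machine on the team passes the same check.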

That is not a simple install. Gemma2:27b is a 17 GB download. Nomic-embed-text adds another 274 MB. Ollama needs to be running as a background service, consuming RAM and GPU resources, any time you want Context+ to be active. On a MacBook Pro with 16 GB of unified memory, the gemma2:27b weights alone are about as large as the machine's entire memory, leaving little or nothing for your IDE, browser, and other tools.

There is also the matter of model versions. Context+ expects specific model versions for predictable behavior. When Ollama updates a model, behavior can shift. When a developer is running a different version than their colleagues, the retrieval they get from Context+ is not the same retrieval their colleagues get. There is no shared state, no synchronized index, no guarantee that two developers querying the same codebase will get contextually consistent answers.
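
Drift is at least detectable. Ollama's `/api/tags` response includes a `digest` for each local model, so if each developer exports a name-to-digest mapping, a hypothetical comparison helper like the one below shows which models differ across machines. Collecting those mappings from every laptop is left out here, and that collection step is itself the kind of shared state the local setup lacks.

```python
MISSING = "<not installed>"


def find_model_drift(machines: dict[str, dict[str, str]]) -> dict[str, dict[str, list[str]]]:
    """Group machines by model digest and report models that differ.

    `machines` maps a machine name to {model name: digest}, for example built
    from each developer's Ollama /api/tags output. The result maps each model
    with more than one distinct digest (or missing somewhere) to
    {digest: [machine names]}.
    """
    all_models = sorted({model for models in machines.values() for model in models})
    drift: dict[str, dict[str, list[str]]] = {}
    for model in all_models:
        groups: dict[str, list[str]] = {}
        for machine in sorted(machines):
            digest = machines[machine].get(model, MISSING)
            groups.setdefault(digest, []).append(machine)
        if len(groups) > 1:  # machines agree only if there is exactly one group
            drift[model] = groups
    return drift
```

An empty result means every machine holds the same model bytes; it still says nothing about whether everyone's index was built from the same commit.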

The invisible failure mode

The most dangerous thing about the Ollama dependency is how quietly Context+ fails when it is not met. If Ollama is not running, Context+ does not throw a visible error inside your AI coding session. It returns empty context. Your AI assistant keeps responding; it just does so without the semantic codebase layer that was supposed to be enriching it. You are back to raw in-context inference, except that you think you have semantic context because you installed the MCP.

What breaks when a developer skips any of these steps:
  -> Ollama not running → Context+ silently returns empty context
  -> Wrong model version → degraded retrieval quality, no error message
  -> Stale index → AI gets context from three months ago
  -> New machine → zero context until full setup is repeated
  -> Laptop sleep mid-session → Ollama may hang, context gone until restart
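
The common thread in that list is that nothing fails loudly. A minimal sketch of the opposite behavior assumes a hypothetical `fetch` function that returns retrieved snippets plus the time the index was built (this is not an actual Context+ API) and wraps it so that empty or stale context raises an error instead of passing through. The seven-day staleness limit is an arbitrary placeholder.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Callable, Optional


class ContextUnavailable(RuntimeError):
    """Raised instead of quietly handing the assistant an empty context."""


@dataclass
class ContextResult:
    snippets: list[str]   # retrieved code or docs for the query
    indexed_at: datetime  # when the index that produced them was built (UTC)


def require_context(
    fetch: Callable[[str], ContextResult],  # hypothetical context provider
    query: str,
    max_age: timedelta = timedelta(days=7),  # placeholder staleness limit
    now: Optional[datetime] = None,
) -> list[str]:
    """Call a context provider and raise on empty or stale results."""
    result = fetch(query)
    if not result.snippets:
        raise ContextUnavailable(
            f"no codebase context for {query!r}; check that Ollama is running"
        )
    age = (now or datetime.now(timezone.utc)) - result.indexed_at
    if age > max_age:
        raise ContextUnavailable(
            f"index is {age.days} days old (limit is {max_age.days} days)"
        )
    return result.snippets
```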

This matters because the whole value proposition of a codebase MCP is that the AI gets grounded answers instead of hallucinated ones. If the groundedness layer is silently absent, you have not reduced the hallucination risk — you have just added a layer of complexity that gives you false confidence that the problem is solved.

Developers who have used Context+ for a while learn to check whether Ollama is running before starting a session. The habit forms because the tool has taught them that it will not say when the dependency is broken. It is a workaround baked into the daily workflow.

Why this compounds at team scale

When one developer installs Context+, the setup cost is a one-time investment. When a team of fifteen engineers tries to standardize on it, the setup cost multiplies — not just by headcount, but by every incident of machine refresh, OS upgrade, Ollama update, or new hire onboarding. Each one requires going through the same steps: install Ollama, pull the right models, configure the MCP endpoint, verify it is working.
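
The multiplication is easy to make concrete. In the sketch below, only the team size of fifteen comes from the text; the event rate, hiring count, and hours per setup are placeholder assumptions to replace with your own numbers.

```python
def annual_setup_hours(
    engineers: int,
    events_per_engineer: float,  # machine refreshes, OS upgrades, Ollama updates per person per year
    new_hires: int,              # each new hire needs one full setup
    hours_per_setup: float,      # install Ollama, pull models, configure MCP, verify
) -> float:
    """Engineer-hours per year spent repeating the per-machine setup steps."""
    setups = engineers * events_per_engineer + new_hires
    return setups * hours_per_setup
```

With two setup-triggering events per engineer per year, three new hires, and an hour and a half per setup, fifteen engineers come to 33 setups and 49.5 engineer-hours a year, before counting any time spent debugging a setup that failed silently.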

The setup instructions also assume a developer-class machine. A team's mix of hardware varies: some developers are on Apple Silicon with 64 GB unified memory; others are on Windows laptops with 16 GB RAM. The performance of local LLM inference differs significantly across that range. A gemma2:27b model that runs in 3 seconds on an M3 Max takes 45 seconds on a mid-range Intel laptop. That is not a config problem — it is a hardware constraint that local inference cannot solve.

There is also the question of non-developer team members. Product owners, scrum masters, support leads, and engineering managers do not run local repo checkouts. They do not have Ollama installed. They cannot use Context+. The tool is architecturally limited to people who maintain development environments on local machines.

The laptop-as-server problem

Context+ treats your laptop as the server. The indexing happens on your machine. The model inference happens on your machine. The codebase graph lives on your machine. That is a deliberate architectural choice: it keeps code local, which appeals to developers with strong data sovereignty instincts.

But laptop-as-server has characteristics that are incompatible with team infrastructure. Laptops sleep. Laptops get closed during meetings. Laptops get replaced. Laptops get wiped when developers leave the company or transfer to other teams. When any of those things happens, the local context is gone, stale, or in need of a full rebuild.

Team infrastructure does not sleep. It does not require re-setup when a developer gets a new MacBook. It does not disappear when someone goes on leave. It is available at the same quality to every team member, on any device, without anything running locally.

What the managed alternative actually looks like

Kognita connects to your repositories at the source — GitHub, GitLab, or Bitbucket — and runs all indexing and semantic enrichment on managed infrastructure. The context it provides is available to any developer through a single MCP endpoint string in their editor configuration. No Ollama. No local model download. No background process to manage.

What a developer needs to start an AI session with Kognita context:
  -> Editor config: one MCP endpoint string (no local process)
  -> That's it — indexing runs on Kognita's infrastructure, not your laptop
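
The difference shows up directly in editor configuration. Several editors, Cursor among them, read an `mcpServers` JSON object in which a local server is described by the command that launches it and a remote server by its URL; exact keys vary by editor. The sketch builds both shapes. The package name and endpoint are placeholders, not real values.

```python
import json


def local_server_entry(command: str, args: list[str]) -> dict:
    """Local MCP server: the editor launches a process on this machine."""
    return {"command": command, "args": args}


def remote_server_entry(url: str) -> dict:
    """Remote MCP server: the editor only needs an endpoint URL."""
    return {"url": url}


def mcp_config(servers: dict[str, dict]) -> str:
    """Render an mcpServers config file body as JSON."""
    return json.dumps({"mcpServers": servers}, indent=2)


# Placeholder values: neither the package nor the URL is a real, documented value.
local = mcp_config({"contextplus": local_server_entry("npx", ["-y", "<context-plus-package>"])})
remote = mcp_config({"kognita": remote_server_entry("https://<your-kognita-endpoint>")})
```

Note what the local entry does not capture: nothing in it says Ollama must already be running with the right models pulled. The remote entry has no equivalent hidden dependency on the laptop.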

The indexing pipeline Kognita runs is also not bounded by laptop hardware. It uses tree-sitter parsing for structural analysis, large embedding models running on server infrastructure for semantic enrichment, and call graph traversal that is not feasible to run in real-time on a local machine. The result is a semantic layer that is both richer and more consistent than what local inference can produce.
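
Call graph traversal here means following caller-to-callee edges outward from a function, which is how a question like "what does checkout end up touching?" gets answered across files and repositories. The traversal step itself can be sketched as a breadth-first walk, assuming the graph has already been extracted (for example from tree-sitter parse trees) into a plain dictionary. The `repo:function` naming is an illustrative convention, not Kognita's data model.

```python
from collections import deque


def transitive_callees(
    call_graph: dict[str, list[str]], start: str, max_depth: int = 5
) -> dict[str, int]:
    """Breadth-first walk of a call graph.

    Returns {function: depth} for every function reachable from `start`
    within `max_depth` calls; depth 0 is `start` itself.
    """
    depths = {start: 0}
    queue = deque([start])
    while queue:
        fn = queue.popleft()
        if depths[fn] == max_depth:
            continue  # do not expand beyond the depth limit
        for callee in call_graph.get(fn, []):
            if callee not in depths:  # in BFS the first visit is the shortest path
                depths[callee] = depths[fn] + 1
                queue.append(callee)
    return depths
```

Because node names carry the repository prefix, an edge from `api:checkout` to `billing:charge` crosses a repository boundary like any other edge, provided both repositories are in the same index.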

And because the index lives on managed infrastructure rather than individual machines, the same quality of context is available to every developer on the team — including the one who just joined yesterday and has not pulled any repos yet.

When local context makes sense and when it does not

Local tools like Context+ are the right choice when you have a single developer, a single repository, a machine that stays on, and data sovereignty requirements that prevent code from leaving the local environment. For that combination, a local context MCP is a reasonable approach.

But most teams do not fit that profile. They have multiple developers with different hardware. They have multi-repo services where the behavior being traced crosses repository boundaries. They have non-technical team members who need system understanding but cannot install development tools. And they have enough machine churn that per-device setup is a real operational cost, not a one-time investment.

For that combination, a tool that requires Ollama running on every developer's laptop is not a team solution. It is an individual developer tool being applied to a team problem.

Final take

Context+ does real work. The semantic analysis it provides — spectral clustering, call graph linking, AST-based chunking — is meaningfully better than naive file-based context injection. For a developer willing to maintain Ollama on their machine, it is a legitimate improvement over no context MCP at all.

But the Ollama dependency is not a minor configuration detail. It is the architectural core of how the tool works, and it determines who can use it, on what hardware, with what reliability. When Ollama is not running, the tool silently provides nothing. When hardware is inadequate, the tool is too slow to be useful. When team members do not maintain development environments, the tool is inaccessible.

Codebase context should not require a local LLM server to be warm before an AI session can start. The team's understanding of the system should not live on individual laptops. Managed infrastructure exists precisely to eliminate the class of problems that arise when team resources run on personal devices.