Cursor Sends Your Code to Its Servers for Indexing. Here's What That Means.

Cursor's codebase indexing requires generating embeddings for your source code. Embedding generation is computationally intensive, and for most Cursor plans it runs server-side — which means your code chunks leave your local machine and travel to Cursor's infrastructure to be embedded. This is documented in Cursor's security pages and has been the subject of engineering security reviews at many companies, particularly those with regulated IP or compliance requirements. It is not a secret, but it is often not fully understood until a CISO asks.

What happens during Cursor codebase indexing

The embedding pipeline requires sending code chunks to an embedding model. Cursor's published documentation explains that files are chunked locally and the chunks are sent to its servers for embedding generation. The resulting vectors are stored locally on your machine, but the source code text leaves your network in the process:

What leaves your network during Cursor indexing

What Cursor sends to its servers during indexing:
  → Code chunks from your local files (for embedding generation)
  → File contents during active chat sessions (for model context)
  → Query text and retrieved context (for inference)

What stays local:
  → The vector index itself (stored on your machine)
  → The embedding vectors (after generation)

Note: Cursor offers a "Privacy Mode" and, on some plans, an option
to keep more of the embedding work local. Both require specific
configuration and are not the default for all plans.
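The Python sketch below shows the flow described above and where it crosses the boundary. It is not Cursor's implementation. `chunk_file`, `RemoteEmbedder`, and `LocalIndex` are hypothetical names, and the fixed 40-line chunking is an assumption. The "embedding" is a toy hash so the example runs without a model. The stub records every payload it receives. That record shows the request carries raw chunk text, even though only vectors and chunk locations end up in the local index.

```python
import hashlib
from dataclasses import dataclass, field


def chunk_file(path: str, text: str, max_lines: int = 40) -> list[dict]:
    """Split a file into fixed-size line chunks. This step runs locally."""
    lines = text.splitlines()
    return [
        {
            "path": path,
            "start_line": start + 1,
            "text": "\n".join(lines[start:start + max_lines]),
        }
        for start in range(0, len(lines), max_lines)
    ]


@dataclass
class RemoteEmbedder:
    """Hypothetical stand-in for a server-side embedding endpoint.

    Anything passed to embed() represents data that left the machine.
    """
    dim: int = 8
    received: list[str] = field(default_factory=list)

    def embed(self, texts: list[str]) -> list[list[float]]:
        self.received.extend(texts)  # raw chunk text crosses the network here
        vectors = []
        for t in texts:
            digest = hashlib.sha256(t.encode()).digest()
            # Toy deterministic "embedding"; a real model returns learned vectors.
            vectors.append([b / 255 for b in digest[: self.dim]])
        return vectors


@dataclass
class LocalIndex:
    """Chunk locations plus vectors, kept on the developer's machine.

    Only the path, start line, and vector are stored, not the chunk text.
    """
    entries: list[tuple[str, int, list[float]]] = field(default_factory=list)

    def add(self, chunk: dict, vector: list[float]) -> None:
        self.entries.append((chunk["path"], chunk["start_line"], vector))


def index_file(path: str, text: str, embedder: RemoteEmbedder, index: LocalIndex) -> None:
    chunks = chunk_file(path, text)                       # local
    vectors = embedder.embed([c["text"] for c in chunks])  # remote round trip
    for chunk, vec in zip(chunks, vectors):
        index.add(chunk, vec)                             # local


if __name__ == "__main__":
    source = "\n".join(f"line {i}" for i in range(100))
    embedder, index = RemoteEmbedder(), LocalIndex()
    index_file("src/risk_model.py", source, embedder, index)
    print(len(embedder.received), "chunks sent;", len(index.entries), "vectors stored locally")
```

Running it prints `3 chunks sent; 3 vectors stored locally`. Every line of the 100-line file passed through `RemoteEmbedder.embed` as plain text, and that is the part of the flow a security review has to account for.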

Cursor's "Privacy Mode" option is designed to prevent code from being used for training, and some plans also offer an option to process embeddings more locally. What actually happens depends on your Cursor plan and on whether Privacy Mode is configured. Confirming the details means reading Cursor's current documentation and data processing agreements, which change as the product evolves. That need for ongoing verification is a core reason CISOs block Cursor at the enterprise level.

Why this matters in regulated industries

For most developer tooling, whether code chunks leave the network is largely an academic question: you already trust GitHub and your CI provider with your source. But some industries have specific legal or contractual reasons to be cautious about where source code travels:

Industries with heightened code-egress sensitivity

Industries where code leaving the network is problematic:
  Financial services:   proprietary trading algorithms, risk models
  Healthcare:           code that processes PHI, HIPAA scope questions
  Defense / gov:        classified system logic, FedRAMP scope
  Legal tech:           client data handling code, privilege questions
  Fintech startups:     PCI DSS scope, payment processing logic

In financial services, proprietary trading logic is often the core competitive asset. Letting it pass through a third-party embedding service requires a precise understanding of the data handling guarantees. In healthcare, code that processes PHI may trigger HIPAA scope questions, depending on what the code represents. In defense contracting, the system behavior that source code reveals may itself have classification implications.

The questions security teams ask

When a security team evaluates Cursor for an enterprise deployment, the indexing data flow is typically the first point of concern. These questions come up consistently in procurement reviews:

Security review questions for Cursor adoption

Questions security teams ask before approving Cursor:
  "Does source code leave our network?"
  "Where are the embeddings stored and for how long?"
  "Can Cursor staff or contractors access our code?"
  "Is the embedding service SOC 2 Type II certified?"
  "What data is retained from chat sessions?"
  "Does Privacy Mode actually prevent code transmission?"
  "Is there an enterprise option with dedicated compute?"
  "What happens to our data if we cancel the subscription?"

These are not unreasonable questions, and Cursor has answered many of them — but the answers depend on which plan is in use, whether Privacy Mode is configured, and the current terms of service. For teams that have gone through this review, the process typically takes weeks and sometimes results in restrictions on which repositories can be indexed.
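Restrictions like these are often expressed as a per-repository policy. The sketch below assumes each repository carries sensitivity tags, loosely modeled on the industry categories listed earlier. `Repo`, `IndexingPolicy`, and the tag names are illustrative assumptions, not a Cursor setting. So is the rule that any single blocked tag excludes a repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Repo:
    name: str
    tags: frozenset[str]


@dataclass(frozen=True)
class IndexingPolicy:
    """Hypothetical policy produced by a security review."""
    blocked_tags: frozenset[str]

    def may_index(self, repo: Repo) -> bool:
        # A repository is excluded if it carries any blocked tag.
        return self.blocked_tags.isdisjoint(repo.tags)

    def partition(self, repos: list[Repo]) -> tuple[list[str], list[str]]:
        allowed = [r.name for r in repos if self.may_index(r)]
        blocked = [r.name for r in repos if not self.may_index(r)]
        return allowed, blocked


if __name__ == "__main__":
    policy = IndexingPolicy(blocked_tags=frozenset({"trading-algo", "phi", "pci"}))
    repos = [
        Repo("marketing-site", frozenset({"public"})),
        Repo("pricing-engine", frozenset({"trading-algo"})),
        Repo("claims-etl", frozenset({"phi", "internal"})),
        Repo("dev-tooling", frozenset({"internal"})),
    ]
    allowed, blocked = policy.partition(repos)
    print("index:", allowed)
    print("skip:", blocked)
```

With the sample data, `marketing-site` and `dev-tooling` are indexed, while `pricing-engine` and `claims-etl` are skipped. Defaulting to "allowed unless tagged" is a design choice. A stricter review might invert it into an explicit allowlist.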

The enterprise-managed indexing alternative

The security concern with cloud-based embedding is not that it is always wrong — it is that it creates a data flow that requires active management: contracts, configuration, and ongoing monitoring to ensure compliance. For some organizations, the right answer is a managed indexing platform with clearer data boundaries and predictable behavior at the infrastructure level:

Kognita's indexing security model:
  → Connects via Bitbucket / GitHub OAuth (read-only)
  → Indexing runs server-side in isolated per-org compute
  → Vectors stored in dedicated per-org Qdrant collection
  → No code stored in plain text after indexing
  → Access gated by org membership, not per-developer accounts
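The last two points, per-org collections and membership-gated access, can be illustrated with a small in-memory model. This is a hypothetical sketch, not Kognita's code and not the Qdrant client API. `OrgVectorStore`, `AccessDenied`, and the `org_<id>` collection naming are assumptions. The model shows the order of operations. Membership is checked first, and similarity search then runs only inside the caller's own organization's collection.

```python
from dataclasses import dataclass, field


class AccessDenied(Exception):
    pass


@dataclass
class OrgVectorStore:
    """Hypothetical model: one vector collection per organization."""
    members: dict[str, set[str]] = field(default_factory=dict)  # org_id -> user ids
    collections: dict[str, list[tuple[str, list[float]]]] = field(default_factory=dict)

    @staticmethod
    def collection_name(org_id: str) -> str:
        return f"org_{org_id}"

    def upsert(self, org_id: str, chunk_id: str, vector: list[float]) -> None:
        self.collections.setdefault(self.collection_name(org_id), []).append((chunk_id, vector))

    def search(self, user_id: str, org_id: str, query: list[float], top_k: int = 3) -> list[str]:
        # Gate on org membership before touching any collection.
        if user_id not in self.members.get(org_id, set()):
            raise AccessDenied(f"{user_id} is not a member of {org_id}")
        points = self.collections.get(self.collection_name(org_id), [])
        # Dot-product similarity, computed only within this org's collection.
        scored = [(sum(q * v for q, v in zip(query, vec)), cid) for cid, vec in points]
        scored.sort(reverse=True)
        return [cid for _, cid in scored[:top_k]]


if __name__ == "__main__":
    store = OrgVectorStore(members={"acme": {"alice"}, "globex": {"bob"}})
    store.upsert("acme", "acme:auth.py#1", [1.0, 0.0])
    store.upsert("globex", "globex:billing.py#1", [1.0, 0.0])
    print(store.search("alice", "acme", [1.0, 0.0]))
    try:
        store.search("alice", "globex", [1.0, 0.0])
    except AccessDenied as err:
        print("denied:", err)
```

In this sketch the collection name is derived from the org ID, so a query can only score vectors that belong to the requested organization. A non-member is rejected before any collection is read.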

The principle is the same as any other cloud infrastructure security decision: you are not eliminating cloud compute, you are choosing a compute model where the data flow is well-defined, the isolation is documented, and the access control model matches your security posture.

The practical decision

For most startups and mid-size engineering teams, the security concern with Cursor's indexing is manageable — read the documentation, configure Privacy Mode if needed, and proceed. The productivity benefits are real and the risk is limited. The calculus changes in larger enterprises, regulated industries, or organizations where proprietary code represents the primary competitive moat.

In those contexts, the indexing data flow is not a minor configuration detail — it is a first-class procurement consideration that determines whether the tool can be deployed at all. And the answer to that question requires more than checking a Privacy Mode checkbox.

Final take

Cursor's indexing works by generating embeddings server-side, which means code chunks leave your local machine during the indexing process. This is a known and documented aspect of how the tool works, and Cursor provides options — Privacy Mode, enterprise plans — to address various security requirements. For most teams, this is manageable. For regulated industries and large enterprises with strict IP protection requirements, it is a procurement-level question that deserves careful review.

Understanding exactly what your codebase indexing tool sends to external servers is not paranoia — it is a basic due-diligence question that every engineering organization should be able to answer.