Kognita

Your Team Wants AI Coding Tools. Your Security Team Is Asking Where the Code Goes.

The conversation about AI coding tools in most engineering organizations has been about productivity: which tool is fastest, which understands the codebase best, which integrates most cleanly into the existing workflow. A separate conversation, one that security teams are only starting to have, is about where the code actually goes while those gains are being realized. For most popular AI coding tools, the answer is a third-party server, on every query, in volumes most security teams have not evaluated.

Cursor routes all completion requests through its own cloud infrastructure. Claude Code sends code context to Anthropic's servers. GitHub Copilot transmits code to Microsoft Azure. For each of these tools, the default operating mode is that file contents, function signatures, and codebase structure leave the developer's machine and the organization's network every time the developer asks a question. This is not a hidden behavior — it is how these tools work. It is also something that most security teams have not formally assessed, because adoption happened faster than governance.

For many organizations, this is an acceptable trade-off. The code goes to reputable providers with strong security postures, and the productivity gain is real. For organizations with data residency requirements, proprietary algorithms, or contractual obligations around code confidentiality, the trade-off is not acceptable — and there is currently no way to use these tools while keeping the code inside the network.

Where your code actually goes with popular AI tools

The specifics vary by tool and plan, but the direction is consistent: code leaves the network during AI processing. Understanding exactly what leaves, in what quantity, and to where is the first step toward an honest security assessment.

Where your code goes when developers use popular AI coding tools:

  Cursor:
    -> Routes all completion and context requests through Cursor's AWS infrastructure
    -> Code snippets, file contents, and workspace structure sent to third-party servers
    -> No HIPAA BAA available. No FedRAMP certification.
    -> Privacy mode exists but limits functionality significantly

  Claude Code (standard):
    -> Sends code context to Anthropic's servers for every query
    -> HIPAA BAA available only with Zero Data Retention (ZDR) configuration
    -> ZDR requires API usage — not available on Pro or Team plans
    -> Desktop remote mode and web interface: not covered by BAA at all

  GitHub Copilot:
    -> Code transmitted to GitHub/Microsoft Azure for processing
    -> Enterprise plan offers some data residency controls
    -> Code snippets used for suggestions leave the developer's machine regardless

  The default for all three: code leaves your network with every query

The HIPAA BAA situation for Claude Code is particularly important for healthcare-adjacent organizations. The BAA is available only with a Zero Data Retention configuration on the API, which requires specific technical setup that most teams using Claude Code are not running. Teams using Claude Code on Pro or Team plans, or through the desktop remote mode or web interface, are operating outside BAA coverage. For organizations whose code could touch patient data, this is a practical compliance gap, not a theoretical one.

What security teams have not evaluated

The adoption rate of AI coding tools outpaced security assessment in most organizations. Developers adopted Cursor and Claude Code as individual choices — personal tools that improved their own productivity. By the time security teams became aware of the scale of adoption, the tools were embedded in daily workflows across the engineering organization.

What is actually leaving your network, at scale:

  Every developer using Cursor is sending:
    -> Open file contents (often the full file)
    -> Surrounding file context (files in the same directory)
    -> Codebase structure when using workspace features
    -> Function signatures and import paths across the project

  For a 20-developer team running 50 queries each per day:
    -> 1,000 codebase fragments leaving your network daily
    -> Destination: third-party cloud infrastructure
    -> Logging of what was sent: minimal or none
    -> Developer awareness that code is being transmitted: often zero

At 20 developers and 50 AI queries each per day, 1,000 codebase fragments leave the network daily. Those fragments include open file contents, surrounding context files, and, increasingly for agentic tools like Claude Code, large portions of the codebase retrieved to answer a single question. Logging of what was transmitted is minimal, and developers are often unaware that any transmission is happening. Most developers think of their AI tool as a smart autocomplete. The actual behavior is closer to a continuous upload of relevant code to a remote server.
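
The arithmetic above can be written as a back-of-the-envelope estimate. The team size and query rate come from the example; the average payload size per query is a purely hypothetical figure added for illustration, so the megabyte total is a placeholder rather than a measurement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TeamUsage:
    developers: int                # from the example above: 20
    queries_per_dev_per_day: int   # from the example above: 50
    avg_payload_kb: float          # ASSUMPTION: hypothetical average request size


def daily_fragments(usage: TeamUsage) -> int:
    """Number of code-bearing requests leaving the network per day."""
    return usage.developers * usage.queries_per_dev_per_day


def daily_egress_mb(usage: TeamUsage) -> float:
    """Rough daily volume in MB, given the assumed payload size."""
    return daily_fragments(usage) * usage.avg_payload_kb / 1024


team = TeamUsage(developers=20, queries_per_dev_per_day=50, avg_payload_kb=8.0)
print(daily_fragments(team))   # 1000
print(daily_egress_mb(team))   # 7.8125
```

Replacing the assumed payload size with a figure sampled from real traffic would turn this into a usable input for a vendor risk assessment.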

For CISOs who have started blocking Cursor and similar tools, the reason is exactly this: the code data flow is real, uncontrolled, and difficult to audit after the fact. Blocking the tool is the only control available in the current tool landscape, because the tools themselves do not offer a network-local operating mode.
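
One way to make this data flow visible is to count outbound requests to AI provider hosts in existing egress proxy logs. The following is a minimal sketch, assuming a hypothetical CSV log with `user` and `host` columns. The host list is also an assumption: `api.anthropic.com` is Anthropic's public API host, `ai-vendor.example` is a placeholder, and both should be replaced with the endpoints your own tools actually contact.

```python
import csv
import io
from collections import Counter

# ASSUMPTION: illustrative host list; substitute the endpoints seen in your traffic.
WATCHED_HOSTS = frozenset({"api.anthropic.com", "ai-vendor.example"})


def count_ai_requests(log_file, watched_hosts=WATCHED_HOSTS):
    """Count requests per (user, host) for watched hosts in a CSV proxy log."""
    counts = Counter()
    for row in csv.DictReader(log_file):
        host = row["host"].strip().lower()
        if host in watched_hosts:
            counts[(row["user"], host)] += 1
    return counts


# Hypothetical log excerpt, inlined so the sketch is self-contained.
sample_log = io.StringIO(
    "user,host\n"
    "alice,api.anthropic.com\n"
    "alice,api.anthropic.com\n"
    "bob,ai-vendor.example\n"
    "bob,intranet.example\n"
)
summary = count_ai_requests(sample_log)
for (user, host), n in sorted(summary.items()):
    print(user, host, n)
```

A count like this shows only that requests were made, not which code they carried, which is why the flow remains hard to audit after the fact.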

The categories of organization most exposed

Not every organization faces the same risk from code leaving the network. For many, the risk is low and the productivity gain dominates. Three categories of organization face material risk that is worth addressing specifically.

Regulated industries. Healthcare teams handling code that could touch PHI, financial services firms with code covering trading logic or customer financial data, legal tech organizations with code touching privileged communications — all of these operate under regulatory frameworks that impose data handling requirements. Code leaving the network to third-party servers creates documentation requirements around vendor risk, data classification, and encryption standards that most AI tool adoption has not satisfied.

Proprietary algorithm businesses. Organizations whose competitive advantage is embodied in specific algorithms — quantitative trading models, pricing engines, recommendation systems, ML models — have a different risk profile. The question is not regulatory compliance but competitive exposure. Sending the core IP of the business through a third-party server is a risk that most of these organizations' legal and security teams would not approve if asked. The approval was never sought because adoption was individual and invisible.

Organizations with contractual code confidentiality obligations. Enterprise software companies with customer data agreements, government contractors with code handling requirements, and any organization whose customer contracts include provisions about code confidentiality may be in violation of those contracts when developer tools transmit code to third-party servers.

Why "just use privacy mode" doesn't solve the problem

The instinctive response from AI tool vendors is to point to privacy mode, enterprise plans, or data processing agreements as the solution. These features reduce risk in specific dimensions. They do not change the fundamental architecture: the code leaves the network for processing, even under enterprise agreements. Privacy mode in Cursor limits functionality significantly while still routing through Cursor's infrastructure. Enterprise plans for GitHub Copilot offer data residency controls but still process code outside the organization's network.

For organizations with hard network boundary requirements — code simply does not leave the network — none of these options satisfies the requirement. The tools are designed for network-connected cloud processing. Organizations that need local or VPC-isolated processing need a different architecture.

The managed codebase intelligence alternative

The architecture that changes the data flow is one where the codebase is indexed server-side, within Kognita's controlled infrastructure, and queries return answers rather than transmitting raw code. The developer's machine sends a question. An answer comes back. The code itself stays in the repository: it is not sent from the developer's machine to a third-party AI API, and it does not sit on an AI provider's server in raw form.

The data flow difference between per-developer AI tools and managed codebase intelligence
The managed codebase access model — what stays where:

  Traditional AI tool (Cursor, Claude Code local):
    Developer machine  →  sends raw code  →  AI provider servers
    Your code lives:       on laptops and  →  on third-party infra
                           in the repo         during processing

  Managed codebase intelligence (Kognita):
    Repo (GitHub/GitLab)  →  indexed server-side  →  Kognita infrastructure
    Developer query        →  answer returned      →  no raw code transmitted
    Your code lives:           in your repo only       (semantic index, not code)

This does not eliminate all data sharing — the codebase is indexed by Kognita's infrastructure, which means Kognita has access to a semantic representation of the codebase. What it eliminates is the per-query code transmission from developer machines to AI provider servers. For organizations with data residency requirements, this is a meaningful architectural difference. The code stays in the repository. The answers come from an index, not from raw code processing on a third-party server.
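
The difference in what crosses the network on each query can be sketched by comparing the two request shapes. This is an illustration only: the class and field names are hypothetical and are not Kognita's or any other vendor's actual API.

```python
from dataclasses import dataclass, asdict, field


@dataclass
class RawCodeRequest:
    """HYPOTHETICAL per-developer tool request: code travels with the question."""
    question: str
    open_file: str
    context_files: dict[str, str] = field(default_factory=dict)


@dataclass
class IndexQueryRequest:
    """HYPOTHETICAL managed-index query: only the question and a repo id travel."""
    question: str
    repo_id: str


def string_values(obj):
    """Yield every string value found in a nested dict structure."""
    if isinstance(obj, str):
        yield obj
    elif isinstance(obj, dict):
        for value in obj.values():
            yield from string_values(value)


def carries_raw_code(request, source: str) -> bool:
    """True if the given source text appears anywhere in the request payload."""
    return any(source in value for value in string_values(asdict(request)))


source = "def price(order):\n    return order.qty * SECRET_MARGIN\n"
question = "Where is the margin applied?"

raw = RawCodeRequest(question=question, open_file=source)
query = IndexQueryRequest(question=question, repo_id="example-repo")

print(carries_raw_code(raw, source))    # True
print(carries_raw_code(query, source))  # False
```

The second shape still depends on the code having been indexed somewhere, which is the trade-off described above. The sketch shows only that the per-query payload carries no source.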

Final take

Most organizations have not had the conversation about where their code goes when developers use AI tools. The conversation is coming — driven by security reviews, compliance audits, and the growing awareness that productivity tools adopted individually have become infrastructure-scale data flows that nobody formally approved.

The answer to "your team wants AI coding tools but your code can't leave the network" is not to block the tools. It is to change the architecture: codebase intelligence that is managed centrally, indexed once, and accessed through a query interface that returns answers rather than transmitting raw code. The productivity gain is preserved. The uncontrolled data flow is not.