Kognita

.cursorignore Configuration for Large Codebases: The Tradeoffs Nobody Talks About

If you have used Cursor on a large codebase, you have probably been told to configure .cursorignore to exclude files that should not be indexed. The advice is correct — without exclusions, large repos include node_modules, generated files, and build artifacts in the index, all of which degrade retrieval quality. But configuring .cursorignore well is a harder problem than it looks, and the file has a property that any shared config has: it serves the whole team from a single opinionated set of choices.

What .cursorignore does

.cursorignore uses the same syntax as .gitignore and tells Cursor which files and directories to exclude from the codebase index. Files that match the patterns are not chunked, not embedded, and not retrievable during semantic search:

Basic .cursorignore structure
# .cursorignore — what it does
# Patterns excluded from Cursor's codebase index
# Uses .gitignore syntax

# Always exclude (no useful AI context):
node_modules/
.git/
dist/
build/
*.lock
*.map

# Usually exclude (large, auto-generated):
coverage/
.nyc_output/
generated/

# Team-specific tradeoffs
# (.gitignore syntax has no trailing comments, so each note gets its own line):
# faster index, but legacy questions break
legacy/
# cleaner index, but fewer usage examples
fixtures/
# smaller index, but no test context
*.test.ts

The always-exclude category is clear: node_modules, build output, lock files. These add millions of tokens to the index without providing useful AI context — they are not your code, and Cursor should not retrieve them in response to questions about your system.
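
Before committing a pattern list, it can help to check which paths it would actually exclude. The sketch below is a simplified approximation of .gitignore-style matching, not Cursor's own matcher: the `is_excluded` helper and the sample paths are hypothetical, and only two pattern shapes are handled (a directory name ending in `/`, and a file-name glob).

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Patterns copied from the always-exclude group above.
ALWAYS_EXCLUDE = ["node_modules/", ".git/", "dist/", "build/", "*.lock", "*.map"]


def is_excluded(path: str, patterns: list[str]) -> bool:
    """Approximate check: does any pattern exclude this repo-relative path?

    Assumption: only 'name/' (a directory anywhere in the path) and
    file-name globs such as '*.map' are modeled. Negation, anchoring,
    and '**' are deliberately left out of this sketch.
    """
    parts = PurePosixPath(path).parts
    for pattern in patterns:
        if pattern.endswith("/"):
            # Directory pattern: matches if any parent directory has that name.
            if pattern.rstrip("/") in parts[:-1]:
                return True
        elif fnmatchcase(parts[-1], pattern):
            # File glob: compared against the file name only.
            return True
    return False


# Hypothetical repo-relative paths used only for illustration.
sample_paths = [
    "src/api/users.ts",
    "node_modules/react/index.js",
    "packages/web/dist/bundle.js",
    "yarn.lock",
    "package-lock.json",
]
excluded = [p for p in sample_paths if is_excluded(p, ALWAYS_EXCLUDE)]
```

In this sketch `package-lock.json` is not excluded, because `*.lock` only matches names that end in `.lock`. If the goal is to drop every lock file, a list like the one above would likely need an explicit entry for it.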

Every exclusion creates a blind spot

The harder decisions come in the team-specific category. Every directory you exclude makes indexing faster and retrieval cleaner — and also makes it impossible to answer questions that require content from that directory:

Exclusion blind spots in practice
Excluding "legacy/" creates this gap:
  Developer asks: "How was this handled before the refactor?"
  Cursor answers:  "I don't see anything related to that"
  Reality:         legacy/ has 40,000 lines of prior implementation

Excluding "*.test.ts" creates this gap:
  Developer asks: "How is this function typically called?"
  Cursor answers:  based on production callers only
  Reality:         test files contain the clearest usage examples

Test files are a particularly common exclusion that creates surprising blind spots. Tests often contain the clearest, most explicit examples of how a function is intended to be called, and in many codebases they are better usage documentation than the comments. Excluding them makes indexing faster and retrieval cleaner, but it also removes the best available examples of correct usage. Whether that tradeoff is worth it depends on how the team writes tests, which most .cursorignore advice does not take into account.
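
One way to make the tradeoff concrete is to measure how much a candidate pattern would hide before committing it. The following is a rough sketch under stated assumptions: `exclusion_cost` is a hypothetical helper, it handles only directory names ending in `/` and file-name globs, and the tiny temporary repo exists purely to demonstrate the call.

```python
import os
import tempfile
from fnmatch import fnmatchcase
from pathlib import Path


def exclusion_cost(root: str, pattern: str) -> tuple[int, int]:
    """Return (files, lines) that one pattern would hide under root.

    Assumption: simplified matching, not full .gitignore semantics.
    """
    files = lines = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        # Directory names between root and the current folder.
        rel_parts = Path(dirpath).relative_to(root).parts
        for name in filenames:
            if pattern.endswith("/"):
                hidden = pattern.rstrip("/") in rel_parts
            else:
                hidden = fnmatchcase(name, pattern)
            if hidden:
                files += 1
                with open(os.path.join(dirpath, name), errors="ignore") as fh:
                    lines += sum(1 for _ in fh)
    return files, lines


# Demo on a throwaway directory tree (hypothetical file names and sizes).
with tempfile.TemporaryDirectory() as repo:
    layout = {"legacy/old.ts": 3, "src/a.test.ts": 2, "src/a.ts": 1}
    for rel, n_lines in layout.items():
        target = Path(repo, rel)
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text("x\n" * n_lines)
    legacy_cost = exclusion_cost(repo, "legacy/")
    test_cost = exclusion_cost(repo, "*.test.ts")
```

Run against a real checkout, the two numbers give the team something to weigh: a pattern that hides a few hundred lines is a different decision from one that hides tens of thousands.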

Including too much degrades retrieval

The failure mode of under-excluding is different from that of over-excluding: instead of missing context, you get diluted context. When the index includes 350,000 node_modules files, semantic queries return vendor library internals with the same confidence as your code:

Retrieval quality without proper exclusions
Without .cursorignore on a large repo:
  Total files:         500,000
  node_modules files:  350,000 (70% of total)
  Generated files:     80,000

  Semantic search retrieves:
    → vendor library internals (not your code)
    → auto-generated type definitions (not authored)
    → build artifacts (not source intent)

  Result: signal-to-noise ratio drops, retrieval degrades
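
The figures above imply how small the authored share of such an index can be. This is simple arithmetic on the example numbers, assuming that every file not counted as node_modules or generated is authored code and that each file weighs equally in the index (neither is stated in the example).

```python
# Figures from the example above.
total_files = 500_000
node_modules_files = 350_000
generated_files = 80_000

# Assumption: everything else is code the team actually wrote.
authored_files = total_files - node_modules_files - generated_files

# Integer percentages keep the arithmetic exact.
authored_pct = 100 * authored_files // total_files
noise_pct = 100 - authored_pct
```

Under those assumptions, 70,000 of the 500,000 files are authored, so roughly 14% of the index is signal and 86% is material nobody on the team wrote.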

This is the retrieval wall described in "AI coding quietly hitting a retrieval wall": the index contains too much content, and the most relevant chunks get outscored by superficially similar but irrelevant material from third-party code.

One file, multiple teams, conflicting needs

.cursorignore is a committed file. Every developer on every team works with the same exclusion config. But different teams have different retrieval needs — and those needs are often in direct conflict:

Team-specific indexing needs vs. shared config
One .cursorignore, different team needs:
  Backend engineers:   want infra/ and deployment/ indexed
  Frontend engineers:  want design-system/ and storybook/ indexed
  ML team:             want notebooks/ and model-configs/ indexed
  All three teams:     share one committed .cursorignore

  Outcome: someone's most important context is excluded

There is no good solution within the .cursorignore model. You can discuss the tradeoffs as a team and commit a compromise, knowing that someone's most useful context will be excluded. Or you can let every developer maintain their own local .cursorignore overrides, which adds configuration maintenance overhead and means team members work from different index scopes.
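
If the team does commit a compromise, it is worth making the cost visible. A minimal sketch, assuming a hypothetical shared exclusion list (`SHARED_IGNORE`) and the per-team directories from the example above, that reports which teams lose context they wanted indexed:

```python
# Hypothetical compromise list; not taken from any real config.
SHARED_IGNORE = {"infra/", "storybook/", "notebooks/"}

# Directories each team wants indexed, from the example above.
TEAM_NEEDS = {
    "backend": {"infra/", "deployment/"},
    "frontend": {"design-system/", "storybook/"},
    "ml": {"notebooks/", "model-configs/"},
}

# For each team, the wanted directories that the shared config excludes.
conflicts = {
    team: sorted(needs & SHARED_IGNORE)
    for team, needs in TEAM_NEEDS.items()
    if needs & SHARED_IGNORE
}

for team, dirs in conflicts.items():
    print(f"{team}: loses {', '.join(dirs)}")
```

A check like this does not resolve the conflict, but it turns "someone's context is excluded" into a named list that can be discussed in review.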

The deeper problem: .cursorignore is a symptom

The need to carefully configure what gets indexed reflects a deeper constraint of local indexing: the developer's machine has limited compute and storage, and the index needs to stay manageable within those constraints. Exclusions are how you keep the index usable on a laptop.

Server-side indexing removes this constraint. When the index runs on server infrastructure — not a developer laptop — the compute and storage limitations are different. You can make more nuanced decisions about what to include, run semantic enrichment passes that would be too expensive on a laptop, and tune the index once for the whole team rather than committing a one-size-fits-all exclusion config.

Final take

Configuring .cursorignore is necessary housekeeping for Cursor on any large codebase. The basics — exclude node_modules, build output, generated files — are clear and should be done. The harder decisions about legacy code, test files, and team-specific directories are genuine tradeoffs with no universal right answer.

.cursorignore is a workaround for the constraint of indexing on a laptop. When the index moves server-side, the exclusion problem changes from "what can we afford to include" to "what actually improves retrieval quality" — a much more tractable question.