Codebases Are Graphs of Logical Units, Not Files or Functions

Most tooling still treats a repository as a collection of files, classes, methods, and folders. That representation breaks down in modern systems, because the real unit of meaning is usually a logical unit of functionality that spans controllers, services, queries, queues, events, infrastructure, workflows, APIs, and background jobs. This is the same “graphs, not documents” thesis behind the argument that flat context is insufficient, and it explains why naive chunking destroys workflows.

The file system is not the real architecture

A repository tree may look organized:

controllers/
services/
repositories/
workers/
events/

Operationally, the real system may look like a cross-cutting flow. Humans eventually reconstruct these relationships mentally; most AI systems do not unless retrieval gives them the graph.

The real functional unit (illustrative)

Failed Payment Recovery
  → webhook consumer
  → payment validator
  → retry scheduler
  → Stripe recovery worker
  → notification workflow
  → reconciliation pipeline

A function rarely represents the full behavior

retryFailedPayment(paymentId) looks like one function, but operationally it may involve state queries, metadata loads, retry windows, events, locks, external API calls, audit logs, and downstream workflows. The logical unit is the operational flow, not the local definition. Retrieval that returns only the function body yields local syntax without operational behavior, which invites the failure modes described under agent hallucination patterns.
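To make the gap between the function body and the operational flow concrete, here is a minimal sketch of graph expansion from a seed symbol. Everything except retryFailedPayment is a hypothetical name invented for illustration, and the edge map stands in for whatever call, event, and queue relationships a real indexer would extract.

```python
from collections import deque

# Hypothetical edges: symbol -> things it calls, emits, or triggers.
# Only retryFailedPayment comes from the article; the rest is illustrative.
EDGES = {
    "retryFailedPayment": [
        "loadPaymentState",
        "acquireRetryLock",
        "emit:payment.retry_scheduled",
    ],
    "emit:payment.retry_scheduled": ["StripeRecoveryWorker"],
    "StripeRecoveryWorker": [
        "stripe.charge",
        "writeAuditLog",
        "emit:payment.recovered",
    ],
    "emit:payment.recovered": ["NotificationWorkflow"],
}


def expand(seed, edges, max_depth=3):
    """Breadth-first expansion from a seed symbol.

    Returns every node reachable within max_depth hops, in visit order,
    starting with the seed itself.
    """
    seen = {seed}
    order = [seed]
    queue = deque([(seed, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not follow edges past the depth limit
        for nxt in edges.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append((nxt, depth + 1))
    return order
```

With a depth of 0 the result is just the function itself, which is what body-only retrieval returns. With the default depth of 3, the expansion reaches the recovery worker and its direct effects but stops short of NotificationWorkflow, so the depth limit is itself a retrieval trade-off.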

Example: user deletion

A question like “what happens when a user deletes their account?” may retrieve deleteUser(userId) — while the actual logical unit spans billing, storage, compliance, analytics, search, and recommendations:

No single class owns the whole story

Delete Account Flow
  → revoke sessions
  → cancel billing
  → purge S3 assets
  → enqueue GDPR cleanup
  → notify analytics
  → invalidate search indexes
  → remove recommendations
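One way to picture the difference is a reverse index from symbols to the logical units they belong to. The sketch below is an assumption about how such an index could look, not a description of any particular tool. The unit names and all function names other than deleteUser are hypothetical.

```python
from collections import defaultdict

# Hypothetical mapping from a logical unit to the symbols that implement it.
# Only deleteUser appears in the article; the other names are illustrative.
UNITS = {
    "delete-account": [
        "deleteUser",
        "revokeSessions",
        "cancelBilling",
        "purgeS3Assets",
        "enqueueGdprCleanup",
        "notifyAnalytics",
        "invalidateSearchIndexes",
        "removeRecommendations",
    ],
    "session-revocation": ["revokeSessions", "invalidateTokens"],
}


def build_symbol_index(units):
    """Invert unit -> symbols into symbol -> set of units."""
    index = defaultdict(set)
    for unit, symbols in units.items():
        for symbol in symbols:
            index[symbol].add(unit)
    return index


def retrieve_units(symbol, units, index):
    """Return {unit: members} for every unit that contains the symbol."""
    return {unit: units[unit] for unit in sorted(index.get(symbol, ()))}
```

A lookup for deleteUser then returns the whole delete-account unit rather than one function body, and a shared symbol such as revokeSessions surfaces every unit it participates in.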

Logical units are cross-cutting

Meaningful nodes are often cross-service, cross-file, cross-layer, and cross-runtime. “Customer onboarding” may span frontend forms, APIs, fraud checks, email systems, workers, analytics, feature flags, billing setup, and CRM sync. Treating that as disconnected methods destroys operational meaning.

Why this matters for debugging

Humans debug through workflows and execution paths. For example, a report that “customers are not receiving payment recovery emails” may trace through:

Behavioral graph, not one file

Stripe webhook
  → failed payment event
  → retry scheduler
  → recovery worker
  → notification pipeline
  → email queue
  → SMTP provider

AI debugging tends to improve when retrieval reconstructs the full logical unit instead of returning a single helper like sendRecoveryEmail(...).
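The trace above can be sketched as a path search over a behavioral graph. The stage names come from the flow listed earlier; the "audit log" side branch is a hypothetical addition, included only so the graph is not a straight line.

```python
from collections import deque

# Stage names follow the flow above; "audit log" is a hypothetical branch.
PIPELINE_EDGES = {
    "Stripe webhook": ["failed payment event"],
    "failed payment event": ["retry scheduler", "audit log"],
    "retry scheduler": ["recovery worker"],
    "recovery worker": ["notification pipeline"],
    "notification pipeline": ["email queue"],
    "email queue": ["SMTP provider"],
}


def trace_path(start, goal, edges):
    """Return the shortest path from start to goal, or None if unreachable."""
    parents = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk the parent links back to the start, then reverse.
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for nxt in edges.get(node, []):
            if nxt not in parents:
                parents[nxt] = node
                queue.append(nxt)
    return None
```

Each stage on the returned path is a place where the email could have been dropped, which gives an agent an ordered checklist rather than one helper function to stare at.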

Traditional chunking breaks functional meaning

Fixed token splits and file boundaries rarely align with operational behavior. A single workflow may span dozens of disconnected chunks, so the AI receives fragmented syntax instead of coherent functionality. That is the cost of bad chunking.
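A toy example shows the fragmentation. The sketch below simplifies heavily: it chunks a flat list of definition names by count rather than by tokens, and every name in it is hypothetical.

```python
def fixed_chunks(items, size):
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def chunks_touching(chunks, symbols):
    """Return the indexes of chunks that contain at least one wanted symbol."""
    wanted = set(symbols)
    return [i for i, chunk in enumerate(chunks) if wanted & set(chunk)]


# Hypothetical flattened repository: one entry per definition, in file order.
REPO = [
    "handleWebhook", "parseSignature", "formatCurrency",
    "validatePayment", "renderInvoice", "scheduleRetry",
    "hashPassword", "runRecoveryWorker", "sendRecoveryEmail",
]

# Hypothetical members of a single payment recovery workflow.
RECOVERY = [
    "handleWebhook", "validatePayment", "scheduleRetry",
    "runRecoveryWorker", "sendRecoveryEmail",
]
```

With a chunk size of 3, chunks_touching(fixed_chunks(REPO, 3), RECOVERY) returns [0, 1, 2]: the workflow is spread across every chunk, and each chunk also carries unrelated definitions. Grouping by logical unit would instead return the five members together.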

Logical units are behavioral nodes

Instead of node = class, think node = operational capability — payment recovery, onboarding, fraud detection, session revocation, order fulfillment. Each may span repositories, queues, services, APIs, infra, and workflows. The graph becomes behavioral, not syntactic.
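Moving from a syntactic graph to a behavioral one can be sketched as a projection: take symbol-level edges and collapse them onto the capabilities that own each symbol. The capability names below come from the list above; the symbol names and the ownership mapping are hypothetical.

```python
# Hypothetical ownership: which capability each symbol belongs to.
SYMBOL_TO_UNIT = {
    "SignupController.create": "onboarding",
    "BillingSetup.createCustomer": "onboarding",
    "FraudScorer.score": "fraud detection",
    "OrderService.place": "order fulfillment",
    "RetryScheduler.enqueue": "payment recovery",
}

# Hypothetical symbol-level edges (caller, callee).
SYMBOL_EDGES = [
    ("SignupController.create", "FraudScorer.score"),
    ("SignupController.create", "BillingSetup.createCustomer"),
    ("OrderService.place", "FraudScorer.score"),
    ("OrderService.place", "RetryScheduler.enqueue"),
]


def project_to_units(symbol_edges, symbol_to_unit):
    """Collapse symbol-level edges into sorted, de-duplicated unit-level edges.

    Edges between two symbols of the same unit are dropped, because they
    are internal to that capability.
    """
    unit_edges = set()
    for src, dst in symbol_edges:
        u, v = symbol_to_unit[src], symbol_to_unit[dst]
        if u != v:
            unit_edges.add((u, v))
    return sorted(unit_edges)
```

Four symbol edges become three capability edges here, and the internal onboarding call disappears. The resulting graph says that onboarding and order fulfillment both depend on fraud detection, which is a statement about behavior rather than about classes.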

Example: “why did this order fail?”

Traditional retrieval may surface isolated service and policy classes. A graph-aware view can reconstruct an order placement flow:

Order Placement Flow
  → cart validation
  → inventory reservation
  → fraud analysis
  → payment authorization
  → order persistence
  → event publication
  → fulfillment pipeline

That enables tracing failures, bottlenecks, and downstream dependencies — debugging as graph reconstruction, not local inspection.
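Once the flow is known, a failure can be located within it. The sketch below assumes the order placement flow is strictly linear, which real systems often are not, and the helper name split_at_failure is made up for this example.

```python
# Step names follow the order placement flow above.
ORDER_FLOW = [
    "cart validation",
    "inventory reservation",
    "fraud analysis",
    "payment authorization",
    "order persistence",
    "event publication",
    "fulfillment pipeline",
]


def split_at_failure(flow, failed_step):
    """Partition a linear flow around the step that failed.

    Raises ValueError if failed_step is not part of the flow.
    """
    i = flow.index(failed_step)
    return {
        "completed": flow[:i],        # ran before the failure
        "failed": failed_step,
        "not_reached": flow[i + 1:],  # downstream steps that never ran
    }
```

If payment authorization fails, the split shows that inventory was already reserved upstream and that persistence, event publication, and fulfillment never ran. Both facts are invisible when inspecting the authorization code alone.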

This is where Kognita fits

Kognita exists because repositories are too behaviorally complex for raw file-level retrieval. The goal is to reconstruct logical workflows, operational relationships, execution paths, dependency graphs, and behavioral units — so AI sees connected behavior instead of isolated syntax.

Final takeaway

Modern repositories are networks of operational capabilities (workflows and business capabilities), not collections of methods. Systems that treat repositories as disconnected text will continue to struggle with debugging, reasoning, and architecture. The systems that work best will reconstruct functional graphs instead of isolated fragments, because software is a behavioral network disguised as source code.