Codebases Are Graphs of Logical Units, Not Files or Functions
Most tooling still treats repositories as collections of files, classes, methods, and folders. That representation breaks down in modern systems, because the real unit of meaning is usually a logical unit of functionality that spans controllers, services, queries, queues, events, infrastructure, workflows, APIs, and background jobs. This is the same “graphs, not documents” thesis behind why flat context is insufficient, and it explains why naive chunking destroys workflows.
The file system is not the real architecture
A repository tree may look organized:
controllers/
services/
repositories/
workers/
events/
Operationally, the real system may look like a cross-cutting flow. Humans eventually reconstruct these relationships mentally; most AI systems do not unless retrieval gives them the graph.
Failed Payment Recovery
→ webhook consumer
→ payment validator
→ retry scheduler
→ Stripe recovery worker
→ notification workflow
→ reconciliation pipeline
A function rarely represents the full behavior
retryFailedPayment(paymentId) looks like one function, but operationally it may involve state queries, metadata loads, retry windows, events, locks, external APIs, audit logs, and downstream workflows. The logical unit is the operational flow, not the local definition. Retrieval that returns only the function body yields local syntax without operational behavior, which sets up the failures described in agent hallucination patterns.
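One way to picture the difference is a reverse index from code symbols to the flows they take part in, so that a hit on a single function can be expanded into its surrounding flow. The sketch below is illustrative only. The `Step` and `LogicalUnit` types, the `build_symbol_index` and `expand` helpers, and every symbol name except `retryFailedPayment` and `sendRecoveryEmail` are hypothetical, and placing `retryFailedPayment` in the retry scheduler step is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """One stage of a flow and the code symbols that implement it."""
    name: str
    symbols: list = field(default_factory=list)


@dataclass
class LogicalUnit:
    """A named operational flow made of ordered steps."""
    name: str
    steps: list = field(default_factory=list)


def build_symbol_index(units):
    """Map each symbol to every logical unit it participates in."""
    index = {}
    for unit in units:
        for step in unit.steps:
            for symbol in step.symbols:
                index.setdefault(symbol, []).append(unit)
    return index


def expand(symbol, index):
    """Return the ordered step names of every flow that contains `symbol`."""
    return {
        unit.name: [step.name for step in unit.steps]
        for unit in index.get(symbol, [])
    }


# Step names follow the flow above; the symbol-to-step mapping is assumed.
recovery = LogicalUnit(
    "Failed Payment Recovery",
    [
        Step("webhook consumer", ["handleStripeWebhook"]),
        Step("payment validator", ["validatePayment"]),
        Step("retry scheduler", ["retryFailedPayment"]),
        Step("Stripe recovery worker", ["runRecoveryJob"]),
        Step("notification workflow", ["sendRecoveryEmail"]),
        Step("reconciliation pipeline", ["reconcilePayments"]),
    ],
)

index = build_symbol_index([recovery])
flows = expand("retryFailedPayment", index)
```

Here `flows` holds the whole recovery flow keyed by its name, rather than only the body of the one function that matched. In practice the symbol-to-step mapping would have to be inferred from the codebase rather than written by hand.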
Example: user deletion
A question like “what happens when a user deletes their account?” may retrieve only deleteUser(userId), while the actual logical unit spans billing, storage, compliance, analytics, search, and recommendations:
Delete Account Flow
→ revoke sessions
→ cancel billing
→ purge S3 assets
→ enqueue GDPR cleanup
→ notify analytics
→ invalidate search indexes
→ remove recommendations
Logical units are cross-cutting
Meaningful nodes are often cross-service, cross-file, cross-layer, and cross-runtime. “Customer onboarding” may span frontend forms, APIs, fraud checks, email systems, workers, analytics, feature flags, billing setup, and CRM sync. Treating that as disconnected methods destroys operational meaning.
Why this matters for debugging
Humans debug through workflows and execution paths. Example: “customers are not receiving payment recovery emails” may trace through:
Stripe webhook
→ failed payment event
→ retry scheduler
→ recovery worker
→ notification pipeline
→ email queue
→ SMTP provider
AI debugging improves dramatically when retrieval reconstructs the full logical unit instead of a single helper like sendRecoveryEmail(...).
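As a rough sketch of what reconstructing the path can mean in practice, the trace above can be stored as a directed graph and walked backwards from the symptom to list every upstream stage worth inspecting. The `PIPELINE` dictionary and the `upstream_of` helper are hypothetical names, and the linear edges are an assumption taken from the trace; a real system would likely branch.

```python
from collections import deque

# Assumed edges: each stage hands work to the stages in its list.
PIPELINE = {
    "Stripe webhook": ["failed payment event"],
    "failed payment event": ["retry scheduler"],
    "retry scheduler": ["recovery worker"],
    "recovery worker": ["notification pipeline"],
    "notification pipeline": ["email queue"],
    "email queue": ["SMTP provider"],
    "SMTP provider": [],
}


def upstream_of(graph, target):
    """Return every stage that can reach `target`, nearest first (BFS)."""
    # Invert the edges so we can walk from a stage to its producers.
    parents = {node: [] for node in graph}
    for src, dsts in graph.items():
        for dst in dsts:
            parents.setdefault(dst, []).append(src)

    seen, order = set(), []
    queue = deque([target])
    while queue:
        node = queue.popleft()
        for parent in parents.get(node, []):
            if parent not in seen:
                seen.add(parent)
                order.append(parent)
                queue.append(parent)
    return order


suspects = upstream_of(PIPELINE, "email queue")
```

If the email queue is empty, `suspects` lists the notification pipeline first and the Stripe webhook last, which is the order a person would usually check them in.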
Traditional chunking breaks functional meaning
Fixed token splits and file boundaries rarely align with operational behavior. A single workflow may span dozens of disconnected chunks, so the AI receives fragmented syntax instead of coherent functionality. That is the cost we outline in bad chunking.
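A toy comparison makes the fragmentation concrete. In the sketch below, the file paths, most symbol names, and the hand-assigned unit labels are all hypothetical, and chunks are counted in whole symbols rather than tokens to keep the example small. Real chunkers split on tokens, and real unit membership would have to be inferred, for example from call graphs.

```python
from collections import defaultdict

# Toy corpus in file order: (file, symbol, logical unit).
FRAGMENTS = [
    ("controllers/webhooks.py", "handleStripeWebhook", "payment recovery"),
    ("controllers/webhooks.py", "handleSignupWebhook", "onboarding"),
    ("services/payments.py", "retryFailedPayment", "payment recovery"),
    ("services/payments.py", "authorizePayment", "order placement"),
    ("workers/email.py", "sendWelcomeEmail", "onboarding"),
    ("workers/email.py", "sendRecoveryEmail", "payment recovery"),
]


def fixed_size_chunks(fragments, size):
    """Split the corpus into consecutive chunks of `size` fragments."""
    return [fragments[i:i + size] for i in range(0, len(fragments), size)]


def chunks_by_unit(fragments):
    """Group fragments by the logical unit they belong to."""
    grouped = defaultdict(list)
    for fragment in fragments:
        grouped[fragment[2]].append(fragment)
    return dict(grouped)


def chunks_touching(chunks, unit):
    """Count how many chunks contain at least one fragment of `unit`."""
    return sum(1 for chunk in chunks if any(f[2] == unit for f in chunk))


fixed = fixed_size_chunks(FRAGMENTS, 2)
by_unit = chunks_by_unit(FRAGMENTS)

spread_fixed = chunks_touching(fixed, "payment recovery")
spread_unit = chunks_touching(list(by_unit.values()), "payment recovery")
```

With this corpus the payment recovery flow is spread across all three fixed-size chunks, each mixed with unrelated code, while grouping by unit keeps it in a single chunk.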
Logical units are behavioral nodes
Instead of node = class, think node = operational capability: payment recovery, onboarding, fraud detection, session revocation, order fulfillment. Each may span repositories, queues, services, APIs, infrastructure, and workflows. The graph becomes behavioral, not syntactic.
Example: “why did this order fail?”
Traditional retrieval may surface individual services and policies in isolation. A graph-aware view can reconstruct an order placement flow:
Order Placement Flow
→ cart validation
→ inventory reservation
→ fraud analysis
→ payment authorization
→ order persistence
→ event publication
→ fulfillment pipeline
That enables tracing failures, bottlenecks, and downstream dependencies: debugging as graph reconstruction, not local inspection.
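Once the flow is known, answering “why did this order fail?” can start from a simple question: which step was the first not to complete, and what is blocked behind it? The sketch below assumes the flow is strictly linear and that a set of completed steps is available from logs or traces. The `locate_failure` helper and the sample `completed` set are hypothetical.

```python
ORDER_PLACEMENT_FLOW = [
    "cart validation",
    "inventory reservation",
    "fraud analysis",
    "payment authorization",
    "order persistence",
    "event publication",
    "fulfillment pipeline",
]


def locate_failure(flow, completed):
    """Return (first step that did not complete, steps blocked behind it).

    Returns (None, []) when every step in the flow completed.
    """
    for position, step in enumerate(flow):
        if step not in completed:
            return step, flow[position + 1:]
    return None, []


# Assumed observation: the order got through fraud analysis and no further.
completed = {"cart validation", "inventory reservation", "fraud analysis"}
failed_step, blocked = locate_failure(ORDER_PLACEMENT_FLOW, completed)
```

In this example `failed_step` is payment authorization, and `blocked` lists the three downstream steps that never ran. A branching flow would need a graph traversal instead of a list slice.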
This is where Kognita fits
Kognita exists because repositories are too behaviorally complex for raw file-level retrieval. The goal is to reconstruct logical workflows, operational relationships, execution paths, dependency graphs, and behavioral units — so AI sees connected behavior instead of isolated syntax.
Final takeaway
Modern repositories are networks of operational capabilities (workflows and business capabilities), not collections of methods. Systems that treat repositories as disconnected text will continue to struggle with debugging, reasoning, and architecture. The systems that work best will reconstruct functional graphs instead of isolated fragments, because software is a behavioral network disguised as source code.