Brain
Raw insight traces: claims, caveats, source files, and open questions before they become polished blog posts.
Insight traces
40 / 40 insights
Agent instructions are configuration, not documentation
sources=3 edges=1
AGENTS.md, CLAUDE.md, and Cursor rules should be treated as operational configuration: short, specific, versioned, and tested by observing agent behavior.
topics=[ai-agents, context-engineering, agent-instructions]
Reproducible setup is agent infrastructure
sources=2 edges=1
Agent performance depends on whether the repository can be installed, built, and tested from scratch without tacit human knowledge.
topics=[ai-agents, setup, infrastructure]
Simplicity beats agent theater
sources=1 edges=1
Complex agent orchestration is not a replacement for clear repository structure and deterministic validation.
topics=[ai-agents, architecture, simplicity]
Context should be layered, not dumped
sources=1 edges=1
The best agent context is layered: a small hot path loaded by default, and deeper cold context fetched only when relevant.
topics=[context-engineering, ai-agents, retrieval]
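A minimal sketch of the hot/cold split described above, assuming a simple keyword match decides which cold documents are relevant. The `LayeredContext` class, the topic keys, and the sample notes (including the `make test` command) are hypothetical and only illustrate the shape; a real system would likely use a retriever rather than substring matching.

```python
from dataclasses import dataclass, field


@dataclass
class LayeredContext:
    """Hot context is always loaded; cold context is fetched on demand."""
    hot: dict[str, str]                      # small, always in the prompt
    cold: dict[str, str]                     # larger docs keyed by topic
    loaded: list[str] = field(default_factory=list)

    def build(self, task: str) -> str:
        """Return the hot path plus only the cold docs whose topic appears in the task."""
        parts = list(self.hot.values())
        # Assumption: a topic is "relevant" if its key occurs in the task text.
        self.loaded = [topic for topic in self.cold if topic in task.lower()]
        parts.extend(self.cold[topic] for topic in self.loaded)
        return "\n\n".join(parts)


ctx = LayeredContext(
    hot={"map": "src/ holds app code; run `make test` to verify."},
    cold={
        "billing": "Billing notes: invoices are immutable once issued.",
        "auth": "Auth notes: sessions expire after 30 minutes.",
    },
)
prompt = ctx.build("Fix rounding in the billing export")
```

Here only the billing notes join the hot path; the auth notes stay out of the prompt until a task mentions them.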
Long context still needs structure
sources=1 edges=1
Large context windows do not eliminate the need for repository retrieval, maps, and scoped documentation.
topics=[context-engineering, retrieval, ai-agents]
Agent-friendly repos help agents decide not to edit
sources=1 edges=1
Good coding agents must sometimes conclude that no code change is required. Repositories should make this defensible.
topics=[ai-agents, verification, no-op]
Task specs are part of the codebase
sources=1 edges=1
For nontrivial work, the task description is a code artifact. It should be scoped, versioned, and verifiable.
topics=[ai-agents, task-specification, planning]
Dependency structure beats text blobs
sources=1 edges=1
AI coding agents perform better when repositories expose dependency structure: imports, calls, type relationships, and build edges.
topics=[retrieval, code-graphs, ai-agents]
Types and static surfaces reduce hallucinated APIs
sources=2 edges=1
Agents need to know which members, functions, and external APIs are actually available. Typed and static-analysis-visible surfaces reduce invalid-code errors.
topics=[types, static-analysis, ai-agents]
More context can hurt
sources=1 edges=1
Large context windows and long instruction files do not automatically improve agent performance. Irrelevant context raises cost and can reduce success.
topics=[context-engineering, ai-agents, retrieval]
Quality gates must cover smells, not just behavior
sources=1 edges=1
Passing behavior tests is not enough. Agent-generated code can be correct-looking while introducing structural quality problems.
topics=[quality-gates, static-analysis, ai-agents]
Generated SDKs turn API contracts into code
sources=2 edges=1
Agents should not hand-roll raw API calls when a typed, generated client can be produced from the API contract.
topics=[apis, generated-sdks, ai-agents]
Repository graphs need selective slices
sources=2 edges=1
Agents benefit from explicit repository structure, but the graph must be queried as a selective slice. Dumping the whole graph into context hurts.
topics=[retrieval, code-graphs, ai-agents]
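One way to picture a selective slice is a depth-limited walk from the file under edit. The sketch below assumes the repository graph is already available as an adjacency map; the `graph_slice` helper and the module names in `IMPORTS` are hypothetical.

```python
from collections import deque


def graph_slice(edges: dict[str, list[str]], start: str, max_depth: int) -> set[str]:
    """Return nodes reachable from `start` within `max_depth` hops (breadth-first)."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand past the depth limit
        for neighbor in edges.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, depth + 1))
    return seen


# Hypothetical import graph: module -> modules it imports.
IMPORTS = {
    "api.orders": ["core.orders", "core.auth"],
    "core.orders": ["db.models"],
    "core.auth": ["db.models"],
    "db.models": ["db.engine"],
    "web.admin": ["core.auth"],
}

nearby = graph_slice(IMPORTS, "api.orders", max_depth=1)
```

With `max_depth=1` the slice holds the starting module and its two direct imports. The rest of the graph stays out of context unless the depth is raised.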
Feature work fails at planning and constraints
sources=2 edges=1
Feature-addition tasks fail more often at planning, constraint satisfaction, and step fidelity than at localization. Patch application is not resolution.
topics=[ai-agents, feature-work, planning]
Setup is part of the task
sources=2 edges=1
Agents treat setup as the first task. If the environment cannot be bootstrapped deterministically, agents burn reasoning and tokens before touching the actual work.
topics=[ai-agents, setup, verification]
Agentic PRs have a different shape
sources=2 edges=1
Agent-written merged PRs have measurably different commit counts, file touches, additions, deletions, and similarity profiles compared to human PRs.
topics=[ai-agents, code-review, pr-shape]
Static surfaces are agent affordances
sources=2 edges=1
Agents read repos through surfaces: names, imports, types, declarations, generated clients, schemas, tests, and diagnostics. Hidden behavior forces inference.
topics=[static-analysis, apis, ai-agents]
CodeHealth predicts AI refactoring success
sources=2 edges=1
File-level code quality (CodeHealth >= 9) is a statistically significant predictor of LLM refactoring success. Medium-sized models see 15-30% break-rate reduction on healthy code.
topics=[code-quality, refactoring, ai-agents]
Perplexity is not file-level AI-friendliness
sources=2 edges=1
File-level perplexity has no practically meaningful association with code quality. Token-level PPL correlates with human confusion, but file-level PPL does not predict structural quality.
topics=[code-quality, perplexity, ai-agents]
Agents run an orient, retrieve, edit, verify loop
sources=4 edges=6
Most coding-agent failures can be grouped by where the agent loop breaks: orientation, retrieval, editing, or verification.
topics=[ai-agents, codebase-structure, context-engineering]
Recoverable structure beats prompt volume
sources=3 edges=7
Agents need task-relevant structure they can find and trust more than they need larger instruction blobs.
topics=[ai-agents, context-engineering, retrieval]
Names are semantic infrastructure
sources=3 edges=5
Identifier names are not just human style. They are retrieval handles and semantic cues for code models.
topics=[codebase-structure, naming, ai-agents]
Function-sized chunks are not necessarily agent-friendly
sources=2 edges=5
Small functions can be good code, but function-level retrieval chunks underperformed broader or structure-aware chunks in controlled code completion experiments.
topics=[retrieval, chunking, codebase-structure]
Boundaries beat modularity as a slogan
sources=4 edges=5
The evidence does not justify "more modular code" as a blanket claim. Useful boundaries expose intent, dependencies, and verification points.
topics=[architecture, modularity, ai-agents]
Types and generated SDKs compress intent
sources=7 edges=6
Types, schemas, and generated SDKs reduce the number of valid guesses an agent can make at an API boundary.
topics=[types, apis, generated-sdks, ai-agents]
Executable architecture beats prose
sources=8 edges=9
Architecture rules that only exist in prose are easy for agents to miss. Rules that fail precisely are repairable.
topics=[architecture, linting, quality-gates, ai-agents]
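As an illustration of a rule that "fails precisely", the sketch below uses Python's standard `ast` module to flag imports that cross a layer boundary. The layer names (`core`, `web`), the `forbidden_imports` helper, and the message wording are assumptions made for the example, not a rule taken from the sources.

```python
import ast


def forbidden_imports(source: str, filename: str, banned_prefix: str) -> list[str]:
    """Return one precise message per import that starts with `banned_prefix`."""
    problems = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [node.module or ""]
        else:
            continue
        for name in names:
            if name == banned_prefix or name.startswith(banned_prefix + "."):
                # File, line, and the violated constraint give an agent
                # something concrete to repair.
                problems.append(
                    f"{filename}:{node.lineno}: imports '{name}', "
                    f"but this layer must not depend on '{banned_prefix}'"
                )
    return problems


sample = "import json\nfrom web.views import render\n"
violations = forbidden_imports(sample, "core/orders.py", "web")
```

The same constraint written only as prose ("core should not depend on web") gives an agent nothing to run. The check names the exact line to change.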
Static diagnostics are agent interfaces
sources=4 edges=5
For agents, a diagnostic is a structured repair protocol: rule ID, file, span, evidence, expected surface, and machine-readable output.
topics=[static-analysis, diagnostics, ai-agents, quality-gates]
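The fields listed above can be sketched as a small record with JSON output. The field names, the `ARCH001` rule ID, and the sample values are hypothetical; real linters define their own schemas.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Diagnostic:
    rule_id: str        # stable identifier an agent can look up
    file: str
    start_line: int     # span start (1-indexed)
    end_line: int       # span end (inclusive)
    evidence: str       # what was found
    expected: str       # the surface the code should use instead

    def to_json(self) -> str:
        """Serialize with sorted keys so output is stable across runs."""
        return json.dumps(asdict(self), sort_keys=True)


diag = Diagnostic(
    rule_id="ARCH001",
    file="core/orders.py",
    start_line=2,
    end_line=2,
    evidence="from web.views import render",
    expected="core must not import web; return data and let the caller render",
)
payload = json.loads(diag.to_json())
```

An agent can read `payload` without parsing free-form text: where to look, what was found, and what the rule expects instead.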
Static oracles catch what tests miss
sources=4 edges=5
Behavior tests can pass while the code misses structural intent. Static oracles check dependency shape, migration completion, and refactor alignment.
topics=[static-analysis, testing, maintainability, ai-agents]
Static fact models make rules agent-usable
sources=5 edges=5
Agent-useful static analysis needs stable fact families: imports, symbols, types, call graph, CFG, dataflow, tests, coverage, and generated-code metadata.
topics=[static-analysis, facts, code-graphs, ai-agents]
AGENTS.md should be an index, not a novel
sources=6 edges=6
Persistent agent instructions help when they point at executable facts. They hurt when they become stale, long, conflicting, or aspirational.
topics=[agent-instructions, context-engineering, ai-agents]
Tests are context, not just verification
sources=2 edges=5
Durable repo tests teach agents expected behavior and setup. Agent-written tests are often temporary probes and should not be equated with quality.
topics=[testing, verification, ai-agents]
The monorepo is a context database, if it has boundaries
sources=3 edges=6
Putting app code, docs, infra, schemas, SDKs, tests, and runbooks together gives agents atomic context. Without boundaries and tooling, it just makes a larger mess.
topics=[monorepos, context-engineering, architecture]
Voice-agent latency budget is the product
sources=4 edges=5
A real-time voice agent is judged by the whole path from user intent to audible behavior, not by one model benchmark.
topics=[voice-agents, latency, observability]
Endpointing is turn-taking
sources=5 edges=4
Voice activity detection (VAD) answers whether speech is present; endpointing and semantic end-of-utterance (EOU) detection decide whether the agent is allowed to talk.
topics=[voice-agents, vad, turn-taking]
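To make the distinction concrete, the sketch below layers a simple endpointing policy on top of per-frame VAD decisions. A fixed silence timeout is assumed here as the simplest possible policy; a semantic EOU model would replace or adjust that timeout. The `end_of_turn_frame` function and its parameters are hypothetical.

```python
from typing import Optional


def end_of_turn_frame(vad_frames: list[bool], frame_ms: int, silence_ms: int) -> Optional[int]:
    """Return the index of the frame at which the turn is declared over, or None.

    `vad_frames` holds one VAD decision per frame (True = speech present).
    The turn ends once speech has been heard and is followed by at least
    `silence_ms` of continuous non-speech.
    """
    heard_speech = False
    silent_run_ms = 0
    for index, is_speech in enumerate(vad_frames):
        if is_speech:
            heard_speech = True
            silent_run_ms = 0          # any speech resets the silence timer
        elif heard_speech:
            silent_run_ms += frame_ms
            if silent_run_ms >= silence_ms:
                return index
    return None


# 20 ms frames: speech, a short 40 ms pause, more speech, then a long silence.
frames = [True, True, False, False, True, True] + [False] * 30
turn_end = end_of_turn_frame(frames, frame_ms=20, silence_ms=500)
```

VAD reports non-speech during the 40 ms pause, but the endpointing policy does not treat it as the end of the turn. The agent is only cleared to talk after the longer silence.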
Streaming STT is not batch STT
sources=3 edges=3
Word error rate (WER) is necessary but insufficient for voice agents; live systems also need to measure finalization latency, partial-transcript stability, end-of-turn (EOT) quality, and latency tails.
topics=[voice-agents, stt, benchmarks]
TTS latency is architecture
sources=4 edges=3
Voice-agent TTS should be chosen by time to first audio (TTFA), streaming shape, real-time factor (RTF) headroom, interruption behavior, and quality together.
topics=[voice-agents, tts, latency]
Transport is media correctness
sources=4 edges=3
WebRTC wins for browser/mobile voice because it owns media behavior, not because every demo needs lower transport latency.
topics=[voice-agents, webrtc, transport]
Barge-in is the real system test
sources=2 edges=3
A voice agent is not conversational until the user can interrupt it and the system stops, cancels, and preserves state correctly.
topics=[voice-agents, barge-in, conversation]
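The stop, cancel, and preserve-state sequence can be sketched with `asyncio` task cancellation. Playback and the user interruption are simulated with sleeps; `speak`, `run_turn`, and the shape of the returned state are hypothetical stand-ins for a real TTS stream and conversation store.

```python
import asyncio


async def speak(chunks: list[str], spoken: list[str]) -> None:
    """Stand-in for streaming TTS playback: 'plays' one chunk at a time."""
    for chunk in chunks:
        await asyncio.sleep(0.05)   # simulated playback time per chunk
        spoken.append(chunk)


async def run_turn(chunks: list[str], interrupt_after: float) -> dict:
    spoken: list[str] = []
    playback = asyncio.create_task(speak(chunks, spoken))
    await asyncio.sleep(interrupt_after)   # simulated user barge-in
    playback.cancel()                      # stop and cancel pending audio
    try:
        await playback
    except asyncio.CancelledError:
        pass
    # Preserve state: record what the user actually heard, not the full reply.
    return {
        "heard": spoken,
        "unspoken": chunks[len(spoken):],
        "interrupted": playback.cancelled(),
    }


reply = ["Your order", "ships on", "Tuesday", "by courier"]
state = asyncio.run(run_turn(reply, interrupt_after=0.12))
```

Keeping `heard` separate from `unspoken` matters for the next turn: the conversation history should reflect only the part of the reply that was actually played.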
Native speech models change the boundary
sources=4 edges=3
Native speech-to-speech models attack the STT-LLM-TTS cascade, but cascades remain easier to debug, evaluate, and control.
topics=[voice-agents, native-speech, architecture]
Voice-agent eval needs multiple metrics
sources=2 edges=4
A voice agent must be evaluated across speech accuracy, turn-taking, latency tails, barge-in, transport, cost, and task success.
topics=[voice-agents, evaluation, observability]