Brain

Raw insight traces: claims, caveats, source files, and open questions before they become polished blog posts.

Insight traces

40 / 40 insights

Agent instructions are configuration, not documentation

sources=3 edges=1

AGENTS.md, CLAUDE.md, and Cursor rules should be treated as operational configuration: short, specific, versioned, and tested by observing agent behavior.

topics=[ai-agents, context-engineering, agent-instructions]

Reproducible setup is agent infrastructure

sources=2 edges=1

Agent performance depends on whether the repository can be installed, built, and tested from scratch without tacit human knowledge.

topics=[ai-agents, setup, infrastructure]

Simplicity beats agent theater

sources=1 edges=1

Complex agent orchestration is not a replacement for clear repository structure and deterministic validation.

topics=[ai-agents, architecture, simplicity]

Context should be layered, not dumped

sources=1 edges=1

The best agent context is layered: a small hot path loaded by default, and deeper cold context fetched only when relevant.

topics=[context-engineering, ai-agents, retrieval]
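The hot/cold split above can be sketched as follows. This is a minimal sketch under stated assumptions: the `ContextEntry` type and `build_context` helper are hypothetical names. Plain keyword overlap stands in for whatever retrieval a real agent harness would use.

```python
from dataclasses import dataclass


@dataclass
class ContextEntry:
    name: str
    text: str
    keywords: frozenset[str]  # hypothetical relevance tags for cold entries


def build_context(task: str, hot: list[ContextEntry], cold: list[ContextEntry],
                  max_cold: int = 2) -> list[str]:
    """Always include hot entries; add at most max_cold cold entries that match the task."""
    task_words = set(task.lower().split())
    # Score each cold entry by keyword overlap with the task description.
    scored = [(len(entry.keywords & task_words), entry) for entry in cold]
    # Stable sort keeps the original order among ties; zero-overlap entries are dropped.
    relevant = [entry for score, entry in sorted(scored, key=lambda p: -p[0]) if score > 0]
    return [e.name for e in hot] + [e.name for e in relevant[:max_cold]]
```

For a task like "fix invoice rounding in billing", a billing doc tagged `{"billing", "invoice"}` is pulled in next to the always-loaded `AGENTS.md`. An unrelated auth doc stays out of context.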

Long context still needs structure

sources=1 edges=1

Large context windows do not eliminate the need for repository retrieval, maps, and scoped documentation.

topics=[context-engineering, retrieval, ai-agents]

Agent-friendly repos help agents decide not to edit

sources=1 edges=1

Good coding agents must sometimes conclude that no code change is required. Repositories should make this defensible.

topics=[ai-agents, verification, no-op]

Task specs are part of the codebase

sources=1 edges=1

For nontrivial work, the task description is a code artifact. It should be scoped, versioned, and verifiable.

topics=[ai-agents, task-specification, planning]

Dependency structure beats text blobs

sources=1 edges=1

AI coding agents perform better when repositories expose dependency structure: imports, calls, type relationships, and build edges.

topics=[retrieval, code-graphs, ai-agents]

Types and static surfaces reduce hallucinated APIs

sources=2 edges=1

Agents need to know which members, functions, and external APIs are actually available. Typed and static-analysis-visible surfaces reduce invalid-code errors.

topics=[types, static-analysis, ai-agents]

More context can hurt

sources=1 edges=1

Large context windows and long instruction files do not automatically improve agent performance. Irrelevant context raises cost and can reduce success.

topics=[context-engineering, ai-agents, retrieval]

Quality gates must cover smells, not just behavior

sources=1 edges=1

Passing behavior tests is not enough. Agent-generated code can be correct-looking while introducing structural quality problems.

topics=[quality-gates, static-analysis, ai-agents]
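A structural gate can run next to the behavior tests. The sketch below flags two common smells in Python source, long functions and long parameter lists, using the standard `ast` module. The thresholds are illustrative assumptions, not recommended values.

```python
import ast

MAX_FUNCTION_LINES = 40  # illustrative limit, not a recommended value
MAX_PARAMS = 5           # illustrative limit, not a recommended value


def smells(source: str) -> list[str]:
    """Flag long functions and long parameter lists, two common structural smells."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            length = node.end_lineno - node.lineno + 1
            args = node.args
            # Count positional-only, regular, and keyword-only parameters.
            params = len(args.posonlyargs) + len(args.args) + len(args.kwonlyargs)
            if length > MAX_FUNCTION_LINES:
                found.append(f"{node.name}: {length} lines")
            if params > MAX_PARAMS:
                found.append(f"{node.name}: {params} parameters")
    return found
```

A function with six parameters passes every behavior test it is given, yet this gate still reports it.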

Generated SDKs turn API contracts into code

sources=2 edges=1

Agents should not hand-roll raw API calls when a typed, generated client can be produced from the API contract.

topics=[apis, generated-sdks, ai-agents]

Repository graphs need selective slices

sources=2 edges=1

Agents benefit from explicit repository structure, but the graph must be queried as a selective slice. Dumping the whole graph into context hurts.

topics=[retrieval, code-graphs, ai-agents]
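A selective slice can be as simple as a depth-bounded breadth-first walk from the files the task touches. This sketch assumes the graph is an adjacency dict of file-level import edges. The file names in the usage example are hypothetical.

```python
from collections import deque


def graph_slice(edges: dict[str, list[str]], seeds: list[str], max_depth: int) -> set[str]:
    """Return nodes reachable from the seeds within max_depth hops (breadth-first)."""
    seen = set(seeds)
    queue = deque((seed, 0) for seed in seeds)
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand past the depth limit
        for neighbor in edges.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, depth + 1))
    return seen
```

With `api.py -> service.py -> db.py -> driver.py`, a depth-2 slice from `api.py` keeps `service.py` and `db.py` and leaves `driver.py` out of context.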

Feature work fails at planning and constraints

sources=2 edges=1

Feature-addition tasks fail more often at planning, constraint satisfaction, and step fidelity than at localization. A patch that applies cleanly has not necessarily resolved the task.

topics=[ai-agents, feature-work, planning]

Setup is part of the task

sources=2 edges=1

Agents treat setup as the first task. If the environment cannot be bootstrapped deterministically, agents burn reasoning and tokens before touching the actual work.

topics=[ai-agents, setup, verification]

Agentic PRs have a different shape

sources=2 edges=1

Agent-written merged PRs have measurably different commit counts, file touches, additions, deletions, and similarity profiles compared to human PRs.

topics=[ai-agents, code-review, pr-shape]

Static surfaces are agent affordances

sources=2 edges=1

Agents read repos through surfaces: names, imports, types, declarations, generated clients, schemas, tests, and diagnostics. Hidden behavior forces inference.

topics=[static-analysis, apis, ai-agents]

CodeHealth predicts AI refactoring success

sources=2 edges=1

File-level code quality (CodeHealth >= 9) is a statistically significant predictor of LLM refactoring success. Medium-sized models see a 15-30% reduction in break rate on healthy code.

topics=[code-quality, refactoring, ai-agents]

Perplexity is not file-level AI-friendliness

sources=2 edges=1

File-level perplexity (PPL) has no practically meaningful association with code quality. Token-level PPL correlates with human confusion, but file-level PPL does not predict structural quality.

topics=[code-quality, perplexity, ai-agents]

Agents run an orient, retrieve, edit, verify loop

sources=4 edges=6

Most coding-agent failures can be grouped by where the agent loop breaks: orientation, retrieval, editing, or verification.

topics=[ai-agents, codebase-structure, context-engineering]

Recoverable structure beats prompt volume

sources=3 edges=7

Agents need task-relevant structure they can find and trust more than they need larger instruction blobs.

topics=[ai-agents, context-engineering, retrieval]

Names are semantic infrastructure

sources=3 edges=5

Identifier names are not just human style. They are retrieval handles and semantic cues for code models.

topics=[codebase-structure, naming, ai-agents]

Function-sized chunks are not necessarily agent-friendly

sources=2 edges=5

Small functions can be good code, but function-level retrieval chunks underperformed broader or structure-aware chunks in controlled code completion experiments.

topics=[retrieval, chunking, codebase-structure]
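One possible structure-aware alternative to per-function chunks: split a Python module at top-level definitions and keep each class whole with its methods. This is only an illustration built on the standard `ast` module. It is not the chunker used in the experiments behind this claim.

```python
import ast


def top_level_chunks(source: str) -> list[str]:
    """Split module source into one chunk per top-level definition.

    Classes stay whole (methods are not split out), and consecutive
    module-level statements such as imports and constants are merged.
    """
    lines = source.splitlines()
    chunks: list[str] = []
    pending: list[str] = []  # module-level statements waiting to be flushed
    for node in ast.parse(source).body:
        # Include decorators, which sit above the def/class line.
        decorators = getattr(node, "decorator_list", [])
        start = min([node.lineno] + [d.lineno for d in decorators]) - 1
        text = "\n".join(lines[start:node.end_lineno])
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            if pending:
                chunks.append("\n".join(pending))
                pending = []
            chunks.append(text)
        else:
            pending.append(text)
    if pending:
        chunks.append("\n".join(pending))
    return chunks
```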

Boundaries beat modularity as a slogan

sources=4 edges=5

The evidence does not justify "more modular code" as a blanket claim. Useful boundaries expose intent, dependencies, and verification points.

topics=[architecture, modularity, ai-agents]

Types and generated SDKs compress intent

sources=7 edges=6

Types, schemas, and generated SDKs reduce the number of valid guesses an agent can make at an API boundary.

topics=[types, apis, generated-sdks, ai-agents]

Executable architecture beats prose

sources=8 edges=9

Architecture rules that only exist in prose are easy for agents to miss. Rules that fail precisely are repairable.

topics=[architecture, linting, quality-gates, ai-agents]
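A prose rule such as "the domain layer must not depend on services" becomes executable once a check can fail on it. This is a minimal sketch under stated assumptions: the layer names, the `ALLOWED` table, and the rule ID `ARCH001` are all hypothetical. Imports are read with Python's standard `ast` module.

```python
import ast
from dataclasses import dataclass

# Hypothetical layering: each layer may import only the layers listed for it.
ALLOWED = {
    "domain": set(),
    "service": {"domain"},
    "api": {"service", "domain"},
}


@dataclass
class Violation:
    rule: str
    file: str
    line: int
    message: str


def check_layers(path: str, layer: str, source: str) -> list[Violation]:
    """Report imports that cross layer boundaries the table does not allow."""
    violations = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            targets = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module:
            targets = [node.module]  # relative imports without a module are skipped
        else:
            continue
        for target in targets:
            top = target.split(".")[0]
            if top in ALLOWED and top != layer and top not in ALLOWED[layer]:
                violations.append(Violation(
                    rule="ARCH001",
                    file=path,
                    line=node.lineno,
                    message=f"layer '{layer}' must not import '{target}'",
                ))
    return violations
```

Each violation names the rule, the file, the line, and the offending import. The agent can therefore repair that exact line instead of re-reading an architecture document.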

Static diagnostics are agent interfaces

sources=4 edges=5

For agents, a diagnostic is a structured repair protocol: rule ID, file, span, evidence, expected surface, and machine-readable output.

topics=[static-analysis, diagnostics, ai-agents, quality-gates]
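The fields listed above map directly onto a small record type. This sketch uses hypothetical field names, not a standard schema. Formats such as SARIF define a much richer interchange structure for the same purpose.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Span:
    start_line: int
    start_col: int
    end_line: int
    end_col: int


@dataclass(frozen=True)
class Diagnostic:
    rule_id: str   # stable ID an agent can look up or suppress precisely
    file: str
    span: Span     # exact location to repair
    evidence: str  # what the rule found
    expected: str  # the surface the rule expects instead

    def to_json(self) -> str:
        """Machine-readable form; asdict() also converts the nested Span."""
        return json.dumps(asdict(self), sort_keys=True)
```

For example, `Diagnostic("ARCH001", "src/domain/order.py", Span(3, 0, 3, 22), "import of infra.db in domain layer", "depend on a domain port instead")` serializes to one JSON object that an agent can parse without scraping terminal text. All values in the example are hypothetical.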

Static oracles catch what tests miss

sources=4 edges=5

Behavior tests can pass while the code misses structural intent. Static oracles check dependency shape, migration completion, and refactor alignment.

topics=[static-analysis, testing, maintainability, ai-agents]
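Migration completion is a typical static oracle: tests can pass while old and new call paths coexist. This sketch reports every remaining call to a deprecated function. `fetch_v1` in the usage example is a hypothetical name, and matching on the bare or attribute name is a simplifying assumption that ignores aliasing.

```python
import ast


def remaining_calls(source: str, deprecated: str) -> list[int]:
    """Line numbers of calls to `deprecated`, as a bare name or an attribute."""
    lines = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        if isinstance(func, ast.Name):
            name = func.id
        elif isinstance(func, ast.Attribute):
            name = func.attr
        else:
            name = None
        if name == deprecated:
            lines.append(node.lineno)
    return sorted(lines)
```

A migration is complete only when this returns an empty list for every file, regardless of test results.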

Static fact models make rules agent-usable

sources=5 edges=5

Agent-useful static analysis needs stable fact families: imports, symbols, types, call graph, control-flow graph (CFG), dataflow, tests, coverage, and generated-code metadata.

topics=[static-analysis, facts, code-graphs, ai-agents]

AGENTS.md should be an index, not a novel

sources=6 edges=6

Persistent agent instructions help when they point at executable facts. They hurt when they become stale, long, conflicting, or aspirational.

topics=[agent-instructions, context-engineering, ai-agents]

Tests are context, not just verification

sources=2 edges=5

Durable repo tests teach agents expected behavior and setup. Agent-written tests are often temporary probes and should not be equated with quality.

topics=[testing, verification, ai-agents]

The monorepo is a context database, if it has boundaries

sources=3 edges=6

Putting app code, docs, infra, schemas, SDKs, tests, and runbooks together gives agents atomic context. Without boundaries and tooling, it just makes a larger mess.

topics=[monorepos, context-engineering, architecture]

Voice-agent latency budget is the product

sources=4 edges=5

A real-time voice agent is judged by the whole path from user intent to audible behavior, not by one model benchmark.

topics=[voice-agents, latency, observability]
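The whole-path view amounts to simple arithmetic. The stage names and millisecond values below are placeholders chosen for illustration, not measurements. The sketch treats stages as strictly serial, which overstates latency when stages stream and overlap.

```python
# Illustrative per-stage latencies in milliseconds: placeholders, not measurements.
STAGES_MS = {
    "network_in": 40,
    "endpointing": 200,
    "stt_finalization": 150,
    "llm_first_token": 350,
    "tts_first_audio": 150,
    "network_out": 40,
}


def end_to_end_ms(stages: dict[str, int]) -> int:
    """The user experiences the sum of serial stages, not any single benchmark."""
    return sum(stages.values())


def over_budget(stages: dict[str, int], budget_ms: int) -> list[str]:
    """If the total exceeds the budget, return stages ranked largest first."""
    if end_to_end_ms(stages) <= budget_ms:
        return []
    return sorted(stages, key=stages.get, reverse=True)
```

With these placeholder numbers the path totals 930 ms. Against an 800 ms budget, the LLM's first token is the first stage to attack. Endpointing is the second, even though it involves no model benchmark at all.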

Endpointing is turn-taking

sources=5 edges=4

Voice activity detection (VAD) answers whether speech is present; endpointing and semantic end-of-utterance (EOU) detection decide whether the agent is allowed to talk.

topics=[voice-agents, vad, turn-taking]
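The split between VAD and endpointing can be seen in a tiny state machine. It consumes per-frame VAD booleans and decides when the turn has ended. An optional `looks_complete` flag stands in for a semantic EOU model and shortens the required silence. Every threshold here is an assumed default, not a tuned value.

```python
from dataclasses import dataclass, field


@dataclass
class Endpointer:
    silence_ms: int = 600           # assumed silence needed when the words look unfinished
    semantic_silence_ms: int = 200  # assumed shorter wait when a semantic EOU signal fires
    frame_ms: int = 20              # duration of one VAD frame
    heard_speech: bool = field(default=False, init=False)
    silence_run_ms: int = field(default=0, init=False)

    def push(self, is_speech: bool, looks_complete: bool = False) -> bool:
        """Feed one VAD frame; return True once the agent may start talking."""
        if is_speech:
            self.heard_speech = True
            self.silence_run_ms = 0  # any speech resets the silence run
            return False
        if not self.heard_speech:
            return False  # silence before the user spoke is not a turn end
        self.silence_run_ms += self.frame_ms
        needed = self.semantic_silence_ms if looks_complete else self.silence_ms
        return self.silence_run_ms >= needed
```

VAD alone would report "no speech" after the first silent frame. The endpointer waits 600 ms by default, or 200 ms when the utterance looks complete, before it hands the floor to the agent.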

Streaming STT is not batch STT

sources=3 edges=3

Word error rate (WER) is necessary but insufficient for voice agents; live systems also need finalization latency, partial-transcript stability, end-of-turn (EOT) quality, and latency tails.

topics=[voice-agents, stt, benchmarks]
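WER and latency tails come from different computations, which is one reason a single score hides so much. This sketch computes WER as word-level edit distance over reference length, alongside a nearest-rank percentile for tail latency. Inputs in the usage example are made up.

```python
import math


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dist[j] holds the edit distance between ref[:i] and hyp[:j] for the current row i.
    dist = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        prev_diag, dist[0] = dist[0], i
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            prev_diag, dist[j] = dist[j], min(dist[j] + 1,       # deletion
                                              dist[j - 1] + 1,   # insertion
                                              prev_diag + cost)  # substitution or match
    return dist[len(hyp)] / max(len(ref), 1)


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile, e.g. pct=95 for a p95 latency tail."""
    ordered = sorted(values)
    rank = max(math.ceil(pct * len(ordered) / 100), 1)
    return ordered[rank - 1]
```

For finalization latencies of 120, 130, 140, and 900 ms, the p50 is 130 ms but the p95 is 900 ms. A transcript can score a perfect WER and still arrive too late to be useful.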

TTS latency is architecture

sources=4 edges=3

Voice-agent TTS should be chosen by time to first audio (TTFA), streaming shape, real-time factor (RTF) headroom, interruption behavior, and quality together.

topics=[voice-agents, tts, latency]
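Several of these properties can be read from chunk arrival times in a single streamed response. The sketch makes these assumptions:

- Times are in seconds since the request was sent.
- RTF is taken as the wall time until the last chunk divided by total audio duration.
- An underrun is a chunk that arrives after playback has already run out of audio.

```python
from dataclasses import dataclass


@dataclass
class AudioChunk:
    arrived_s: float   # arrival time, seconds since the request was sent
    duration_s: float  # seconds of audio in the chunk


def ttfa(chunks: list[AudioChunk]) -> float:
    """Time to first audio: when the first chunk arrived."""
    return chunks[0].arrived_s


def rtf(chunks: list[AudioChunk]) -> float:
    """Wall time to produce all audio divided by its duration; below 1.0 leaves headroom."""
    audio = sum(c.duration_s for c in chunks)
    return chunks[-1].arrived_s / audio


def underruns(chunks: list[AudioChunk]) -> int:
    """Count chunks that arrive after playback has already caught up to them."""
    play_clock = chunks[0].arrived_s  # playback starts when the first chunk lands
    count = 0
    for chunk in chunks:
        if chunk.arrived_s > play_clock:
            count += 1
            play_clock = chunk.arrived_s  # playback stalls until the chunk arrives
        play_clock += chunk.duration_s
    return count
```

Three half-second chunks arriving at 0.3 s, 0.6 s, and 1.5 s give a fast TTFA of 0.3 s and an RTF of exactly 1.0. The third chunk still arrives late and causes an audible gap. This is why TTFA alone does not settle the choice.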

Transport is media correctness

sources=4 edges=3

WebRTC wins for browser/mobile voice because it owns media behavior, not because every demo needs lower transport latency.

topics=[voice-agents, webrtc, transport]

Barge-in is the real system test

sources=2 edges=3

A voice agent is not conversational until the user can interrupt it and the system stops, cancels, and preserves state correctly.

topics=[voice-agents, barge-in, conversation]
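The stop-cancel-preserve sequence can be sketched with `asyncio` task cancellation. In this sketch `speak` is a hypothetical stand-in for streaming TTS playback, and a timer stands in for the barge-in signal. The key design choice is that state keeps only the words the user actually heard, not the full planned reply.

```python
import asyncio


async def speak(words: list[str], spoken: list[str], delay_s: float = 0.01) -> None:
    """Stand-in for streaming playback; records only words that finished playing."""
    for word in words:
        await asyncio.sleep(delay_s)  # cancellation lands here, before the append
        spoken.append(word)


async def run_turn(words: list[str], interrupt_after_s: float) -> list[str]:
    """Play a reply, cancel it when the user barges in, and return what was heard."""
    spoken: list[str] = []
    playback = asyncio.create_task(speak(words, spoken))
    await asyncio.sleep(interrupt_after_s)  # simulated moment the user starts talking
    playback.cancel()                       # stop audio immediately (no-op if already done)
    try:
        await playback
    except asyncio.CancelledError:
        pass
    # Conversation state should record the heard prefix, not the full planned reply.
    return spoken
```

Interrupting a 50-word reply early leaves a strict prefix in `spoken`. If the reply finishes before the interruption, the whole reply is kept.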

Native speech models change the boundary

sources=4 edges=3

Native speech-to-speech models attack the STT-LLM-TTS cascade, but cascades remain easier to debug, evaluate, and control.

topics=[voice-agents, native-speech, architecture]

Voice-agent eval needs multiple metrics

sources=2 edges=4

A voice agent must be evaluated across speech accuracy, turn-taking, latency tails, barge-in, transport, cost, and task success.

topics=[voice-agents, evaluation, observability]
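A multi-metric gate might look like this, with one threshold per dimension. Every metric name and limit is a hypothetical placeholder, and transport metrics would add further rows. The point is that a build passes only when all dimensions pass, and a metric that was never measured counts as a failure.

```python
# Hypothetical thresholds: ("max", limit) or ("min", limit) per metric.
THRESHOLDS = {
    "wer": ("max", 0.10),                     # speech accuracy
    "early_cutoff_rate": ("max", 0.05),       # turn-taking
    "p95_response_latency_ms": ("max", 1200), # latency tail
    "barge_in_stop_ms": ("max", 300),         # barge-in
    "task_success_rate": ("min", 0.85),       # task success
    "cost_per_call_usd": ("max", 0.15),       # cost
}


def failing_metrics(results: dict[str, float]) -> list[str]:
    """Return every metric that misses its threshold; a missing metric also fails."""
    failures = []
    for name, (kind, limit) in THRESHOLDS.items():
        value = results.get(name)
        if value is None:
            failures.append(name)
        elif kind == "max" and value > limit:
            failures.append(name)
        elif kind == "min" and value < limit:
            failures.append(name)
    return failures
```

A run with good WER and task success but a 1.5 s p95 latency still fails. This is the failure mode a single-metric benchmark would hide.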