Diffusion Deep Research

Remember back in school when you had one of those infamous and dreaded group projects (I kinda liked them)...
At least a few times you probably tried the “parallel” way of working, optimizing for a bit less collaboration and each participant owning one segment of the report. Off you go! Each person for themselves writing extensive backgrounds, history, theory, or whatever segments you decided on. Then you meet up 3 hours before the deadline to “glue the report” together—how did that turn out?
The result was probably:
- Repetitive
- Inconsistent
- Different tone of voice per segment
- Vastly different quality per segment
- Not the grade you hoped for
It turns out, when we construct our AI research agents like this (plan -> parallel research -> glue research into report), we get the same problem! When no context of the “evolving report” is shared across sub-agents, we get a fragmented ball of mud.
These parallel-but-isolated group projects and research agents have their perks, like a high degree of autonomy and parallelism... but there are probably better ways to do it.
Diffusion Deep Research
Think of diffusion agent models like brainstorming: instead of everyone writing their own part in isolation and stitching together a Frankenstein report, the research spreads and overlaps as it evolves within the team. Ideas for each segment are not isolated, because no single person owns a segment.
The team starts off by writing a draft, only based on their internal knowledge. Typically in bullet point format with clear notes about missing references, knowledge gaps, outdated information, and uncertainty.
The students prioritize these knowledge gaps together, research different perspectives on those gaps (in parallel isolation), and fold the results back into the report iteratively as a group. The draft gradually evolves into a better report, filling gaps and enriching knowledge, until it becomes the final deliverable. At each writing step, the students follow a clear process for condensing rich findings into a concise report that fits the story they are trying to tell.
To me, this makes a lot more sense! I'll explore the implementation details of text diffusion in this blog post. Enjoy!
Why diffusion for research?
The problem with single-pass research
Traditional AI research agents follow a linear paradigm: Query → Search → Synthesize → Report. This suffers from fundamental limitations:
- Information Loss: Important context discovered late cannot influence earlier decisions.
- No Self-Correction: Errors or gaps in early research propagate to the final output.
- Static Search Strategy: The search strategy is fixed at the start and cannot adapt.
- Coherence Degradation: Long reports lose coherence when sections are generated independently.
The diffusion paradigm
Diffusion models, originally developed for image generation, provide an elegant solution. Instead of generating content in one pass, they start with a noisy initial state (random noise for images, rough draft for research) and iteratively refine through multiple denoising steps, using guidance signals to steer the refinement.
“The iterative nature of diffusion models naturally mirrors how humans actually conduct research—cycles of searching, reasoning, and revision.”
— Google Research, Deep Researcher with Test-Time Diffusion, 2025
Diffusion overview: the 4-phase pipeline
Core architecture: four phases
The implementation consists of four primary phases, orchestrated through a state machine:
Phase 1: Research brief generation
Transform the user query into a detailed research brief with sources, constraints, and scope. This ensures all downstream research is grounded in explicit requirements.
Phase 2: Initial draft generation
Generate a draft from the LLM's internal knowledge only—no external information retrieval yet. This is the “noisy” initial state that provides structure to guide subsequent research. It may contain outdated or incomplete information, and that's intentional.
Phase 3: Diffusion loop (supervisor subgraph)
The core innovation. Each iteration follows four steps:
- Generate research questions to address gaps in the draft
- Conduct Research: Retrieve external info for “denoising”
- Refine Draft: Remove “noise” (imprecision, incompleteness) from draft
- Assess: Are the research findings comprehensive? (Judged on the findings themselves, not on how polished or readable the draft looks)
Phase 4: Final report generation
Apply quality optimization with Insightfulness + Helpfulness rules. Deduplicate findings by URL, add granular breakdowns, detailed mapping tables, nuanced discussion, and proper citations.
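To make the orchestration concrete, here is a minimal sketch of the four phases wired together. This is an illustration only: the function names (generateBrief, generateDraft, researchGaps, refineDraft, writeFinalReport) are hypothetical stand-ins, not the actual API of the Go implementation walked through later.

```go
// Hypothetical sketch of the four-phase pipeline; the real implementation lives in
// go-research/internal/architectures/think_deep/loop.go and is shown further down.
package pipeline

import "context"

func runResearch(ctx context.Context, query string) string {
	// Phase 1: turn the raw query into an explicit research brief.
	brief := generateBrief(ctx, query)

	// Phase 2: draft from internal knowledge only, the "noisy" starting state.
	draft := generateDraft(ctx, brief)

	// Phase 3: diffusion loop. Research gaps, refine the draft, repeat.
	for i := 0; i < 15; i++ {
		findings, complete := researchGaps(ctx, brief, draft)
		draft = refineDraft(ctx, draft, findings)
		if complete { // findings are comprehensive, not "the draft looks polished"
			break
		}
	}

	// Phase 4: final synthesis with quality rules and deduplicated citations.
	return writeFinalReport(ctx, brief, draft)
}

// Placeholder stubs; each would wrap an LLM call or tool loop in a real system.
func generateBrief(ctx context.Context, q string) string                   { return "brief: " + q }
func generateDraft(ctx context.Context, brief string) string               { return "draft" }
func researchGaps(ctx context.Context, brief, draft string) (string, bool) { return "findings", true }
func refineDraft(ctx context.Context, draft, findings string) string       { return draft + "\n" + findings }
func writeFinalReport(ctx context.Context, brief, draft string) string     { return draft }
```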
Core algorithm overview
The core innovation is the Self-Balancing Test-Time Diffusion algorithm, encoded directly in the supervisor's system prompt. The exact algorithm from the Go implementation is shown as a collapsible prompt block in the code walkthrough below.
Diffusion Loop
One iteration (practical walkthrough)
Step 1: Generate research questions
Tool: think. Identify draft gaps and propose diverse research questions tied to those gaps.
```yaml
think:
  reflection: |
    Uptime claims are vague; need Cloudflare outage history and SLA terms (2023–2025).
    Compare public incident reports vs. status page claims.
```
Expected: 3–5 targeted questions, each mapped to a draft gap with scope/priority notes.
Step 2: ConductResearch (parallel)
Tool: ConductResearch. Delegate distinct questions to sub-agents with explicit instructions and expected returns.
```yaml
ConductResearch:
  research_topic: |
    Collect primary sources on Cloudflare outages and SLA/uptime guarantees (2023–2025).
    Return: URLs, outage timelines, SLA terms, and any compensations offered.
```
Expected: cited findings (URLs + quotes) per sub-agent, deduped URLs, short summaries.
Step 3: refine_draft_report
Tool: refine_draft_report. Fold new findings into the draft; keep structure concise to conserve context.
```yaml
refine_draft_report:
  research_brief: "<brief>"
  findings: "<citations + quotes>"
  draft_report: "<current draft>"
```
Expected: draft updated with citations/quotes; bullets or short paragraphs retained for clarity and context efficiency.
Step 4: Assess completeness
Heuristic: diverse new searches should stop yielding new facts. If not, loop again.
Checklist:
- New queries tried? (global + section-specific)
- Any new sources or facts? If yes, continue the loop
- If no, call ResearchComplete
Expected: a clear decision to continue or call ResearchComplete, with rationale noted.
Theoretical foundations
Classical diffusion models
In classical diffusion models (DDPM, Stable Diffusion), the process consists of two phases:
Forward Diffusion: Gradually add noise to data: x₀ → x₁ → x₂ → ... → xₜ (pure noise)
Reverse Diffusion: Learn to denoise step by step: xₜ → xₜ₋₁ → ... → x₁ → x₀ (clean data)
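For reference, the standard DDPM formulation of these two phases, with noise schedule βₜ:

```latex
% Forward diffusion: gradually corrupt the data with Gaussian noise
q(x_t \mid x_{t-1}) = \mathcal{N}\!\left(x_t;\ \sqrt{1-\beta_t}\,x_{t-1},\ \beta_t I\right)

% Reverse diffusion: a learned model removes the noise step by step
p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\!\left(x_{t-1};\ \mu_\theta(x_t, t),\ \Sigma_\theta(x_t, t)\right)
```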
For readers who have walked the fields of machine learning, this feels like an autoencoder, except the forward process ends in pure noise instead of a low-dimensional latent representation (which still actually means something). There are key differences, of course... but that's for another blog post.
Adaptation to research
For research report generation, we reinterpret this process:
| Classical Diffusion | Research Diffusion |
|---|---|
| Random noise (xₜ) | Initial draft from model knowledge |
| Denoising step | Research iteration + draft refinement |
| Guidance signal | Retrieved information from web search |
| Clean output (x₀) | Comprehensive, accurate research report |
The key insight is that the initial draft generated purely from the LLM's training data represents the “noisy” starting state. Each iteration of identifying gaps, searching for information, and incorporating findings acts as a denoising step that brings the report closer to ground truth.
The process terminates when any of the following holds (in priority order; a quick sketch in Go follows the list):
- Gap-closed: Diverse queries yield no new findings.
- Iteration cap: Hard stop at 15 supervisor iterations.
- Supervisor override: Allowed only with rationale tied to evidence coverage.
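Here is how that priority order could look in code. This is a minimal sketch with illustrative state fields (NewFactsLastRound, OverrideRationale), not the repository's actual structs:

```go
package diffusion

// Illustrative termination check for the diffusion loop, in the priority order above.
type loopState struct {
	NewFactsLastRound int    // new facts found by the latest round of diverse queries
	Iterations        int    // supervisor iterations used so far
	OverrideRationale string // supervisor-provided justification, if any
}

const maxIterations = 15

func shouldStop(s loopState) (bool, string) {
	// 1. Gap-closed: diverse queries yield no new findings.
	if s.NewFactsLastRound == 0 {
		return true, "gap closed: no new findings from diverse queries"
	}
	// 2. Iteration cap: hard stop at 15 supervisor iterations.
	if s.Iterations >= maxIterations {
		return true, "iteration cap reached"
	}
	// 3. Supervisor override: only with a rationale tied to evidence coverage.
	if s.OverrideRationale != "" {
		return true, "supervisor override: " + s.OverrideRationale
	}
	return false, "continue researching"
}
```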
Draft denoising: noisy → clean report
Draft (iteration 1):
- Compare OpenAI, Anthropic, DeepMind safety pillars
- Pull 3–5 primary sources (2023–2025)
Refined report:
The report converges toward a comprehensive, insight-rich, and readable deliverable with clean citations that pass the FACT evaluation.
Diffusion loop (core)
A walkthrough of how the supervisor and sub-agents iterate, including prompts, parallel fan-out, and final synthesis. The code complements the free-text explanations; skip it or deep-dive into the details as you see fit.
Phase 1 & 2: Brief and initial draft generation
The entry point is AgentLoop.Research. This function orchestrates all four phases of the diffusion algorithm. The first two phases are straightforward LLM calls:
```go
// ============================================================================
// go-research/internal/architectures/think_deep/loop.go
// ============================================================================
// AgentLoop.Research is the main entry point for the diffusion algorithm.
// It orchestrates all four phases: brief → draft → diffusion → final report.
func (o *AgentLoop) Research(ctx context.Context, query string) (*LoopResult, error) {
    startTime := time.Now()

    // ========================================================================
    // PHASE 1: BRIEF GENERATION
    // ========================================================================
    // Transform the raw user query into a structured research brief.
    // The brief defines:
    // - The core research question
    // - Key sub-questions to explore
    // - Expected deliverables and scope
    // - Success criteria for the research
    //
    // This is a single LLM call using TransformToResearchBriefPrompt().
    // The brief serves as the "north star" for all subsequent research.
    researchBrief, _, _ := o.generateResearchBrief(ctx, query)

    // ========================================================================
    // PHASE 2: INITIAL DRAFT GENERATION (the "noisy" starting state)
    // ========================================================================
    // Generate the first draft using ONLY the LLM's training data.
    // This is the "noisy" initial state in diffusion terminology.
    //
    // The draft is intentionally imperfect - it contains:
    // - Outdated information (training data cutoff)
    // - Gaps marked with "[NEEDS RESEARCH]" placeholders
    // - Uncertain claims that need verification
    // - Incomplete sections that need expansion
    //
    // This "noise" will be "denoised" through iterative research.
    initialDraft, _, _ := o.generateInitialDraft(ctx, researchBrief)

    // ... continues to Phase 3 and 4 below
}
```
Show prompt: Transform to research brief (Phase 1)
You will be given a user query. Your job is to translate this query into a more detailed and concrete research question that will be used to guide the research. The user query is: <Query> %s </Query> Today's date is %s. You will return a single research question that will be used to guide the research. Guidelines: 1. Maximize Specificity and Detail - Include all known user preferences and explicitly list key attributes or dimensions to consider. - It is important that all details from the user are included in the instructions. 2. Handle Unstated Dimensions Carefully - When research quality requires considering additional dimensions that the user hasn't specified, acknowledge them as open considerations rather than assumed preferences. - Example: Instead of assuming "budget-friendly options," say "consider all price ranges unless cost constraints are specified." - Only mention dimensions that are genuinely necessary for comprehensive research in that domain. 3. Avoid Unwarranted Assumptions - Never invent specific user preferences, constraints, or requirements that weren't stated. - If the user hasn't provided a particular detail, explicitly note this lack of specification. - Guide the researcher to treat unspecified aspects as flexible rather than making assumptions. 4. Distinguish Between Research Scope and User Preferences - Research scope: What topics/dimensions should be investigated (can be broader than user's explicit mentions) - User preferences: Specific constraints, requirements, or preferences (must only include what user stated) - Example: "Research coffee quality factors (including bean sourcing, roasting methods, brewing techniques) for San Francisco coffee shops, with primary focus on taste as specified by the user." 5. Use the First Person - Phrase the request from the perspective of the user. 6. Sources - If specific sources should be prioritized, specify them in the research question. - For product and travel research, prefer linking directly to official or primary websites (e.g., official brand sites, manufacturer pages, or reputable e-commerce platforms like Amazon for user reviews) rather than aggregator sites or SEO-heavy blogs. - For academic or scientific queries, prefer linking directly to the original paper or official journal publication rather than survey papers or secondary summaries. - For people, try linking directly to their LinkedIn profile, or their personal website if they have one.
Show prompt: Initial draft generation (Phase 2)
Based on all the research in your knowledge base, create a comprehensive, well-structured answer to the overall research brief: <Research Brief> %s </Research Brief> Today's date is %s. Please create a detailed answer to the overall research brief that: 1. Is well-organized with proper headings (# for title, ## for sections, ### for subsections) 2. Includes specific facts and insights from the research 3. References relevant sources using [Title](URL) format 4. Provides a balanced, thorough analysis. Be as comprehensive as possible, and include all information that is relevant to the overall research question. People are using you for deep research and will expect detailed, comprehensive answers. 5. Includes a "Sources" section at the end with all referenced links You can structure your report in a number of different ways. Here are some examples: To answer a question that asks you to compare two things, you might structure your report like this: 1/ intro 2/ overview of topic A 3/ overview of topic B 4/ comparison between A and B 5/ conclusion To answer a question that asks you to return a list of things, you might only need a single section which is the entire list. 1/ list of things or table of things Or, you could choose to make each item in the list a separate section in the report. When asked for lists, you don't need an introduction or conclusion. 1/ item 1 2/ item 2 3/ item 3 To answer a question that asks you to summarize a topic, give a report, or give an overview, you might structure your report like this: 1/ overview of topic 2/ concept 1 3/ concept 2 4/ concept 3 5/ conclusion If you think you can answer the question with a single section, you can do that too! 1/ answer REMEMBER: Section is a VERY fluid and loose concept. You can structure your report however you think is best, including in ways that are not listed above! Make sure that your sections are cohesive, and make sense for the reader. For each section of the report, do the following: - Use simple, clear language - Use ## for section title (Markdown format) for each section of the report - Do NOT ever refer to yourself as the writer of the report. This should be a professional report without any self-referential language. - Do not say what you are doing in the report. Just write the report without any commentary from yourself. - Each section should be as long as necessary to deeply answer the question with the information you have gathered. It is expected that sections will be fairly long and verbose. You are writing a deep research report, and users will expect a thorough answer. - Use bullet points to list out information when appropriate, but by default, write in paragraph form. Format the report in clear markdown with proper structure and include source references where appropriate. <Citation Rules> - Assign each unique URL a single citation number in your text - End with ### Sources that lists each source with corresponding numbers - IMPORTANT: Number sources sequentially without gaps (1,2,3,4...) in the final list regardless of which sources you choose - Each source should be a separate line item in a list, so that in markdown it is rendered as a list. - Example format: [1] Source Title: URL [2] Source Title: URL - Citations are extremely important. Make sure to include these, and pay a lot of attention to getting these right. Users will often use these citations to look into more information. </Citation Rules>
Phase 3: Supervisor diffusion loop
This is the heart of the algorithm. The supervisor runs an iterative loop that:
- Analyzes gaps in the current draft
- Delegates research tasks to sub-agents (in parallel)
- Incorporates findings back into the draft
- Repeats until research is complete
```go
// ============================================================================
// PHASE 3: THE DIFFUSION LOOP
// ============================================================================
// This is where iterative "denoising" happens. The supervisor coordinates
// multiple sub-researchers to fill gaps in the draft.
//
// Key insight: The loop continues until RESEARCH FINDINGS are complete,
// NOT until the draft looks polished. This prevents premature termination.
func (o *AgentLoop) Research(ctx context.Context, query string) (*LoopResult, error) {
    // ... Phase 1 & 2 above ...

    // ========================================================================
    // PHASE 3: SUPERVISOR COORDINATION (DIFFUSION LOOP)
    // ========================================================================
    // The supervisor.Coordinate() function runs the actual diffusion loop.
    // It takes:
    // - researchBrief: the research objectives
    // - initialDraft: the "noisy" starting state
    // - o.executeSubResearch: a callback that spawns sub-researchers
    //
    // The callback pattern allows the supervisor to delegate research
    // without knowing the implementation details of sub-researchers.
    supervisorResult, _ := o.supervisor.Coordinate(
        ctx,
        researchBrief,
        initialDraft,
        o.executeSubResearch, // <-- This callback spawns parallel sub-agents
    )

    // supervisorResult now contains:
    // - Notes: compressed research findings from all sub-researchers
    // - DraftReport: the iteratively refined draft
    // - SubInsights: structured insights with source URLs
    // - IterationsUsed: how many diffusion iterations ran
    // - Cost: total token usage across all agents

    // ... continues to Phase 4 below
}
```
Show prompt: Diffusion algorithm (supervisor loop)
<Diffusion Algorithm>
1. generate the next research questions to address gaps in the draft report
2. **conduct_research**: retrieve external information to provide concrete delta for denoising
3. **refine_draft**: remove "noise" (imprecision, incompleteness) from the draft report
4. **research_complete**: complete research only based on conduct_research tool's findings' completeness. it should not be based on the draft report. even if the draft report looks complete, you should continue doing the research until all the research findings are collected. You know the research findings are complete by running conduct_research tool to generate diverse research questions to see if you cannot find any new findings.
</Diffusion Algorithm>
Inside the supervisor: the actual diffusion iteration
Now let's look inside SupervisorAgent.Coordinate to see the actual loop. This is where tool calls are parsed, parallelism is handled, and the draft evolves:
```go
// ============================================================================
// go-research/internal/agents/supervisor.go
// ============================================================================
// SupervisorAgent.Coordinate runs the diffusion loop. Each iteration:
// 1. Builds context (system prompt + current draft + accumulated notes)
// 2. Asks the LLM what to do next (returns tool calls)
// 3. Executes tool calls (research in parallel, refinement sequentially)
// 4. Checks if research is complete
// 5. If not complete, loop back to step 1
func (s *SupervisorAgent) Coordinate(
    ctx context.Context,
    researchBrief string,
    initialDraft string,
    subResearcher SubResearcherCallback, // Callback to spawn sub-agents
) (*SupervisorResult, error) {
    // Initialize state - this tracks the evolving draft and accumulated notes
    state := runtime.NewSupervisorState(researchBrief)
    state.UpdateDraft(initialDraft)

    // Build the system prompt once (includes the diffusion algorithm instructions)
    date := time.Now().Format("2006-01-02")
    systemPrompt := runtime.LeadResearcherPrompt(date, s.maxConcurrent, s.maxIterations)

    // ========================================================================
    // THE DIFFUSION LOOP
    // ========================================================================
    // This loop is bounded by maxIterations (default: 15) but will exit early
    // when the LLM calls "research_complete" tool.
    for state.Iterations < s.maxIterations {
        state.IncrementIteration()

        // ====================================================================
        // STEP 1: BUILD CONTEXT FOR THIS ITERATION
        // ====================================================================
        // Each iteration sends the LLM:
        // - System prompt with diffusion algorithm
        // - Current research brief
        // - Current draft (evolving with each iteration)
        // - Count of accumulated research notes
        // - Conversation history (prior tool calls and results)
        //
        // This is the KEY to diffusion: context grows with each iteration!
        messages := s.buildMessages(systemPrompt, state)

        // ====================================================================
        // STEP 2: ASK LLM WHAT TO DO NEXT
        // ====================================================================
        resp, _ := s.client.Chat(ctx, messages)
        content := resp.Choices[0].Message.Content

        // ====================================================================
        // STEP 3: PARSE TOOL CALLS FROM LLM RESPONSE
        // ====================================================================
        // The LLM responds with XML-style tool calls like:
        // <tool name="conduct_research">{"research_topic": "..."}</tool>
        // <tool name="refine_draft">{}</tool>
        // <tool name="research_complete">{}</tool>
        toolCalls := runtime.ParseToolCalls(content)

        // Check for research completion FIRST
        if s.hasResearchComplete(toolCalls) {
            break // Exit the loop - research is done!
        }

        // If no tool calls, the model decided to stop
        if len(toolCalls) == 0 {
            break
        }

        // ====================================================================
        // STEP 4: SPLIT TOOL CALLS BY TYPE
        // ====================================================================
        // This is where parallelism is set up:
        // - conduct_research calls → run in PARALLEL (separate goroutines)
        // - think/refine_draft calls → run SEQUENTIALLY
        var conductResearchCalls []runtime.ToolCallParsed
        var otherCalls []runtime.ToolCallParsed
        for _, tc := range toolCalls {
            if tc.Tool == "conduct_research" {
                conductResearchCalls = append(conductResearchCalls, tc)
            } else {
                otherCalls = append(otherCalls, tc)
            }
        }

        // Execute sequential tools first (think, refine_draft)
        var toolResults []string
        for _, tc := range otherCalls {
            result, _ := s.executeToolCall(ctx, tc, state, ...)
            toolResults = append(toolResults, result)
        }

        // ====================================================================
        // STEP 5: EXECUTE RESEARCH IN PARALLEL
        // ====================================================================
        // This is the parallel fan-out! Each conduct_research call spawns
        // a separate sub-agent in its own goroutine.
        if len(conductResearchCalls) > 0 {
            researchResults, err := s.executeParallelResearch(
                ctx,
                conductResearchCalls,
                state,
                subResearcher, // <-- The callback that creates sub-agents
                &researcherNum,
                &totalCost,
            )
            toolResults = append(toolResults, researchResults...)
        }

        // ====================================================================
        // STEP 6: ADD RESULTS TO CONVERSATION HISTORY
        // ====================================================================
        // Tool results are added as a "user" message so the LLM sees them
        // in the next iteration. This is how context accumulates!
        state.AddMessage(llm.Message{
            Role:    "user",
            Content: strings.Join(toolResults, "\n\n---\n\n"),
        })

        // Loop continues to next iteration...
    }

    // Return the final state after all iterations
    return &SupervisorResult{
        Notes:          state.Notes,
        DraftReport:    state.DraftReport,
        IterationsUsed: state.Iterations,
        SubInsights:    state.GetSubInsights(),
        Cost:           totalCost,
    }, nil
}
```
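runtime.ParseToolCalls is used throughout but not shown. Assuming the XML-style format from the comments above (<tool name="...">{json args}</tool>), a minimal parser could look like the following. This is an illustrative sketch, not the repository's actual implementation:

```go
package runtimesketch

import (
	"encoding/json"
	"regexp"
)

// ToolCallParsed mirrors the shape used by the supervisor: a tool name plus JSON args.
type ToolCallParsed struct {
	Tool string
	Args map[string]interface{}
}

// toolCallRe matches e.g. <tool name="conduct_research">{"research_topic": "..."}</tool>.
var toolCallRe = regexp.MustCompile(`(?s)<tool name="([^"]+)">(.*?)</tool>`)

// ParseToolCalls extracts every tool call from an LLM response.
func ParseToolCalls(content string) []ToolCallParsed {
	var calls []ToolCallParsed
	for _, m := range toolCallRe.FindAllStringSubmatch(content, -1) {
		args := map[string]interface{}{}
		// Empty or malformed JSON bodies simply yield empty args.
		_ = json.Unmarshal([]byte(m[2]), &args)
		calls = append(calls, ToolCallParsed{Tool: m[1], Args: args})
	}
	return calls
}
```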
Show prompt: Lead researcher (supervisor)
You are a research supervisor. Your job is to conduct research by calling the
"conduct_research" tool and refine the draft report by calling "refine_draft"
tool based on your new research findings.
<Diffusion Algorithm>
1. generate the next research questions to address gaps in the draft report
2. **conduct_research**: retrieve external information to provide concrete delta for denoising
3. **refine_draft**: remove "noise" (imprecision, incompleteness) from the draft report
4. **research_complete**: complete research only based on conduct_research tool's findings'
completeness. it should not be based on the draft report.
</Diffusion Algorithm>
<Hard Limits>
- **Bias towards single agent** - Use single agent unless clear parallelization opportunity
- **Stop when you can answer confidently** - Don't keep delegating for perfection
- **Limit tool calls** - Always stop after {maxIterations} tool calls
</Hard Limits>
<Scaling Rules>
**Simple fact-finding**: Use 1 sub-agent
**Comparisons**: Use sub-agent per element (max 3 parallel)
**Data Analysis**: Delegate with clear file path AND analysis objective
**Important**: When calling conduct_research, provide complete standalone instructions -
sub-agents can't see other agents' work
</Scaling Rules>

Parallel research fan-out
When the supervisor receives multiple conduct_research calls in one response, they execute in parallel (maxConcurrent defaults to 3). If a batch is still running, avoid issuing a new fan-out to reduce thrash/backpressure.
Parallel sub-agents: the supervisor coordinates up to 3 research threads, assigning distinct questions:
- Sub-Agent 1, Topic A: global or section-level query
- Sub-Agent 2, Topic B: section-specific deep dive
- Sub-Agent 3, Topic C: comparative or incident-focused
Afterwards, the draft is updated with citations.
How it works:
```go
// ============================================================================
// go-research/internal/agents/supervisor.go
// ============================================================================
// executeParallelResearch fans out multiple research tasks to goroutines.
// Each sub-researcher runs independently with its own:
// - Tool registry (search, fetch, read_document, analyze_csv, think)
// - Conversation context (isolated from other sub-agents)
// - Iteration budget (max 5 search calls)
func (s *SupervisorAgent) executeParallelResearch(
    ctx context.Context,
    calls []runtime.ToolCallParsed, // Multiple conduct_research calls
    state *runtime.SupervisorState,
    subResearcher SubResearcherCallback, // Callback to spawn each agent
    researcherNum *int,
    totalCost *session.CostBreakdown,
) ([]string, error) {
    // Channel to collect results from all goroutines
    type researchResult struct {
        index     int
        resultStr string
        insights  []runtime.SubInsight
        cost      session.CostBreakdown
        err       error
    }
    resultsChan := make(chan researchResult, len(calls))

    // WaitGroup to track when all goroutines complete
    var wg sync.WaitGroup

    // ========================================================================
    // FAN OUT: Spawn one goroutine per conduct_research call
    // ========================================================================
    for idx, toolCall := range calls {
        wg.Add(1)
        go func(index int, tc runtime.ToolCallParsed, resNum int) {
            defer wg.Done()

            // Extract the research topic from the tool call
            topic, ok := tc.Args["research_topic"].(string)
            if !ok {
                resultsChan <- researchResult{
                    index: index,
                    err:   errors.New("missing research_topic"),
                }
                return
            }

            // ================================================================
            // THIS IS WHERE THE SUB-AGENT IS SPAWNED
            // ================================================================
            // subResearcher is a callback to AgentLoop.executeSubResearch()
            // which creates a new SubResearcherAgent with its own tools.
            result, err := subResearcher(ctx, topic, resNum, state.Iterations)
            resultsChan <- researchResult{
                index:     index,
                resultStr: result.CompressedResearch, // Findings as text
                insights:  result.Insights,           // Structured insights
                cost:      result.Cost,
                err:       err,
            }
        }(idx, toolCall, *researcherNum)
        *researcherNum++ // Increment for next sub-agent ID
    }

    // ========================================================================
    // WAIT FOR ALL GOROUTINES TO COMPLETE
    // ========================================================================
    wg.Wait()
    close(resultsChan)

    // ========================================================================
    // AGGREGATE RESULTS
    // ========================================================================
    var toolResultStrings []string
    for res := range resultsChan {
        if res.err != nil {
            toolResultStrings = append(toolResultStrings,
                fmt.Sprintf("Research error: %v", res.err))
            continue
        }

        // Add findings to supervisor state
        state.AddNote(res.resultStr)
        state.AddSubInsights(res.insights)
        totalCost.Add(res.cost)

        toolResultStrings = append(toolResultStrings,
            fmt.Sprintf("Research findings:\n%s", res.resultStr))
    }

    return toolResultStrings, nil
}
```
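One thing this excerpt does not show is the maxConcurrent cap (default 3) mentioned earlier: it spawns one goroutine per call. A common Go pattern for enforcing such a cap is a buffered-channel semaphore. A minimal sketch of that idea, not necessarily how the repository does it:

```go
package fanout

import "sync"

// runWithCap executes every task concurrently, but never more than maxConcurrent at once.
func runWithCap(tasks []func(), maxConcurrent int) {
	sem := make(chan struct{}, maxConcurrent) // buffered channel acts as a semaphore
	var wg sync.WaitGroup

	for _, task := range tasks {
		wg.Add(1)
		sem <- struct{}{} // blocks while maxConcurrent tasks are already running
		go func(t func()) {
			defer wg.Done()
			defer func() { <-sem }() // release the slot when the task finishes
			t()
		}(task)
	}
	wg.Wait()
}
```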
What each sub-researcher actually does
Each sub-researcher is a complete agent with its own tool loop. It receives only the topic (no visibility into other agents' work) and runs its own search/analysis cycle:
```go
// ============================================================================
// go-research/internal/architectures/think_deep/loop.go
// ============================================================================
// executeSubResearch is the callback passed to the supervisor.
// It creates a new sub-agent with isolated tools and context.
func (o *AgentLoop) executeSubResearch(
    ctx context.Context,
    topic string, // The research question from supervisor
    researcherNum int, // Unique ID for this sub-agent
    diffusionIteration int, // Which supervisor iteration spawned this
) (*agents.SubResearcherResult, error) {
    // ========================================================================
    // BUILD DEDICATED TOOL REGISTRY
    // ========================================================================
    // Each sub-agent gets its own set of tools:
    // - search: Web search via Brave API
    // - fetch: Fetch and summarize a specific URL
    // - read_document: Read PDF, DOCX, XLSX files
    // - analyze_csv: Statistical analysis of CSV data
    // - think: Internal reflection (preserved in conversation)
    subTools := runtime.SubResearcherToolRegistry(
        o.appConfig.BraveAPIKey,
        o.client, // LLM client for summarization within tools
    )

    // ========================================================================
    // CREATE THE SUB-AGENT
    // ========================================================================
    // The agent has:
    // - Its own conversation context (cannot see other agents)
    // - Strict iteration limits (max 5 search calls)
    // - Event bus for progress updates
    subResearcher := agents.NewSubResearcherAgent(
        o.client,
        subTools,
        o.bus,
        agents.DefaultSubResearcherConfig(), // maxIterations: 5
    )

    // ========================================================================
    // RUN THE SUB-AGENT'S OWN LOOP
    // ========================================================================
    // The sub-agent runs its own tool-calling loop:
    // 1. Receive topic as initial message
    // 2. Call think to plan search strategy
    // 3. Call search/fetch/read_document to gather info
    // 4. Call think to assess what's missing
    // 5. Repeat until confident or budget exhausted
    // 6. Compress findings and return
    return subResearcher.Research(ctx, topic, researcherNum)
}

// ============================================================================
// go-research/internal/agents/sub_researcher.go
// ============================================================================
// SubResearcherAgent.Research runs the actual search/analysis loop.
func (a *SubResearcherAgent) Research(
    ctx context.Context,
    topic string,
    researcherNum int,
) (*SubResearcherResult, error) {
    // Build system prompt with search instructions and limits
    systemPrompt := runtime.ResearchAgentPrompt(time.Now().Format("2006-01-02"))
    messages := []llm.Message{
        {Role: "system", Content: systemPrompt},
        {Role: "user", Content: topic}, // The research question
    }

    // ========================================================================
    // THE SUB-AGENT'S TOOL LOOP
    // ========================================================================
    for iterations := 0; iterations < a.maxIterations; iterations++ {
        resp, _ := a.client.Chat(ctx, messages)
        content := resp.Choices[0].Message.Content

        // Parse and execute tool calls (search, fetch, think, etc.)
        toolCalls := runtime.ParseToolCalls(content)
        if len(toolCalls) == 0 {
            break // Agent decided to stop searching
        }

        // Execute each tool and collect results
        var results []string
        for _, tc := range toolCalls {
            result, _ := a.tools.Execute(ctx, tc.Tool, tc.Args)
            results = append(results, result)

            // Track visited URLs for deduplication
            if tc.Tool == "fetch" {
                if url, ok := tc.Args["url"].(string); ok {
                    a.visitedURLs[url] = true
                }
            }
        }

        // Add results to conversation
        messages = append(messages, llm.Message{
            Role:    "user",
            Content: strings.Join(results, "\n---\n"),
        })
    }

    // ========================================================================
    // COMPRESS FINDINGS
    // ========================================================================
    // Before returning, compress the research into a structured format
    // while preserving ALL information verbatim.
    compressed, _ := a.compressResearch(ctx, messages)

    return &SubResearcherResult{
        CompressedResearch: compressed,
        RawNotes:           rawNotes,
        Insights:           extractedInsights,
        Cost:               totalCost,
    }, nil
}
```
Show prompt: Sub-researcher (tool loop)
You are a research assistant conducting research on the user's input topic. <Hard Limits> **Tool Call Budgets** (Prevent excessive searching): - **Simple queries**: Use 2-3 search tool calls maximum - **Complex queries**: Use up to 5 search tool calls maximum - **Always stop**: After 5 search tool calls if you cannot find the right sources **Stop Immediately When**: - You can answer the user's question comprehensively - You have 3+ relevant examples/sources for the question - Your last 2 searches returned similar information </Hard Limits> <Show Your Thinking> After each search tool call, use think to analyze the results: - What key information did I find? - What's missing? - Do I have enough to answer the question comprehensively? - Should I search more or provide my answer? </Show Your Thinking>
Show prompt: Research compression (sub-agent → supervisor)
You are a research assistant that has conducted research on a topic. Your job is now to clean up the findings, but preserve all of the relevant statements and information. <Tool Call Filtering> **IMPORTANT**: Focus only on substantive research content: - **Include**: All search results and findings from web searches - **Exclude**: think tool calls and responses - these are internal agent reflections - **Focus on**: Actual information gathered from external sources </Tool Call Filtering> <Guidelines> 1. Output findings should be fully comprehensive and include ALL information verbatim 2. Include inline citations for each source 3. Include a "Sources" section at the end with all sources 4. Make sure to include ALL sources - a later LLM will merge this with others Critical: Any information even remotely relevant must be preserved verbatim (don't rewrite, summarize, or paraphrase it). </Guidelines>
Phase 4: Final report synthesis
After the diffusion loop completes, the final phase synthesizes everything into a polished report. This applies the Insightfulness + Helpfulness quality rules:
```go
// ============================================================================
// PHASE 4: FINAL REPORT GENERATION
// ============================================================================
// After diffusion completes, generate the final polished report.
// This phase:
// 1. Deduplicates findings by URL (avoid citing same source twice)
// 2. Applies Insightfulness rules (granular breakdown, mapping tables)
// 3. Applies Helpfulness rules (proper citations, markdown formatting)
func (o *AgentLoop) Research(ctx context.Context, query string) (*LoopResult, error) {
    // ... Phase 1, 2, 3 above ...

    // ========================================================================
    // PHASE 4: FINAL REPORT
    // ========================================================================
    // Combine the refined draft with all accumulated research findings
    // to produce the final deliverable.
    finalReport, reportCost, err := o.generateFinalReport(
        ctx,
        researchBrief,
        supervisorResult, // Contains draft + notes + insights
    )

    return &LoopResult{
        Query:         query,
        ResearchBrief: researchBrief,
        Notes:         supervisorResult.Notes,
        DraftReport:   supervisorResult.DraftReport,
        FinalReport:   finalReport,
        SubInsights:   supervisorResult.SubInsights,
        Cost:          totalCost,
        Duration:      time.Since(startTime),
    }, nil
}

// generateFinalReport applies quality rules and deduplication
func (o *AgentLoop) generateFinalReport(
    ctx context.Context,
    brief string,
    supervisor *agents.SupervisorResult,
) (string, session.CostBreakdown, error) {
    // ========================================================================
    // DEDUPLICATE FINDINGS BY URL
    // ========================================================================
    // Multiple sub-agents may have found the same sources.
    // Remove notes that contain only URLs we've already seen.
    deduplicatedNotes := o.deduplicateFindings(supervisor.Notes)
    findings := strings.Join(deduplicatedNotes, "\n\n---\n\n")

    // ========================================================================
    // APPLY QUALITY RULES VIA PROMPT
    // ========================================================================
    // FinalReportPrompt includes:
    // - Insightfulness rules (granular breakdown, mapping tables)
    // - Helpfulness rules (citations, markdown, structure)
    // - All accumulated research findings
    // - The refined draft from diffusion
    prompt := runtime.FinalReportPrompt(
        brief,
        findings,
        supervisor.DraftReport,
        time.Now().Format("2006-01-02"),
    )

    resp, _ := o.client.Chat(ctx, []llm.Message{
        {Role: "user", Content: prompt},
    })

    return resp.Choices[0].Message.Content, cost, nil
}
```
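deduplicateFindings is called above but its body is not shown. Here is a hedged sketch of URL-based deduplication over research notes; the regex extraction and keep/drop rule are assumptions about how it might work, not the actual implementation:

```go
package dedupe

import "regexp"

var urlRe = regexp.MustCompile(`https?://[^\s)\]]+`)

// deduplicateFindings drops notes whose URLs have all been seen in earlier notes,
// so the final report prompt is not padded with repeated sources.
func deduplicateFindings(notes []string) []string {
	seen := map[string]bool{}
	var kept []string

	for _, note := range notes {
		urls := urlRe.FindAllString(note, -1)

		// Keep the note if it has no URLs (pure analysis) or cites anything new.
		novel := len(urls) == 0
		for _, u := range urls {
			if !seen[u] {
				novel = true
			}
			seen[u] = true
		}
		if novel {
			kept = append(kept, note)
		}
	}
	return kept
}
```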
Show prompt: Final report (quality rules)
<Insightfulness Rules> - Granular breakdown - Does the response have granular breakdown of topics and their specific causes and specific impacts? - Detailed mapping table - Does the response have a detailed table mapping causes and effects? - Nuanced discussion - Does the response have detailed exploration and explicit discussion? </Insightfulness Rules> <Helpfulness Rules> - Satisfying user intent - Does the response directly address the user's request? - Ease of understanding - Is the response fluent, coherent, and logically structured? - Accuracy - Are the facts, reasoning, and explanations correct? - Appropriate language - Is the tone suitable and professional? </Helpfulness Rules> <Citation Rules> - Assign each unique URL a single citation number in your text - End with ### Sources that lists each source with corresponding numbers - IMPORTANT: Number sources sequentially without gaps (1,2,3,4...) - Citations are extremely important - users rely on these </Citation Rules>
Gap closing & context
The algorithm explicitly separates information gap closing from generation gap closing:
Self-balancing: information gap → generation gap
Stage 1 (information gap) outputs:
- Top sources (3–5): OpenAI system card, Anthropic Constitutional AI, DeepMind eval blogs.
- Extracted facts: eval gates, red-team cadence, 2023–2025 incident summaries.
- Inline quotes + URLs; duplicates removed.
Goal: close evidence gaps with primary sources before any polish.
Stage 2 (generation gap) outputs:
- Narrative: safety pillars per lab with inline citations; incidents + mitigations.
- Table: lab vs. eval gates vs. red-team cadence vs. interpretability depth.
- Clarity pass: removes repetition, smooths flow, enforces instruction-following.
Goal: readable, insightful synthesis once facts are locked.
Why separate the stages?
“There is a trade-off between the two gaps. We cannot optimize the generation gap too early when the system is still optimizing the information gap because the generation gap tends to bring more verbose and stylistic content that can distract from finding missing information.”
— Paichun Lin, ThinkDepth.ai
Stage 1 characteristics:
- Focus on what information exists, not how to present it
- Draft updates are functional, not polished
- Prioritizes breadth of coverage
- Uses global-context OR section-specific queries based on gap analysis
Stage 2 characteristics:
- All information is available
- Focus on presentation, coherence, and user satisfaction
- Applies full Insightfulness + Helpfulness rules
- Generates final deliverable with proper citations
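In code terms, the separation boils down to gating what the next step optimizes for. A tiny, hypothetical sketch (names are illustrative, not the repository's API):

```go
package gating

// nextAction decides whether to keep closing the information gap or to start
// closing the generation gap. Polish only begins once diverse queries stop
// producing new facts (or the iteration budget is spent).
func nextAction(newFactsLastRound, iterations, maxIterations int) string {
	informationGapClosed := newFactsLastRound == 0 || iterations >= maxIterations

	if !informationGapClosed {
		// Stage 1: breadth-first evidence gathering; draft updates stay functional.
		return "conduct_research"
	}
	// Stage 2: facts are locked; apply Insightfulness + Helpfulness rules.
	return "generate_final_report"
}
```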
Context engineering considerations
Long-horizon research tasks face several context challenges. The diffusion approach addresses each systematically:
| Problem | Description | Diffusion Solution |
|---|---|---|
| Context Poisoning | Hallucinations enter context | Draft serves as verified state |
| Context Distraction | Too much context overwhelms focus | Parallel sub-agents with isolated contexts |
| Context Confusion | Superfluous context influences output | Structured finding format with compression |
| Context Clash | Parts of context disagree | Supervisor resolves conflicts during refinement |
Draft as context anchor
The draft serves as a persistent, verified context that:
- Evolves incrementally: Each refine_draft call is validated
- Structures information: Prevents disorganized accumulation
- Guides research: Makes gaps explicit
- Maintains coherence: Narrative thread across iterations
| Traditional RAG | Diffusion Approach |
|---|---|
| Query → Search → Response | Query → Brief → Draft → [Research → Refine] × N → Report |
| Context grows unboundedly | Draft stays ~constant size |
| No structure | Structured by sections |
| Can contradict itself | Conflicts resolved each iteration |
Multi-agent context isolation
Sub-researchers operate with isolated contexts—they cannot see each other's work. This prevents topic A's findings from biasing topic B's research, keeps context from growing unboundedly during parallel work, and avoids confusion from interleaved search results.
Benchmark performance (RACE + FACT)
RACE (report quality) and FACT (citation quality) are the primary DeepResearch Bench lenses: RACE judges coverage, insight, instruction-following, and readability; FACT scores citation accuracy and effective citations.
DeepResearch Bench is the comprehensive benchmark for evaluating Deep Research Agents. It consists of 100 PhD-level research tasks designed by domain experts across Science & Technology, Finance & Business, Software, and other fields.
RACE framework (report quality)
RACE evaluates report generation quality through four dimensions:
- Comprehensiveness: Coverage breadth and depth (measures information gap closing)
- Insight / Depth: Quality, originality, logic, and value of analysis
- Instruction Following: Adherence to task requirements and constraints
- Readability: Clarity of structure, fluency, ease of understanding (measures generation gap closing)
FACT framework (citation quality)
FACT evaluates information retrieval and grounding capabilities:
- Automatically extract statement-URL pairs from the report
- Deduplicate redundant pairs
- Web scrape + LLM judgment to verify support
- Calculate Citation Accuracy (% correctly supported) and Effective Citations (avg verified per task)
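To make the two FACT scores concrete, here is a small illustrative sketch of how they could be computed from verified statement-URL pairs. It mirrors the metric definitions above, not the benchmark's actual code:

```go
package factsketch

// CitationPair is one deduplicated statement-URL pair after scraping and LLM judgment.
type CitationPair struct {
	Statement string
	URL       string
	Supported bool // did the cited page actually support the statement?
}

// factScores returns citation accuracy (% of pairs correctly supported) and the
// count of verified pairs (the benchmark averages this per task as Effective Citations).
func factScores(pairs []CitationPair) (accuracy float64, effective int) {
	if len(pairs) == 0 {
		return 0, 0
	}
	for _, p := range pairs {
		if p.Supported {
			effective++
		}
	}
	accuracy = float64(effective) / float64(len(pairs)) * 100
	return accuracy, effective
}
```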
RACE metrics: ThinkDepth.ai vs. peers. Source: DeepResearch Bench (Hugging Face). Tavily Research sits above this comparison but remains closed-source.
Why diffusion outperforms
- Iterative refinement catches gaps → Higher Comprehensiveness. Each iteration identifies and fills missing information. Traditional single-pass cannot self-correct.
- Parallel execution is efficient → Better Coverage. Up to 3 sub-researchers gather diverse perspectives simultaneously with isolated contexts.
- Explicit completion criteria → Validated Comprehensiveness. Research ends based on findings comprehensiveness, not draft appearance.
- Self-balancing adaptivity → Right-Sized Research. Simple topics: 2-3 iterations. Complex topics: 10+ iterations as needed.
- Draft as context anchor → Higher Readability. Draft serves as persistent context across iterations, reducing the “lost in the middle” problem.
- Quality rules in final generation → Higher Insight. Insightfulness Rules (granular breakdown, detailed tables, nuanced discussion) applied systematically.
Test it out?
I implemented a version of this in Go in the blog repository, under /go-research. It expects API keys and runs a REPL with multiple architectures, including /think_deep (diffusion), /storm, and /fast (a simple ReAct agent). It is not finished software and can execute AI-generated code in your environment... :) Use it at your own risk!
Browse the code: Go implementation (think_deep)
CLI with multiple architectures: Go Research
```bash
# Env (required)
OPENROUTER_API_KEY=sk-or-...
BRAVE_API_KEY=sk-brave-...

# Optional
RESEARCH_VAULT=~/research-vault   # Obsidian-compatible vault path
RESEARCH_VERBOSE=true             # Verbose REPL logs

# Run (from repo root)
cd go-research
cp .env.example .env   # optional template; env vars still required
OPENROUTER_API_KEY=... BRAVE_API_KEY=... go run ./cmd/research

# In the REPL (architectures are commands):
# /think_deep <query>   # diffusion/self-balancing
# /storm <query>        # STORM multi-perspective
# /fast <query>         # single-worker quick pass
# /architectures        # list available
# /help                 # all commands
```
Practical takeaways
- Start with a draft. It reveals gaps faster than a blank page and provides structure for subsequent research.
- Deduplicate by URL before synthesis. Keeps signal high and prevents the same source from being cited multiple times with different wordings.
- Completion is about evidence coverage, not aesthetics. Run diverse queries and only stop when they yield no new facts.
- Cap iterations and concurrency. 15 loops max, 3 agents max. Prevents thrash and keeps costs predictable.
- Separate information gap from generation gap. Don't polish until the facts are locked—otherwise you're polishing hallucinations.
- Isolate sub-agent contexts. Each sub-researcher should have complete, standalone instructions. They can't see other agents' work.
- Compress findings, preserve everything. When returning to the supervisor, remove only obvious duplicates—never summarize or paraphrase.
References and further reading
Acknowledgment: Paichun Lin — his seminal work on self-balancing agentic AI and test-time diffusion directly inspired this implementation. Well done!
- Google Research: Deep Researcher with Test-Time Diffusion (2025)
- Paichun Lin: Self-Balancing Agentic AI: Test-Time Diffusion and Context Engineering Re-imagined
- DeepResearch Bench Leaderboard
- DeepResearch Bench Paper and Documentation
- ThinkDepth.ai Open Source Reference Implementation - Python. (Thanks for the open source and innovations! I learned a lot from it.)
- Richard Sutton: The Bitter Lesson (2019)
- My implementation of the ThinkDepth.ai architecture in Go