Phase 05 · Deep-DAG composition

The Archivist uses two packaged deep-DAGs:

  • book-search-fanout — the full 4-source scout cluster (extract query, decide tools, 4 parallel scouts, rank, merge, record, gate, recall). Placed three times in the parent: on-topic-search, author-search, and similar-search.
  • compose-retry-loop — the compose / validate / retry loop. Placed once as compose-loop; every successful search branch converges on it, and its 'success' output routes to the parent-level respond-to-visitor terminal.

The parent DAG references both deep-DAGs by name via .deepDAG(placementName, dagName, routes, options). Each placement has its own stateMapping.output that copies the deep-DAG's writes back into the named parent state fields.
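
To make the copy-back concrete, here is a minimal, self-contained sketch of what an output mapping does. It is not the dagonizer implementation: applyOutputMapping and the State type are hypothetical, and the sketch assumes that each mapping key names a parent field and each value names the child field to copy from. The Archivist's mappings use identical names on both sides, so the direction is not visible in them.

```typescript
// Hypothetical model of stateMapping.output; not the dagonizer implementation.
// Assumption: mapping keys name parent fields, values name child fields.
type State = Record<string, unknown>;
type OutputMapping = Record<string, string>;

function applyOutputMapping(parent: State, child: State, mapping: OutputMapping): State {
  // Start from the parent so every unmapped parent field is preserved.
  const next: State = { ...parent };
  for (const [parentField, childField] of Object.entries(mapping)) {
    // Copy only fields the child actually holds.
    if (childField in child) {
      next[parentField] = child[childField];
    }
  }
  return next;
}

const parentState: State = { query: 'books about lighthouses', draft: '' };
const childState: State = { query: 'books about lighthouses', terms: ['lighthouse'], scratch: 42 };

// Only `terms` is listed, so `scratch` stays inside the child.
const merged = applyOutputMapping(parentState, childState, { terms: 'terms' });
console.log(Object.keys(merged)); // [ 'query', 'draft', 'terms' ]
```

Fields the child wrote but the mapping does not list (scratch here) never reach the parent, which is the isolation the placements rely on.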

Deep-DAG: the packaged fan-out cluster

ts
/**
 * BookSearchFanoutDAG — reusable query-extract + 4-source parallel scout cluster.
 *
 * Internal flow:
 *
 *   bsf-extract-query
 *     └─ success ──► bsf-decide-tools
 *   bsf-decide-tools
 *     └─ (tools | no-tools) ──► book-search-fan-out (parallel, combine: collect)
 *          ├─ bsf-ol       (OpenLibrary)
 *          ├─ bsf-gb       (Google Books)
 *          ├─ bsf-subject  (Subject search)
 *          └─ bsf-wiki     (Wikipedia enrichment)
 *     └─ bsf-rank-candidates
 *     └─ bsf-merge-candidates
 *          ├─ ranked ──► bsf-record-findings
 *          └─ empty  ──► bsf-no-results (collects error → deep-DAG exits error)
 *     └─ bsf-record-findings
 *     └─ bsf-has-citations-gate
 *          ├─ pass ──► bsf-recall-past-visits ──► END (success)
 *          └─ fail ──► bsf-no-results (collects error → deep-DAG exits error)
 *
 * Outputs:
 *   success — query extracted, candidates found, ranked, recorded, and recalled
 *   error   — no candidates after merge, or citations gate failed;
 *             signalled via collectError on childState so executeDeepDAG
 *             routes the parent to its 'error' branch
 *
 * Molecular import pattern:
 *   import { BookSearchFanoutDAG, registerBookSearchFanoutNodes } from './deepdags/BookSearchFanoutDAG.ts';
 *   registerBookSearchFanoutNodes(dispatcher);
 *   dispatcher.registerDAG(BookSearchFanoutDAG);
 *
 * The deep-DAG reads `state.query` and writes `state.terms`, `state.toolPlan`,
 * `state.candidates`, `state.shortlist`, `state.priorContext`, and (when
 * sources come up empty) `state.failureCause`: the same fields every intent
 * branch in the parent DAG expects. Each parent placement copies them back
 * through its own `stateMapping.output`.
 *
 * Three placements of this DAG replace three inlined fan-out clusters in
 * the parent `the-archivist` DAG. One definition, three usages:
 *   on-topic-search  — general web book search
 *   author-search    — author body-of-work search
 *   similar-search   — recommend-similar fan-out
 *
 * Reviews and describe branches are inlined in the parent because they use
 * distinct post-scout steps (rankByRating and pickBestMatch respectively).
 */

import type { ArchivistState }    from '../ArchivistState.ts';
import { decideTools }       from '../nodes/decideTools.ts';
import { extractQuery }      from '../nodes/extractQuery.ts';
import { hasCitationsGate }  from '../nodes/hasCitationsGate.ts';
import { mergeCandidates }   from '../nodes/mergeCandidates.ts';
import { rankCandidates }    from '../nodes/rankCandidates.ts';
import { recallPastVisits }  from '../nodes/recallPastVisits.ts';
import { recordFindings }    from '../nodes/recordFindings.ts';
import {
  openLibraryScout,
  googleBooksScout,
  subjectScout,
  wikipediaScout,
} from '../nodes/scouts.ts';
import type { ArchivistServices } from '../services.ts';

import type { NodeInterface, Dagonizer } from '@noocodex/dagonizer';
import { DAGBuilder } from '@noocodex/dagonizer/builder';
import type { DAG } from '@noocodex/dagonizer/entities';

/**
 * Internal terminal node that collects a recoverable error and exits.
 *
 * Used when the fan-out cluster finds no usable candidates — either
 * because merge produced an empty shortlist, or because the citations
 * gate found nothing written in the state graph. Collecting the error
 * causes `executeDeepDAG` to route the parent placement to its `error`
 * branch so the parent can dispatch to its own empty-result handling.
 */
const bsfNoResults: NodeInterface<ArchivistState, 'no-results', ArchivistServices> = {
  'name':    'bsf-no-results',
  'outputs': ['no-results'],
  async execute(state, context) {
    context.services.logger.warn('book-search-fanout: no candidates found — routing error to parent');
    if (state.failureCause.trim().length === 0) {
      // No cause was accumulated by scouts — synthesise a generic one.
      state.failureCause = 'No candidates found after searching all available sources. ';
    }
    state.collectError({
      'code':        'NO_CANDIDATES',
      'message':     'book-search-fanout found no usable candidates after merge and gate',
      'operation':   'bsf-no-results',
      'recoverable': true,
      'timestamp':   new Date().toISOString(),
    });
    return { 'output': 'no-results' };
  },
};

/**
 * The `book-search-fanout` DAG — one packaged unit that any parent DAG
 * can reference via `.deepDAG('placement-name', 'book-search-fanout', routes)`.
 */
export const BookSearchFanoutDAG: DAG = new DAGBuilder('book-search-fanout', '1.0')

  // ── 1. extract-query ─────────────────────────────────────────────────────
  // LLM parses the raw visitor question into structured search terms.
  // Writes state.terms for the scouts and decide-tools to consume.
  .node('bsf-extract-query', extractQuery, {
    'success': 'bsf-decide-tools',
  })

  // ── 2. decide-tools ──────────────────────────────────────────────────────
  // LLM decides which external sources to invoke. Both outputs route into
  // the parallel fan-out — each scout gates internally on state.toolPlan.
  .node('bsf-decide-tools', decideTools, {
    'tools':    'book-search-fan-out',
    'no-tools': 'book-search-fan-out',
  })

  // ── 3. book-search-fan-out ───────────────────────────────────────────────
  // All four scouts run concurrently. combine:'collect' waits for all four
  // and merges their state mutations. Each scout writes to state.candidates.
  .parallel('book-search-fan-out', ['bsf-ol', 'bsf-gb', 'bsf-subject', 'bsf-wiki'], 'collect', {
    'success': 'bsf-rank-candidates',
    'error':   'bsf-rank-candidates',
  })
  .node('bsf-ol',      openLibraryScout, { 'success': null, 'empty': null })
  .node('bsf-gb',      googleBooksScout, { 'success': null, 'empty': null })
  .node('bsf-subject', subjectScout,     { 'success': null, 'empty': null })
  .node('bsf-wiki',    wikipediaScout,   { 'success': null, 'empty': null })

  // ── 4. rank-candidates ───────────────────────────────────────────────────
  // LLM-driven relevance scoring. Always routes 'ranked' — even an empty
  // set — so merge can soft-gate on zero candidates.
  .node('bsf-rank-candidates', rankCandidates, {
    'ranked': 'bsf-merge-candidates',
  })

  // ── 5. merge-candidates ──────────────────────────────────────────────────
  // Cross-source dedupe via CanonicalId, top-5. Routes 'empty' to
  // bsf-no-results which collects an error so executeDeepDAG routes the
  // parent to its 'error' branch.
  .node('bsf-merge-candidates', mergeCandidates, {
    'ranked': 'bsf-record-findings',
    'empty':  'bsf-no-results',
  })

  // ── 6. record-findings ───────────────────────────────────────────────────
  // Deterministic RDF write — same input always produces the same triples.
  .node('bsf-record-findings', recordFindings, {
    'recorded': 'bsf-has-citations-gate',
  })

  // ── 7. has-citations-gate ────────────────────────────────────────────────
  // SPARQL ASK over the per-run state graph. Symbolic fence for the LLM.
  // 'fail' routes to bsf-no-results so the parent receives 'error'.
  .node('bsf-has-citations-gate', hasCitationsGate, {
    'pass': 'bsf-recall-past-visits',
    'fail': 'bsf-no-results',
  })

  // ── 8. recall-past-visits ────────────────────────────────────────────────
  // Injects prior-session context (prior queries + shortlisted titles) into
  // state.priorContext. Terminal node — deep-DAG exits cleanly → 'success'.
  .node('bsf-recall-past-visits', recallPastVisits, {
    'recalled': null,
  })

  // ── 9. bsf-no-results ────────────────────────────────────────────────────
  // Internal error-signal node. Collects a recoverable error so
  // executeDeepDAG routes the parent placement to its 'error' branch.
  .node('bsf-no-results', bsfNoResults, {
    'no-results': null,
  })

  .build();

/**
 * Register all nodes used by `BookSearchFanoutDAG` onto a dispatcher.
 *
 * Call this before `dispatcher.registerDAG(BookSearchFanoutDAG)`. Accepts
 * any `Dagonizer`-compatible dispatcher to allow consumers to use their
 * own subclass while still pulling in the molecular node set.
 *
 * @example
 * ```ts
 * registerBookSearchFanoutNodes(dispatcher);
 * dispatcher.registerDAG(BookSearchFanoutDAG);
 * ```
 */
export function registerBookSearchFanoutNodes(
  dispatcher: Dagonizer<ArchivistState, ArchivistServices>,
): void {
  for (const node of [
    extractQuery,
    decideTools,
    openLibraryScout,
    googleBooksScout,
    subjectScout,
    wikipediaScout,
    rankCandidates,
    mergeCandidates,
    recordFindings,
    hasCitationsGate,
    recallPastVisits,
    bsfNoResults,
  ]) {
    dispatcher.registerNode(node);
  }
}
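
The error path above (merge 'empty' or gate 'fail' → bsf-no-results → collectError) can be modelled with a toy interpreter. The sketch below is a hypothetical illustration, not dagonizer's executeDeepDAG: runChild, the Step type, and the plain errors array standing in for state.collectError are all assumptions. It walks the route table until a route resolves to null, then reports 'error' if any error was collected and 'success' otherwise.

```typescript
// Hypothetical model of a deep-DAG run; not dagonizer's executeDeepDAG.
interface RunError {
  code: string;
  recoverable: boolean;
}

interface ChildState {
  shortlist: string[];
  errors: RunError[]; // stands in for what state.collectError accumulates
}

// A node returns the name of the output port it took.
type Step = (state: ChildState) => string;
type Routes = Record<string, Record<string, string | null> | undefined>;

function runChild(
  entry: string,
  steps: Record<string, Step | undefined>,
  routes: Routes,
  state: ChildState,
): 'success' | 'error' {
  let current: string | null = entry;
  while (current !== null) {
    const step = steps[current];
    if (step === undefined) throw new Error(`unknown node: ${current}`);
    const port = step(state);
    const table = routes[current];
    if (table === undefined || !(port in table)) throw new Error(`no route for ${current}.${port}`);
    current = table[port] ?? null; // null ends the child run
  }
  // Exit rule: any collected error turns the placement's output into 'error'.
  return state.errors.length > 0 ? 'error' : 'success';
}

// Three of the bsf-* steps, reduced to their routing behaviour.
const steps: Record<string, Step | undefined> = {
  'merge-candidates': (s) => (s.shortlist.length > 0 ? 'ranked' : 'empty'),
  'record-findings': () => 'recorded',
  'no-results': (s) => {
    s.errors.push({ code: 'NO_CANDIDATES', recoverable: true });
    return 'no-results';
  },
};
const routes: Routes = {
  'merge-candidates': { ranked: 'record-findings', empty: 'no-results' },
  'record-findings': { recorded: null },
  'no-results': { 'no-results': null },
};

const found = runChild('merge-candidates', steps, routes, { shortlist: ['Dune'], errors: [] });
const missing = runChild('merge-candidates', steps, routes, { shortlist: [], errors: [] });
console.log(found, missing); // success error
```

Both runs end on a null route; only the collected error distinguishes them, which is why bsf-no-results must call collectError rather than simply exit.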

Parent DAG: the deep-DAG placements

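The #deepdag-placements region below shows the three placements of book-search-fanout and the one placement of compose-retry-loop, along with the nodes wired between them (group-by-year, the inlined reviews and describe branches, and the recommend-similar seed node):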

ts
// ── on-topic branch ──────────────────────────────────────────────────────
// Deep-DAG placement: book-search-fanout handles extract-query, decide-tools,
// all four scouts, rank-candidates, merge, record, gate, and recall.
// One packaged cluster — first of three placements of the same deep-DAG.
// stateMapping.output copies the fields the deep-DAG writes back to the
// parent state so downstream nodes (compose-loop here, group-by-year on the
// author branch) can read them.
.deepDAG('on-topic-search', 'book-search-fanout', {
  'success': 'compose-loop',
  'error':   'compose-empty',
}, {
  'stateMapping': {
    'output': {
      'terms':         'terms',
      'toolPlan':      'toolPlan',
      'candidates':    'candidates',
      'shortlist':     'shortlist',
      'priorContext':  'priorContext',
      'failureCause':  'failureCause',
    },
  },
})

// ── lookup-author branch ─────────────────────────────────────────────────
// Deep-DAG placement: same book-search-fanout cluster, second placement.
// After success, group-by-year sorts results chronologically before the
// compose loop — author surveys read better in publication-timeline order.
.deepDAG('author-search', 'book-search-fanout', {
  'success': 'group-by-year',
  'error':   'compose-empty',
}, {
  'stateMapping': {
    'output': {
      'terms':         'terms',
      'toolPlan':      'toolPlan',
      'candidates':    'candidates',
      'shortlist':     'shortlist',
      'priorContext':  'priorContext',
      'failureCause':  'failureCause',
    },
  },
})
// group-by-year is author-branch-specific: sorts shortlist chronologically.
.node('group-by-year', groupByYear, {
  'ordered': 'compose-loop',
})

// ── find-reviews branch ───────────────────────────────────────────────────
// Inlined — uses rankByRating (deterministic, rating-weighted) in place of
// rankCandidates (LLM-driven). The Google Books scout carries notes.rating /
// notes.ratingsCount; rankByRating weights those for reviews-style output.
.node('reviews-extract', extractQuery, {
  'success': 'reviews-decide-tools',
})
.node('reviews-decide-tools', decideTools, {
  'tools':    'reviews-fan-out',
  'no-tools': 'reviews-fan-out',
})
.parallel('reviews-fan-out', ['reviews-ol', 'reviews-gb', 'reviews-subject', 'reviews-wiki'], 'collect', {
  'success': 'reviews-rank',
  'error':   'reviews-rank',
})
.node('reviews-ol',      openLibraryScout, { 'success': null, 'empty': null })
.node('reviews-gb',      googleBooksScout, { 'success': null, 'empty': null })
.node('reviews-subject', subjectScout,     { 'success': null, 'empty': null })
.node('reviews-wiki',    wikipediaScout,   { 'success': null, 'empty': null })
.node('reviews-rank',    rankByRating,     { 'ranked': 'reviews-merge' })
.node('reviews-merge',   mergeCandidates,  { 'ranked': 'reviews-record', 'empty': 'compose-empty' })
.node('reviews-record',  recordFindings,   { 'recorded': 'reviews-gate' })
.node('reviews-gate',    hasCitationsGate, { 'pass': 'reviews-recall', 'fail': 'compose-empty' })
.node('reviews-recall',  recallPastVisits, { 'recalled': 'compose-loop' })

// ── describe-book branch ─────────────────────────────────────────────────
// Inlined — uses pickBestMatch to narrow multi-hit results to the top-3
// title-similar candidates before merge. Ensures the composer receives the
// specific book the visitor named, not arbitrary top-5 hits.
.node('describe-extract',      extractQuery,     { 'success': 'describe-decide-tools' })
.node('describe-decide-tools', decideTools,      { 'tools': 'describe-fan-out', 'no-tools': 'describe-fan-out' })
.parallel('describe-fan-out', ['describe-ol', 'describe-gb', 'describe-subject', 'describe-wiki'], 'collect', {
  'success': 'describe-pick',
  'error':   'compose-empty',
})
.node('describe-ol',      openLibraryScout, { 'success': null, 'empty': null })
.node('describe-gb',      googleBooksScout, { 'success': null, 'empty': null })
.node('describe-subject', subjectScout,     { 'success': null, 'empty': null })
.node('describe-wiki',    wikipediaScout,   { 'success': null, 'empty': null })
.node('describe-pick',   pickBestMatch,    { 'picked': 'describe-merge' })
.node('describe-merge',  mergeCandidates,  { 'ranked': 'describe-record', 'empty': 'compose-empty' })
.node('describe-record', recordFindings,   { 'recorded': 'describe-gate' })
.node('describe-gate',   hasCitationsGate, { 'pass': 'describe-recall', 'fail': 'compose-empty' })
.node('describe-recall', recallPastVisits, { 'recalled': 'compose-loop' })

// ── recommend-similar branch ─────────────────────────────────────────────
// recommendSimilar seeds state.terms from prior-run shortlist memory.
// 'seeded' routes to the book-search-fanout deep-DAG — third placement of
// the same packaged cluster. 'empty' routes to the decline terminal.
.node('recommend-similar', recommendSimilar, {
  'seeded': 'similar-search',
  'empty':  'compose-empty',
})

// Deep-DAG placement: same book-search-fanout, third and final placement.
.deepDAG('similar-search', 'book-search-fanout', {
  'success': 'compose-loop',
  'error':   'compose-empty',
}, {
  'stateMapping': {
    'output': {
      'terms':         'terms',
      'toolPlan':      'toolPlan',
      'candidates':    'candidates',
      'shortlist':     'shortlist',
      'priorContext':  'priorContext',
      'failureCause':  'failureCause',
    },
  },
})

// ── compose-loop — shared compose/validate deep-DAG ─────────────────────
// All branches that successfully find candidates converge here.
// composeResponse → validateResponse (retry loop, bounded by state.attempts.compose).
// One deep-DAG definition serves all five convergent branches.
// stateMapping.output copies the compose loop's writes back to the parent.
//
// Fan-in policy: 'success' routes to the shared respond-to-visitor terminal
// at the parent level — the deep-DAG produces state.draft and exits cleanly;
// exactly ONE respond-to-visitor fires per run regardless of branch count.
// 'error' (retry budget exhausted) falls through to compose-empty so the
// visitor always receives an in-character response rather than a silent drop.
.deepDAG('compose-loop', 'compose-retry-loop', {
  'success': 'respond-to-visitor',
  'error':   'compose-empty',
}, {
  'stateMapping': {
    'output': {
      'draft':    'draft',
      'approved': 'approved',
      'attempts': 'attempts',
    },
  },
})
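
The compose-retry-loop internals are not listed on this page. As a hedged sketch of the behaviour the comments describe (retries bounded by state.attempts.compose, 'success' once a draft validates, 'error' when the budget runs out), the toy function below uses a hypothetical budget of 3 and caller-supplied compose/validate callbacks. The real composeResponse and validateResponse nodes are LLM-backed and are not reproduced here.

```typescript
// Hypothetical sketch of a bounded compose/validate retry loop.
// The budget of 3 and the callback shapes are assumptions for illustration.
interface ComposeState {
  draft: string;
  approved: boolean;
  attempts: { compose: number };
}

type Composer = (state: ComposeState) => string;
type Validator = (draft: string) => boolean;

function composeRetryLoop(
  state: ComposeState,
  compose: Composer,
  validate: Validator,
  maxAttempts = 3,
): 'success' | 'error' {
  while (state.attempts.compose < maxAttempts) {
    state.attempts.compose += 1;
    state.draft = compose(state);
    state.approved = validate(state.draft);
    if (state.approved) return 'success'; // parent routes this to respond-to-visitor
  }
  return 'error'; // budget exhausted: parent routes this to compose-empty
}

const state: ComposeState = { draft: '', approved: false, attempts: { compose: 0 } };
// Toy composer and validator: the draft passes on the second attempt.
const outcome = composeRetryLoop(
  state,
  (s) => `draft #${s.attempts.compose}`,
  (draft) => draft.endsWith('#2'),
);
console.log(outcome, state.attempts.compose); // success 2
```

Because the counter lives in state and is listed in the placement's stateMapping.output, the parent can see how many attempts were spent after the loop exits.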

What it demonstrates

  • .deepDAG(name, dagName, routes, options) — the placement references the deep-DAG by its registered name. The parent and child run in the same dispatcher; the child shares the same node registry.
  • stateMapping.output — after the deep-DAG completes, the dispatcher copies the listed fields from the child's final state back into the parent state. Fields not listed stay isolated.
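  • One definition, three placements: book-search-fanout is registered once and placed three times under distinct placement names (on-topic-search, author-search, similar-search). Each placement carries its own routes: on-topic-search and similar-search send 'success' to compose-loop, author-search sends it to group-by-year, and all three send 'error' to compose-empty.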
  • Errors bubble up — anything the child collects via state.collectError reaches the parent's error accumulator automatically. The executeDeepDAG router uses child-state errors to decide the 'error' output.
  • registerBookSearchFanoutNodes / registerComposeRetryLoopNodes — each deep-DAG module exports a helper that registers exactly the nodes it needs. Call both before registering the parent DAG.
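
Registration order matters because, as noted for sub-DAG references, names resolve when registerDAG is called. The toy registry below is a hypothetical stand-in for the dispatcher (ToyRegistry, DagDef, and the node names are invented for illustration). It shows why the node helpers and both deep-DAGs must be registered before the parent: registering the parent first fails, and a DAG that names itself as a deep-DAG is rejected. It checks only direct self-reference, not longer cycles.

```typescript
// Hypothetical stand-in for the dispatcher's registration checks; not dagonizer's Dagonizer.
interface DagDef {
  name: string;
  nodes: string[];    // node names placed with .node(...)
  deepDAGs: string[]; // DAG names placed with .deepDAG(...)
}

class ToyRegistry {
  private readonly nodes = new Set<string>();
  private readonly dags = new Map<string, DagDef>();

  registerNode(nodeName: string): void {
    this.nodes.add(nodeName);
  }

  // References are resolved here, at registration time, not at run time.
  registerDAG(dag: DagDef): void {
    for (const node of dag.nodes) {
      if (!this.nodes.has(node)) throw new Error(`${dag.name}: node '${node}' is not registered`);
    }
    for (const ref of dag.deepDAGs) {
      // Only direct self-reference is checked; longer cycles are out of scope here.
      if (ref === dag.name) throw new Error(`${dag.name}: self-referential deep-DAG`);
      if (!this.dags.has(ref)) throw new Error(`${dag.name}: deep-DAG '${ref}' is not registered`);
    }
    this.dags.set(dag.name, dag);
  }
}

// Hypothetical node names, for illustration only.
const fanout: DagDef = { name: 'book-search-fanout', nodes: ['extract-query', 'merge-candidates'], deepDAGs: [] };
const archivist: DagDef = { name: 'the-archivist', nodes: ['group-by-year'], deepDAGs: ['book-search-fanout'] };

const registry = new ToyRegistry();
for (const nodeName of ['extract-query', 'merge-candidates', 'group-by-year']) {
  registry.registerNode(nodeName);
}

let earlyError = '';
try {
  registry.registerDAG(archivist); // too early: book-search-fanout is not registered yet
} catch (err) {
  earlyError = (err as Error).message;
}

registry.registerDAG(fanout);
registry.registerDAG(archivist); // succeeds now that the child DAG is known
console.log(earlyError); // the-archivist: deep-DAG 'book-search-fanout' is not registered
```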

See this in action in the Archivist live demo.

Composing the same flow via DAGDeriver.subDAGs

The DAGBuilder .deepDAG(...) path above is the imperative, deterministic authoring route. The same DeepDAGNode placement can also be produced declaratively through the DAGDeriver subDAGs annotation when the surrounding flow is agent-style (operations declare their dependencies and the topology emerges from them):

ts
DAGDeriver.derive({
  name: 'parent',
  version: '1',
  entrypoint: 'prepare',
  contracts: [
    { name: 'prepare',       hardRequired: ['input'],         produces: ['intermediate'], outputs: ['success'] },
    { name: 'invoke-plugin', hardRequired: ['intermediate'],  produces: ['childResult'],  outputs: ['success', 'error'] },
    { name: 'finalize',      hardRequired: ['childResult'],   produces: ['final'],        outputs: ['success'] },
  ],
  annotations: {
    subDAGs: {
      'invoke-plugin': {
        dag:     'plugin:transform',
        outputs: ['success', 'error'],
        stateMapping: {
          input:  { intermediate: 'intermediate' },
          output: { childResult:  'childResult' },
        },
      },
    },
  },
});

  • The contract's produces ↔ hardRequired still drives topology; the subDAGs annotation swaps the rendered placement from SingleNode to DeepDAGNode.
  • Every port in subDAG.outputs auto-wires to the next derived stage; use terminals to override individual ports when the error path needs a different target.
  • Sub-DAG references resolve at registerDAG time; the dispatcher's existing cycle check rejects self-referential subDAGs.
  • A runnable demonstration ships in examples/derive.ts (npm run example:derive).
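
A rough model of the derivation rules in the list above: each contract's ports wire to the stage that hard-requires something it produces, a subDAGs entry swaps the placement kind from single to deep without changing the wiring, and a per-port terminals entry overrides the auto-wired target. This is a hypothetical sketch, not DAGDeriver's algorithm; derive, the Placement shape, and the terminals record shape are assumptions, and the sketch ignores the entrypoint and multi-consumer fan-out.

```typescript
// Hypothetical sketch of contract-driven derivation; not DAGDeriver's algorithm.
interface Contract {
  name: string;
  hardRequired: string[];
  produces: string[];
  outputs: string[];
}

interface SubDAGAnnotation {
  dag: string;
  terminals?: Record<string, string | null>; // assumed shape: port -> target (null = end)
}

type Routes = Record<string, string | null>;
type Placement =
  | { kind: 'single'; name: string; routes: Routes }
  | { kind: 'deep'; name: string; dag: string; routes: Routes };

function derive(contracts: Contract[], subDAGs: Record<string, SubDAGAnnotation | undefined> = {}): Placement[] {
  return contracts.map((contract): Placement => {
    // Auto-wire to the first other contract that hard-requires something this one produces.
    const next = contracts.find(
      (other) => other !== contract && other.hardRequired.some((field) => contract.produces.includes(field)),
    );
    const annotation = subDAGs[contract.name];
    const overrides = annotation?.terminals;
    const routes: Routes = {};
    for (const port of contract.outputs) {
      routes[port] = overrides !== undefined && port in overrides ? (overrides[port] ?? null) : (next?.name ?? null);
    }
    // A subDAGs entry changes only the placement kind, not the topology.
    return annotation === undefined
      ? { kind: 'single', name: contract.name, routes }
      : { kind: 'deep', name: contract.name, dag: annotation.dag, routes };
  });
}

// The three contracts from the DAGDeriver.derive example above.
const contracts: Contract[] = [
  { name: 'prepare', hardRequired: ['input'], produces: ['intermediate'], outputs: ['success'] },
  { name: 'invoke-plugin', hardRequired: ['intermediate'], produces: ['childResult'], outputs: ['success', 'error'] },
  { name: 'finalize', hardRequired: ['childResult'], produces: ['final'], outputs: ['success'] },
];

const placements = derive(contracts, { 'invoke-plugin': { dag: 'plugin:transform' } });
for (const p of placements) {
  console.log(p.kind, p.name, JSON.stringify(p.routes));
}
// single prepare {"success":"invoke-plugin"}
// deep invoke-plugin {"success":"finalize","error":"finalize"}
// single finalize {"success":null}
```

Without an override, the error port follows the success port to finalize; adding terminals: { error: null } to the annotation would end the run on error instead.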

See Authoring DAGs for the decision matrix between the imperative .deepDAG() path and the declarative subDAGs annotation.
