The Archivist
The Archivist is the running demo every Dagonizer example refers to. It is a bookstore help-bot — a visitor describes a book or asks for a recommendation, and the Archivist composes a response by classifying the question, fanning out across the shop's local catalog and an external RAG provider, merging the candidates, and composing + validating a draft response in a bounded retry loop.
Try it live below — the demo runs in your browser. The runner detects the best LLM backend available (Chrome's built-in Gemini Nano, your Gemini AI Studio key, or the offline stub) and surfaces which one is answering.
Watch the DAG pane: each node lights cyan while executing, then settles to "completed" with the taken edge highlighted. The Memory pane mirrors state.intent, state.terms, state.shortlist, and state.attempts.compose as the dispatcher mutates them. Everything is driven by the dispatcher's onFlowStart / onNodeStart / onNodeEnd / onError / onFlowEnd hooks. There is no timer-based animation; the runner is a pure observer of the state machine.
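The observer idea can be illustrated with a small recorder. This is only a sketch: the five hook names come from the paragraph above, but the parameter lists, the TraceHooks interface, and createTraceRecorder are assumptions made for illustration, not the dispatcher's documented API.

```ts
// Hook names come from the text above; every parameter list here is an
// assumption, since the dispatcher's real hook signatures are not shown.
interface TraceHooks {
  onFlowStart(dagName: string): void;
  onNodeStart(placement: string): void;
  onNodeEnd(placement: string, output: string): void;
  onError(placement: string, error: Error): void;
  onFlowEnd(dagName: string): void;
}

interface TraceEvent {
  hook: keyof TraceHooks;
  detail: string;
}

// Builds hooks that only append to a log. Nothing here schedules work or
// touches run state, which is what keeps the runner a pure observer.
function createTraceRecorder(): { hooks: TraceHooks; events: TraceEvent[] } {
  const events: TraceEvent[] = [];
  const record = (hook: keyof TraceHooks, detail: string): void => {
    events.push({ hook, detail });
  };
  const hooks: TraceHooks = {
    onFlowStart: (dagName) => record('onFlowStart', dagName),
    onNodeStart: (placement) => record('onNodeStart', placement),
    onNodeEnd: (placement, output) => record('onNodeEnd', `${placement} -> ${output}`),
    onError: (placement, error) => record('onError', `${placement}: ${error.message}`),
    onFlowEnd: (dagName) => record('onFlowEnd', dagName),
  };
  return { hooks, events };
}
```

A UI built this way would re-render from the events array whenever a hook fires, so the trace stays in step with the run without any timers.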
Branches and gates
The DAG ends at one of three terminal nodes. The table lists each path that reaches one, plus the retry loop that feeds back into composition.
| Path | Trigger | Terminal node | What happens |
|---|---|---|---|
| Off-topic hard gate | classifyIntent returns off-topic | decline-off-topic | Politely redirects the visitor to a book-related question. |
| Empty soft gate | mergeCandidates produces zero candidates | decline-empty | Asks the visitor for more detail; collects an EMPTY_SHORTLIST warning. |
| Best-effort response | validateResponse exhausts MAX_COMPOSE_ATTEMPTS | respond-to-visitor | Sends the last draft anyway — the dispatcher never throws. |
| Approved response | validateResponse returns approved | respond-to-visitor | Normal happy path. |
| Retry loop | validateResponse returns retry | back to compose-response | Bounded by the counter on state.attempts.compose. |
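The approved, retry, and best-effort rows can be read as a single routing decision. The sketch below is an illustration under stated assumptions, not the demo's actual validateResponse: the routeValidation and runLoop helpers, the AttemptState shape, and the value 3 for MAX_COMPOSE_ATTEMPTS are hypothetical. Only the output names and the counter on state.attempts.compose come from the table.

```ts
// Hypothetical sketch of the bounded retry decision. The limit of 3 is an
// assumed value; the source names MAX_COMPOSE_ATTEMPTS but not its value.
const MAX_COMPOSE_ATTEMPTS = 3;

type ValidationOutput = 'approved' | 'retry' | 'exhausted';

interface AttemptState {
  attempts: { compose: number };
}

// Decide where validation routes next, given whether the draft passed.
function routeValidation(state: AttemptState, draftPassed: boolean): ValidationOutput {
  if (draftPassed) return 'approved';
  // Out of budget: exit with the last draft instead of throwing.
  if (state.attempts.compose >= MAX_COMPOSE_ATTEMPTS) return 'exhausted';
  return 'retry';
}

// Simulate a composer whose drafts never pass validation.
function runLoop(): { output: ValidationOutput; composeCalls: number } {
  const state: AttemptState = { attempts: { compose: 0 } };
  let output: ValidationOutput = 'retry';
  while (output === 'retry') {
    state.attempts.compose += 1; // one compose call per iteration
    output = routeValidation(state, false);
  }
  return { output, composeCalls: state.attempts.compose };
}
```

Because the counter lives on state rather than in a local variable, the same bound would still hold after a checkpoint and resume, assuming the state is restored intact.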
Backends
The Archivist runs against a real model in any of these environments — detectBackends() probes each and pickBestBackend() selects the highest-priority runnable backend. On mobile devices, Gemini Nano and WebLLM are excluded from auto-selection (both require desktop Chrome or a WebGPU-capable device). Cloud backends work on every device.
| Priority | Backend | What it needs |
|---|---|---|
| 1 | Groq (cloud, free tier) | Free key from console.groq.com/keys. Runs llama-3.3-70b-versatile. ~30 RPM on the free tier. Works on any device. |
| 2 | Cerebras (cloud, free tier) | Free key from cloud.cerebras.ai. Runs llama-3.3-70b on Wafer-Scale Engine. Works on any device. |
| 3 | Gemini API (Google AI Studio free tier) | Paste-into-form (browser). Free 15 RPM / 1500 RPD on gemini-2.0-flash. CORS open from any origin. Works on any device. |
| 4 | Mistral (cloud, free tier) | Free key from console.mistral.ai/api-keys/. Runs mistral-small-latest. Works on any device. |
| 5 | OpenRouter (cloud, free tier) | Free key from openrouter.ai/keys. Routes to llama-3.3-70b-instruct:free. Works on any device. |
| 6 | Gemini Nano (Chrome built-in, local) | Chrome 138+ stable, or any Chrome with the flags below. No key, no network, ~2 GB one-shot model download by Chrome. Desktop only. |
| 7 | WebLLM (in-browser, WebGPU) | Browser with navigator.gpu. Lazy-loads @mlc-ai/web-llm + Phi-3.5 mini (~780 MB) on first use; cached after. Desktop only. |
| 8 | Stub | Always available. Hand-coded canned answers. Always available on mobile as a zero-setup fallback; hidden from the desktop picker since on-device options exist. |
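The selection rule behind this table can be sketched as a filter followed by a sort. The real detectBackends() and pickBestBackend() signatures are not shown on this page, so the BackendProbe shape, the pickBest helper, and every backend id other than gemini-api are assumptions made for illustration.

```ts
// Hypothetical shapes: the real detectBackends() / pickBestBackend()
// signatures are not shown on this page.
interface BackendProbe {
  id: string;
  priority: number;     // 1 = most preferred, matching the table
  runnable: boolean;    // key present, browser API detected, and so on
  desktopOnly: boolean; // Gemini Nano and WebLLM in the table
}

// Keep runnable backends, drop desktop-only ones on mobile, take the
// lowest priority number. filter() copies, so the input is not reordered.
function pickBest(probes: BackendProbe[], isMobile: boolean): BackendProbe | undefined {
  return probes
    .filter((p) => p.runnable)
    .filter((p) => !(isMobile && p.desktopOnly))
    .sort((a, b) => a.priority - b.priority)[0];
}

const probes: BackendProbe[] = [
  { id: 'groq', priority: 1, runnable: false, desktopOnly: false },
  { id: 'gemini-api', priority: 3, runnable: false, desktopOnly: false },
  { id: 'gemini-nano', priority: 6, runnable: true, desktopOnly: true },
  { id: 'stub', priority: 8, runnable: true, desktopOnly: false },
];
```

With these sample probes, a desktop visitor without keys lands on the on-device model while a mobile visitor falls through to the stub. Marking a cloud backend runnable moves it ahead of both.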
Seed library
On mount, 18 sci-fi and philosophy titles are pre-loaded into urn:dagonizer:memory so the Memory tab has content from first paint and stub responses cite real books from the visible graph. The seed covers:
- Science fiction — Liu Cixin, William Gibson, Ursula K. Le Guin (×2), Stanisław Lem, Ted Chiang, Jeff VanderMeer, Dan Simmons, Vernor Vinge, the Strugatsky brothers.
- Philosophy / philosophical literature — Borges, Wittgenstein, Camus, Foucault, Deleuze, Hofstadter, Marcus Aurelius, Hegel.
SeedLibrary.loadInto(memoryStore) clears urn:dagonizer:memory and reasserts all 18 books as RDF triples using the same dag:title, dag:author, dag:subject, dag:firstPublishYear, dag:summary, and rdf:type dag:Book predicates that StateProjection uses for run candidates. Because the vocabulary is shared, the MemoryGraph renders seed books and run candidates uniformly.
The seed is not stub-specific. Real LLM backends receive the pre-seeded triples through the recall-memories node's SPARQL digest — the library is a shared starting point for every backend. reset() restores the seed alongside the TBox ontology so a manual reset never leaves the Memory tab empty.
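To make the shared vocabulary concrete, here is a rough sketch of one book expressed with the six predicates listed above. The SeedBook shape, the bookToTriples helper, and the urn:example:book: subject scheme are assumptions, and the sample entry is illustrative since the seed's actual titles are not listed on this page.

```ts
// [subject, predicate, object], with prefixed names kept as plain strings.
type Triple = [string, string, string];

interface SeedBook {
  id: string;
  title: string;
  author: string;
  subject: string;
  firstPublishYear: number;
  summary: string;
}

// Emit one triple per predicate named in the text above.
function bookToTriples(book: SeedBook): Triple[] {
  const subject = `urn:example:book:${book.id}`; // hypothetical IRI scheme
  return [
    [subject, 'rdf:type', 'dag:Book'],
    [subject, 'dag:title', book.title],
    [subject, 'dag:author', book.author],
    [subject, 'dag:subject', book.subject],
    [subject, 'dag:firstPublishYear', String(book.firstPublishYear)],
    [subject, 'dag:summary', book.summary],
  ];
}

// Illustrative entry only.
const sampleBook: SeedBook = {
  id: 'solaris',
  title: 'Solaris',
  author: 'Stanisław Lem',
  subject: 'Science fiction',
  firstPublishYear: 1961,
  summary: 'Researchers study a planet-wide ocean that appears to be sentient.',
};

const sampleTriples = bookToTriples(sampleBook);
```

A run candidate written through the same predicates would be indistinguishable in shape from a seed book, which is why one renderer can draw both.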
Mobile detection
MobileDetection.isLikelyMobile() triangulates three signals — touch points (navigator.maxTouchPoints > 1), coarse pointer media query ((pointer: coarse)), and narrow viewport (innerWidth < 900). All three must indicate mobile; a single signal is not enough. A "Treat as desktop" link in the mobile banner lets tablet visitors opt out of mobile detection and stores the override in localStorage (dagonizer-device-override).
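The three-signal rule can be sketched as a pure function over values read from the browser. This is not the MobileDetection source: the DeviceSignals shape and the 'desktop' override value are assumptions, while the thresholds and the dagonizer-device-override key come from the paragraph above.

```ts
// Signals as plain data so the rule can be exercised outside a browser.
interface DeviceSignals {
  maxTouchPoints: number; // navigator.maxTouchPoints
  coarsePointer: boolean; // matchMedia('(pointer: coarse)').matches
  innerWidth: number;     // window.innerWidth
}

// `override` stands for the value stored under dagonizer-device-override.
// The literal 'desktop' is an assumed value, not taken from the source.
function isLikelyMobile(signals: DeviceSignals, override: string | null = null): boolean {
  if (override === 'desktop') return false;
  // All three signals must agree; any single one is not enough.
  return signals.maxTouchPoints > 1 && signals.coarsePointer && signals.innerWidth < 900;
}

const phone: DeviceSignals = { maxTouchPoints: 5, coarsePointer: true, innerWidth: 390 };
const touchLaptop: DeviceSignals = { maxTouchPoints: 10, coarsePointer: false, innerWidth: 1440 };
const tablet: DeviceSignals = { maxTouchPoints: 5, coarsePointer: true, innerWidth: 820 };
```

Requiring agreement is what keeps a touchscreen laptop or a narrow desktop window from being classified as mobile. The override covers the remaining case of a tablet that satisfies all three signals.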
If no API key is set on a mobile device, the demo runs with canned stub responses so the DAG still executes. The mobile banner makes the canned-vs-real distinction explicit: it reads "running with canned responses (not real AI)" when stub is active, and "using cloud backend [name]" once a key is entered and a cloud backend takes over. Adding any cloud key causes pickBestBackend to re-rank and swap the active backend automatically.
Enable Gemini Nano + tool calling in Chrome
The Archivist asks the LLM to invoke tools (currently web_search_books, backed by openlibrary.org). Gemini API uses native functionDeclarations; Chrome's on-device Gemini Nano honours the same plan via the Prompt API's responseConstraint JSON-schema field, which arrived behind feature flags.
1. Open chrome://flags and enable each of:
   - #prompt-api-for-gemini-nano → Enabled
   - #prompt-api-for-gemini-nano-multimodal-input → Enabled (newer flag name in some channels)
   - #optimization-guide-on-device-model → Enabled BypassPerfRequirement
2. Restart Chrome.
3. Trigger the download. Visit any page that calls LanguageModel.create() (this demo will, but you can also paste the snippet below into DevTools):
   ```js
   await LanguageModel.create();
   ```
   Chrome downloads ~2 GB. Status is visible at chrome://components; look for Optimization Guide On Device Model. The widget on this page also surfaces availability() as "downloading…" until ready.
4. Reload this page. The backend banner should now read Gemini Nano (Chrome on-device).
If the model is still downloadable rather than available after the steps above, leave Chrome open for a few minutes — the download runs in the background and is gated by Chrome's network-condition heuristics.
Bring-your-own Gemini API key
When Gemini Nano is unavailable, the next-best option is the Google AI Studio free tier:
- Go to aistudio.google.com/apikey and click Create API key. The free tier covers 15 requests/min and 1500 requests/day on gemini-2.0-flash, plenty for the demo.
- Paste the key into the Bring your own Gemini API key drawer below the backend picker. It's stored in localStorage only; the request itself goes straight from your browser to Google.
- The runner picks gemini-api automatically once a key is present.
CORS is open on the Gemini REST endpoint, so this works from GitHub Pages or any other static host without a proxy.
Use the offline stub
If you just want to watch the DAG animate without an LLM, pick Canned responses (offline stub) from the backend dropdown. The stub adapter pattern-matches the visitor's query and emits a web_search_books tool call when it sees ISBN-like patterns or quoted titles — exercising the same tool-calling path the real models take, just without the GPU.
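The pattern-matching step can be sketched as follows. This is a rough illustration rather than the stub adapter's code: the two regular expressions, the planToolCall helper, and the { query } argument shape are assumptions, and only the tool name web_search_books and the two triggers (ISBN-like patterns, quoted titles) come from the text.

```ts
// Hypothetical tool-call shape; the real web_search_books arguments are not
// documented on this page.
interface ToolCall {
  name: 'web_search_books';
  arguments: { query: string };
}

// 13- or 10-character ISBN-shaped runs, with optional hyphens or spaces.
const ISBN_LIKE = /\b(?:\d[- ]?){12}\d\b|\b(?:\d[- ]?){9}[\dXx]\b/;
// Text wrapped in straight or curly double quotes.
const QUOTED_TITLE = /"([^"]+)"|\u201C([^\u201D]+)\u201D/;

function planToolCall(visitorQuery: string): ToolCall | null {
  const isbn = ISBN_LIKE.exec(visitorQuery);
  if (isbn) {
    return { name: 'web_search_books', arguments: { query: isbn[0] } };
  }
  const quoted = QUOTED_TITLE.exec(visitorQuery);
  if (quoted) {
    // Exactly one of the two capture groups is filled.
    const title = quoted[1] || quoted[2] || '';
    return { name: 'web_search_books', arguments: { query: title } };
  }
  return null; // no trigger: answer from canned text without a tool call
}
```

Returning the same tool-call shape a model would produce is what lets the rest of the DAG run unchanged whether the plan came from a regex or from an LLM.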
# CLI — picks the offline stub when no key is set, Gemini REST when GEMINI_API_KEY is present.
npx tsx examples/the-archivist/runArchivist.ts
# Force Gemini REST with your key:
GEMINI_API_KEY=AIza... npx tsx examples/the-archivist/runArchivist.ts
What each phase example covers
The eight per-phase example pages each isolate one Dagonizer feature against this domain:
| Phase | Feature | Page |
|---|---|---|
| 01 | Linear intake + terminal routing | Phase 01 · Linear intake |
| 02 | DAGBuilder authoring | Phase 02 · DAGBuilder |
| 03 | Tool schema design (JSON Schema 2020-12 inputSchema) | Phase 03 · Tool schemas |
| 04 | Fan-out scout with partition fan-in | Phase 04 · Fan-out scout |
| 05 | Deep-DAG composition | Phase 05 · Deep-DAG composition |
| 06 | Abortable visitor request | Phase 06 · Cancellation |
| 07 | RetryPolicy against the LLM composer | Phase 07 · Retry |
| 08 | Checkpoint mid-draft and resume | Phase 08 · Checkpoint + resume |
Every page starts from the same ArchivistState + services + node set; only the DAG variation and the registered subset change.
Compositional deep-DAGs
The Archivist's DAG is composed of two reusable deep-DAGs that ship as independent components. Each is a DAG value any consumer can import, register, and reference as a .deepDAG(...) placement in their own DAG.
- book-search-fanout: extract-query + decide-tools + 4-source parallel scout cluster (OpenLibrary, Google Books, Subject, Wikipedia) + rank-candidates + merge-candidates + record-findings + has-citations-gate + recall-past-visits. Used in three intent branches (on-topic-search, author-search, similar-search); one definition, three placements.
- compose-retry-loop: compose-response + validate-response (with a bounded retry loop back to compose). It produces state.draft and exits; the shared respond-to-visitor terminal lives in the parent DAG. Every successful search branch funnels through this one shared cluster.
The renderer expands both deep-DAGs inline in the diagram — compound-graph children render inside the placement box so the full topology is visible. No opaque boxes.
Reviews and describe branches are inlined in the parent DAG because they substitute rankByRating and pickBestMatch for rankCandidates respectively — the structural variation is explicit rather than hidden behind a deep-DAG parameter.
BookSearchFanoutDAG
/**
* BookSearchFanoutDAG — reusable query-extract + 4-source parallel scout cluster.
*
* Internal flow:
*
* bsf-extract-query
* └─ success ──► bsf-decide-tools
* bsf-decide-tools
* └─ (tools | no-tools) ──► book-search-fan-out (parallel, combine: collect)
* ├─ bsf-ol (OpenLibrary)
* ├─ bsf-gb (Google Books)
* ├─ bsf-subject (Subject search)
* └─ bsf-wiki (Wikipedia enrichment)
* └─ bsf-rank-candidates
* └─ bsf-merge-candidates
* ├─ ranked ──► bsf-record-findings
* └─ empty ──► bsf-no-results (collects error → deep-DAG exits error)
* └─ bsf-record-findings
* └─ bsf-has-citations-gate
* ├─ pass ──► bsf-recall-past-visits ──► END (success)
* └─ fail ──► bsf-no-results (collects error → deep-DAG exits error)
*
* Outputs:
* success — query extracted, candidates found, ranked, recorded, and recalled
* error — no candidates after merge, or citations gate failed;
* signalled via collectError on childState so executeDeepDAG
* routes the parent to its 'error' branch
*
* Molecular import pattern:
* import { BookSearchFanoutDAG, registerBookSearchFanoutNodes } from './deepdags/BookSearchFanoutDAG.ts';
* registerBookSearchFanoutNodes(dispatcher);
* dispatcher.registerDAG(BookSearchFanoutDAG);
*
* The deep-DAG operates on the parent's state directly (no stateMapping
* needed) — it reads `state.query` and writes `state.terms`, `state.toolPlan`,
* `state.candidates`, `state.shortlist`, and `state.priorContext`, which are
* the same fields every intent branch in the parent DAG expects.
*
* Three placements of this DAG replace three inlined fan-out clusters in
* the parent `the-archivist` DAG. One definition, three usages:
* on-topic-search — general web book search
* author-search — author body-of-work search
* similar-search — recommend-similar fan-out
*
* Reviews and describe branches are inlined in the parent because they use
* distinct post-scout steps (rankByRating and pickBestMatch respectively).
*/
import type { ArchivistState } from '../ArchivistState.ts';
import { decideTools } from '../nodes/decideTools.ts';
import { extractQuery } from '../nodes/extractQuery.ts';
import { hasCitationsGate } from '../nodes/hasCitationsGate.ts';
import { mergeCandidates } from '../nodes/mergeCandidates.ts';
import { rankCandidates } from '../nodes/rankCandidates.ts';
import { recallPastVisits } from '../nodes/recallPastVisits.ts';
import { recordFindings } from '../nodes/recordFindings.ts';
import {
openLibraryScout,
googleBooksScout,
subjectScout,
wikipediaScout,
} from '../nodes/scouts.ts';
import type { ArchivistServices } from '../services.ts';
import type { NodeInterface, Dagonizer } from '@noocodex/dagonizer';
import { DAGBuilder } from '@noocodex/dagonizer/builder';
import type { DAG } from '@noocodex/dagonizer/entities';
/**
* Internal terminal node that collects a recoverable error and exits.
*
* Used when the fan-out cluster finds no usable candidates — either
* because merge produced an empty shortlist, or because the citations
* gate found nothing written in the state graph. Collecting the error
* causes `executeDeepDAG` to route the parent placement to its `error`
* branch so the parent can dispatch to its own empty-result handling.
*/
const bsfNoResults: NodeInterface<ArchivistState, 'no-results', ArchivistServices> = {
'name': 'bsf-no-results',
'outputs': ['no-results'],
async execute(state, context) {
context.services.logger.warn('book-search-fanout: no candidates found — routing error to parent');
if (state.failureCause.trim().length === 0) {
// No cause was accumulated by scouts — synthesise a generic one.
state.failureCause = 'No candidates found after searching all available sources. ';
}
state.collectError({
'code': 'NO_CANDIDATES',
'message': 'book-search-fanout found no usable candidates after merge and gate',
'operation': 'bsf-no-results',
'recoverable': true,
'timestamp': new Date().toISOString(),
});
return { 'output': 'no-results' };
},
};
/**
* The `book-search-fanout` DAG — one packaged unit that any parent DAG
* can reference via `.deepDAG('placement-name', 'book-search-fanout', routes)`.
*/
export const BookSearchFanoutDAG: DAG = new DAGBuilder('book-search-fanout', '1.0')
// ── 1. extract-query ─────────────────────────────────────────────────────
// LLM parses the raw visitor question into structured search terms.
// Writes state.terms for the scouts and decide-tools to consume.
.node('bsf-extract-query', extractQuery, {
'success': 'bsf-decide-tools',
})
// ── 2. decide-tools ──────────────────────────────────────────────────────
// LLM decides which external sources to invoke. Both outputs route into
// the parallel fan-out — each scout gates internally on state.toolPlan.
.node('bsf-decide-tools', decideTools, {
'tools': 'book-search-fan-out',
'no-tools': 'book-search-fan-out',
})
// ── 3. book-search-fan-out ───────────────────────────────────────────────
// All four scouts run concurrently. combine:'collect' waits for all four
// and merges their state mutations. Each scout writes to state.candidates.
.parallel('book-search-fan-out', ['bsf-ol', 'bsf-gb', 'bsf-subject', 'bsf-wiki'], 'collect', {
'success': 'bsf-rank-candidates',
'error': 'bsf-rank-candidates',
})
.node('bsf-ol', openLibraryScout, { 'success': null, 'empty': null })
.node('bsf-gb', googleBooksScout, { 'success': null, 'empty': null })
.node('bsf-subject', subjectScout, { 'success': null, 'empty': null })
.node('bsf-wiki', wikipediaScout, { 'success': null, 'empty': null })
// ── 4. rank-candidates ───────────────────────────────────────────────────
// LLM-driven relevance scoring. Always routes 'ranked' — even an empty
// set — so merge can soft-gate on zero candidates.
.node('bsf-rank-candidates', rankCandidates, {
'ranked': 'bsf-merge-candidates',
})
// ── 5. merge-candidates ──────────────────────────────────────────────────
// Cross-source dedupe via CanonicalId, top-5. Routes 'empty' to
// bsf-no-results which collects an error so executeDeepDAG routes the
// parent to its 'error' branch.
.node('bsf-merge-candidates', mergeCandidates, {
'ranked': 'bsf-record-findings',
'empty': 'bsf-no-results',
})
// ── 6. record-findings ───────────────────────────────────────────────────
// Deterministic RDF write — same input always produces the same triples.
.node('bsf-record-findings', recordFindings, {
'recorded': 'bsf-has-citations-gate',
})
// ── 7. has-citations-gate ────────────────────────────────────────────────
// SPARQL ASK over the per-run state graph. Symbolic fence for the LLM.
// 'fail' routes to bsf-no-results so the parent receives 'error'.
.node('bsf-has-citations-gate', hasCitationsGate, {
'pass': 'bsf-recall-past-visits',
'fail': 'bsf-no-results',
})
// ── 8. recall-past-visits ────────────────────────────────────────────────
// Injects prior-session context (prior queries + shortlisted titles) into
// state.priorContext. Terminal node — deep-DAG exits cleanly → 'success'.
.node('bsf-recall-past-visits', recallPastVisits, {
'recalled': null,
})
// ── 9. bsf-no-results ────────────────────────────────────────────────────
// Internal error-signal node. Collects a recoverable error so
// executeDeepDAG routes the parent placement to its 'error' branch.
.node('bsf-no-results', bsfNoResults, {
'no-results': null,
})
.build();
/**
* Register all nodes used by `BookSearchFanoutDAG` onto a dispatcher.
*
* Call this before `dispatcher.registerDAG(BookSearchFanoutDAG)`. Accepts
* any `Dagonizer`-compatible dispatcher to allow consumers to use their
* own subclass while still pulling in the molecular node set.
*
* @example
* ```ts
* registerBookSearchFanoutNodes(dispatcher);
* dispatcher.registerDAG(BookSearchFanoutDAG);
* ```
*/
export function registerBookSearchFanoutNodes(
dispatcher: Dagonizer<ArchivistState, ArchivistServices>,
): void {
for (const node of [
extractQuery,
decideTools,
openLibraryScout,
googleBooksScout,
subjectScout,
wikipediaScout,
rankCandidates,
mergeCandidates,
recordFindings,
hasCitationsGate,
recallPastVisits,
bsfNoResults,
]) {
dispatcher.registerNode(node);
}
}
ComposeRetryLoopDAG
/**
* ComposeRetryLoopDAG — reusable compose / validate / retry loop.
*
* Internal flow:
*
* crl-compose-response
* └─ drafted ──► crl-validate-response
* ├─ approved ──► END (success) ─► parent: respond-to-visitor
* ├─ retry ──► crl-compose-response (bounded by state.attempts.compose)
* └─ exhausted ──► END (success) ─► parent: respond-to-visitor
*
* Outputs:
* success — draft composed (approved or best-effort); parent routes to
* the shared respond-to-visitor terminal.
* error — child-state errors accumulated (propagated by executeDeepDAG)
*
* Fan-in policy: this deep-DAG does NOT contain respondToVisitor. It is a
* pure compose/validate unit that produces state.draft and exits. The
* single shared respond-to-visitor placement lives at the parent DAG level
* so that every converging branch strikes exactly one terminal node per run.
*
* Molecular import pattern:
* import { ComposeRetryLoopDAG, registerComposeRetryLoopNodes } from './deepdags/ComposeRetryLoopDAG.ts';
* registerComposeRetryLoopNodes(dispatcher);
* dispatcher.registerDAG(ComposeRetryLoopDAG);
*
* The deep-DAG operates on the parent's state directly (no stateMapping
* needed) — it reads `state.shortlist` / `state.intent` / `state.priorContext`
* and writes `state.draft` / `state.approved`, which the parent DAG already
* manages. Every intent branch funnels through this one composed loop rather
* than each branch owning its own compose→validate chain.
*/
import type { ArchivistState } from '../ArchivistState.ts';
import { composeResponse, validateResponse } from '../nodes/composeResponse.ts';
import type { ArchivistServices } from '../services.ts';
import type { Dagonizer } from '@noocodex/dagonizer';
import { DAGBuilder } from '@noocodex/dagonizer/builder';
import type { DAG } from '@noocodex/dagonizer/entities';
/**
* The `compose-retry-loop` DAG — one packaged compose/validate unit that every
* intent branch references via `.deepDAG('compose-loop', 'compose-retry-loop', routes)`.
*
* Exits with `success` when the draft is approved or attempts are exhausted.
* The parent DAG routes `compose-loop → success → respond-to-visitor` so
* exactly ONE respond-to-visitor fires per run regardless of how many branches
* converge into this deep-DAG.
*/
export const ComposeRetryLoopDAG: DAG = new DAGBuilder('compose-retry-loop', '1.1')
// ── 1. compose-response ──────────────────────────────────────────────────
// LLM call wrapped with RetryPolicy for transient failures. Writes
// state.draft. Intent-specific compose methods dispatched inside the node
// via state.intent switch.
.node('crl-compose-response', composeResponse, {
'drafted': 'crl-validate-response',
})
// ── 2. validate-response ─────────────────────────────────────────────────
// Quality gate: length, citations, tone. On 'retry', routes back to
// compose (bounded by MAX_COMPOSE_ATTEMPTS on state.attempts.compose).
// 'approved' and 'exhausted' both exit the deep-DAG cleanly (null terminal)
// so the parent receives output 'success' and routes to respond-to-visitor.
.node('crl-validate-response', validateResponse, {
'approved': null,
'retry': 'crl-compose-response',
'exhausted': null,
})
.build();
/**
* Register all nodes used by `ComposeRetryLoopDAG` onto a dispatcher.
*
* Call this before `dispatcher.registerDAG(ComposeRetryLoopDAG)`. Accepts
* any `Dagonizer`-compatible dispatcher to allow consumers to use their
* own subclass while still pulling in the molecular node set.
*
* @example
* ```ts
* registerComposeRetryLoopNodes(dispatcher);
* dispatcher.registerDAG(ComposeRetryLoopDAG);
* ```
*/
export function registerComposeRetryLoopNodes(
dispatcher: Dagonizer<ArchivistState, ArchivistServices>,
): void {
for (const node of [
composeResponse,
validateResponse,
]) {
dispatcher.registerNode(node);
}
}
JSON-LD as the canonical DAG format
The DAG is JSON-LD natively. DAGBuilder.build() returns a plain JavaScript object whose wire shape is JSON-LD 1.1 — every placement carries a typed IRI under @type. Dagonizer.serialize(dag) produces the JSON string; Dagonizer.load(json) parses and validates it back to an equivalent typed DAG.
There is no separate projection layer or dual configuration. The object DAGBuilder.build() returns is the same object the engine consumes and the same object that serializes to JSON-LD. Load a DAG from JSON, register it, execute it — one surface throughout.
import { Dagonizer } from '@noocodex/dagonizer';
// Serialize the Archivist DAG to JSON for persistence or transfer:
const json = Dagonizer.serialize(archivistDAG);
// Restore it in another process or reload:
const dag = Dagonizer.load(json);
dispatcher.registerDAG(dag);
Deep-DAG placements in the JSON-LD output look like:
{
"@type": "DeepDAGNode",
"name": "on-topic-search",
"dag": "book-search-fanout",
"outputs": { "success": "compose-loop", "error": "decline-empty" }
}
DAG topology
/**
* The Archivist — canonical DAG, built with DAGBuilder. Version 6.0.
*
* Molecular composition: the parent DAG is composed of two reusable
* deep-DAGs that ship as independent components and are imported as
* `.deepDAG(...)` placements. The deep-DAGs are registered separately
* and referenced by name — the parent DAG never knows their internals.
*
* recall-context
* └─ recalled ──► classify-intent
*
* classify-intent
* ├─ off-topic ──► decline-off-topic ──► END
* │
* ├─ on-topic ──► [book-search-fanout] (extract+decide+4scouts+rank+merge+record+gate+recall)
* │ ├─ success ──► [compose-retry-loop] (compose+validate+retry)
* │ └─ error ──► compose-empty ──┐
* │ │
* ├─ lookup-author ──► [book-search-fanout] │
* │ ├─ success ──► group-by-year ──► [compose-retry-loop]
* │ └─ error ──► compose-empty ──┐
* │ ▼
* ├─ find-reviews ──► reviews-extract ──► [compose-retry-loop] (success) ──► respond-to-visitor ──► END
* │ (inline: decide+4scouts+rankByRating+merge+record+gate+recall) ▲
* │ ▲
* ├─ describe-book ──► describe-extract ──► [compose-retry-loop]
* │ (inline: decide+4scouts+pickBestMatch+merge+record+gate+recall)
* │
* ├─ recall-memories ──► memory-recall ──► compose-memory-recall ──────────────────────────────┐
* │ ▼
* └─ recommend-similar ──► recommend-similar-gate respond-to-visitor ──► END
* ├─ seeded ──► [book-search-fanout] ▲
* │ ├─ success ──► [compose-retry-loop] (success) ──────┘
* │ └─ error ──► compose-empty ──────────────────────►┘
* └─ empty ──► compose-empty ───────────────────────────────────────►┘
*
* Fan-in policy (v6.0): all response-producing branches converge into ONE
* shared `respond-to-visitor` terminal at this (parent) level. The
* compose-retry-loop deep-DAG exits with `success` after producing state.draft
* and does NOT contain respondToVisitor internally. This ensures exactly one
* terminal node fires per run with the full converged state.draft.
*
* Deep-DAGs (molecular components):
* book-search-fanout — extract-query + decide-tools + 4-source parallel scouts
* (OpenLibrary, Google Books, Subject, Wikipedia) + rankCandidates
* + mergeCandidates + recordFindings + hasCitationsGate +
* recallPastVisits. Three placements in this DAG:
* on-topic-search, author-search, similar-search.
*
* compose-retry-loop — composeResponse + validateResponse (with bounded retry loop)
* only; respondToVisitor lives in the parent. One placement in this DAG:
* compose-loop (shared by every convergent branch).
*
* Inlined branches (reviews, describe):
* Reviews uses `rankByRating` (deterministic, rating-weighted) instead of
* `rankCandidates` (LLM-driven). Describe uses `pickBestMatch` to narrow to the
* top-3 title-similar candidates before merge. Both are structurally identical to
* book-search-fanout except for the post-scout ranking step — keeping them inline
* makes the intentional distinction explicit rather than hiding it behind a
* deep-DAG parameter.
*
* Empty-result handling (v5.2):
* `decline-empty` (canned response) is replaced by `compose-empty` →
* `respond-to-visitor` throughout. `compose-empty` calls the LLM with
* `state.failureCause` (accumulated by scouts) to produce an in-character
* message that acknowledges what was searched and offers a concrete next step.
* `decline-empty` is kept as a registered node for checkpoint backward compat.
*
* Builder vs literal equivalence:
* DAGBuilder.node(placementName, nodeImpl, routes) emits the same
* { type: 'single', name, node: nodeImpl.name, outputs: routes }
* object that the hand-written literal used. build() returns a plain
* DAG — identical wire shape, same Dagonizer.load() call.
*/
import { classifyIntent } from './nodes/classifyIntent.ts';
import { composeMemoryResponse } from './nodes/composeMemoryResponse.ts';
import { decideTools } from './nodes/decideTools.ts';
import { extractQuery } from './nodes/extractQuery.ts';
import { groupByYear } from './nodes/groupByYear.ts';
import { hasCitationsGate } from './nodes/hasCitationsGate.ts';
import { mergeCandidates } from './nodes/mergeCandidates.ts';
import { pickBestMatch } from './nodes/pickBestMatch.ts';
import { rankByRating } from './nodes/rankByRating.ts';
import { recallContext } from './nodes/recallContext.ts';
import { recallMemories } from './nodes/recallMemories.ts';
import { recallPastVisits } from './nodes/recallPastVisits.ts';
import { recommendSimilar } from './nodes/recommendSimilar.ts';
import { recordFindings } from './nodes/recordFindings.ts';
import { declineOffTopic, declineEmpty, respondToVisitor, composeEmptyResponse } from './nodes/respondToVisitor.ts';
import { openLibraryScout, googleBooksScout, subjectScout, wikipediaScout } from './nodes/scouts.ts';
import { DAGBuilder } from '@noocodex/dagonizer/builder';
export const archivistDAG = new DAGBuilder('the-archivist', '6.0')
// ── 0. recall-context ────────────────────────────────────────────────────
// First added → auto-entrypoint. Runs before classifyIntent so the
// classifier can benefit from prior-session continuity hints.
.node('recall-context', recallContext, {
'recalled': 'classify-intent',
})
// ── 1. classify-intent ───────────────────────────────────────────────────
// Wide output union: seven outputs route to six intent branches plus the
// off-topic decline. Deep-DAG placements and inline branches converge on the
// same shared placements: compose-loop and compose-empty.
// recall-memories routes directly to memory-recall → compose-memory-recall
// → memory-respond (no search fanout needed; the memory store is the source).
.node('classify-intent', classifyIntent, {
'lookup-author': 'author-search',
'find-reviews': 'reviews-extract',
'describe-book': 'describe-extract',
'recommend-similar': 'recommend-similar',
'recall-memories': 'memory-recall',
'on-topic': 'on-topic-search',
'off-topic': 'decline-off-topic',
})
// #region deepdag-placements
// ── on-topic branch ──────────────────────────────────────────────────────
// Deep-DAG placement: book-search-fanout handles extract-query, decide-tools,
// all four scouts, rank-candidates, merge, record, gate, and recall.
// One packaged cluster — first of three placements of the same deep-DAG.
// stateMapping.output copies the fields the deep-DAG writes back to the
// parent state so compose-loop and group-by-year can read them.
.deepDAG('on-topic-search', 'book-search-fanout', {
'success': 'compose-loop',
'error': 'compose-empty',
}, {
'stateMapping': {
'output': {
'terms': 'terms',
'toolPlan': 'toolPlan',
'candidates': 'candidates',
'shortlist': 'shortlist',
'priorContext': 'priorContext',
'failureCause': 'failureCause',
},
},
})
// ── lookup-author branch ─────────────────────────────────────────────────
// Deep-DAG placement: same book-search-fanout cluster, second placement.
// After success, group-by-year sorts results chronologically before the
// compose loop — author surveys read better in publication-timeline order.
.deepDAG('author-search', 'book-search-fanout', {
'success': 'group-by-year',
'error': 'compose-empty',
}, {
'stateMapping': {
'output': {
'terms': 'terms',
'toolPlan': 'toolPlan',
'candidates': 'candidates',
'shortlist': 'shortlist',
'priorContext': 'priorContext',
'failureCause': 'failureCause',
},
},
})
// group-by-year is author-branch-specific: sorts shortlist chronologically.
.node('group-by-year', groupByYear, {
'ordered': 'compose-loop',
})
// ── find-reviews branch ───────────────────────────────────────────────────
// Inlined — uses rankByRating (deterministic, rating-weighted) in place of
// rankCandidates (LLM-driven). The Google Books scout carries notes.rating /
// notes.ratingsCount; rankByRating weights those for reviews-style output.
.node('reviews-extract', extractQuery, {
'success': 'reviews-decide-tools',
})
.node('reviews-decide-tools', decideTools, {
'tools': 'reviews-fan-out',
'no-tools': 'reviews-fan-out',
})
.parallel('reviews-fan-out', ['reviews-ol', 'reviews-gb', 'reviews-subject', 'reviews-wiki'], 'collect', {
'success': 'reviews-rank',
'error': 'reviews-rank',
})
.node('reviews-ol', openLibraryScout, { 'success': null, 'empty': null })
.node('reviews-gb', googleBooksScout, { 'success': null, 'empty': null })
.node('reviews-subject', subjectScout, { 'success': null, 'empty': null })
.node('reviews-wiki', wikipediaScout, { 'success': null, 'empty': null })
.node('reviews-rank', rankByRating, { 'ranked': 'reviews-merge' })
.node('reviews-merge', mergeCandidates, { 'ranked': 'reviews-record', 'empty': 'compose-empty' })
.node('reviews-record', recordFindings, { 'recorded': 'reviews-gate' })
.node('reviews-gate', hasCitationsGate, { 'pass': 'reviews-recall', 'fail': 'compose-empty' })
.node('reviews-recall', recallPastVisits, { 'recalled': 'compose-loop' })
// ── describe-book branch ─────────────────────────────────────────────────
// Inlined — uses pickBestMatch to narrow multi-hit results to the top-3
// title-similar candidates before merge. Ensures the composer receives the
// specific book the visitor named, not arbitrary top-5 hits.
.node('describe-extract', extractQuery, { 'success': 'describe-decide-tools' })
.node('describe-decide-tools', decideTools, { 'tools': 'describe-fan-out', 'no-tools': 'describe-fan-out' })
.parallel('describe-fan-out', ['describe-ol', 'describe-gb', 'describe-subject', 'describe-wiki'], 'collect', {
'success': 'describe-pick',
'error': 'compose-empty',
})
.node('describe-ol', openLibraryScout, { 'success': null, 'empty': null })
.node('describe-gb', googleBooksScout, { 'success': null, 'empty': null })
.node('describe-subject', subjectScout, { 'success': null, 'empty': null })
.node('describe-wiki', wikipediaScout, { 'success': null, 'empty': null })
.node('describe-pick', pickBestMatch, { 'picked': 'describe-merge' })
.node('describe-merge', mergeCandidates, { 'ranked': 'describe-record', 'empty': 'compose-empty' })
.node('describe-record', recordFindings, { 'recorded': 'describe-gate' })
.node('describe-gate', hasCitationsGate, { 'pass': 'describe-recall', 'fail': 'compose-empty' })
.node('describe-recall', recallPastVisits, { 'recalled': 'compose-loop' })
// ── recommend-similar branch ─────────────────────────────────────────────
// recommendSimilar seeds state.terms from prior-run shortlist memory.
// 'seeded' routes to the book-search-fanout deep-DAG — third placement of
// the same packaged cluster. 'empty' routes to the decline terminal.
.node('recommend-similar', recommendSimilar, {
'seeded': 'similar-search',
'empty': 'compose-empty',
})
// Deep-DAG placement: same book-search-fanout, third and final placement.
.deepDAG('similar-search', 'book-search-fanout', {
'success': 'compose-loop',
'error': 'compose-empty',
}, {
'stateMapping': {
'output': {
'terms': 'terms',
'toolPlan': 'toolPlan',
'candidates': 'candidates',
'shortlist': 'shortlist',
'priorContext': 'priorContext',
'failureCause': 'failureCause',
},
},
})
// ── compose-loop — shared compose/validate deep-DAG ─────────────────────
// All branches that successfully find candidates converge here.
// composeResponse → validateResponse (retry loop, bounded by state.attempts.compose).
// One deep-DAG definition serves all four convergent branches.
// stateMapping.output copies the compose loop's writes back to the parent.
//
// Fan-in policy: 'success' routes to the shared respond-to-visitor terminal
// at the parent level — the deep-DAG produces state.draft and exits cleanly;
// exactly ONE respond-to-visitor fires per run regardless of branch count.
// 'error' (retry budget exhausted) falls through to compose-empty so the
// visitor always receives an in-character response rather than a silent drop.
.deepDAG('compose-loop', 'compose-retry-loop', {
'success': 'respond-to-visitor',
'error': 'compose-empty',
}, {
'stateMapping': {
'output': {
'draft': 'draft',
'approved': 'approved',
'attempts': 'attempts',
},
},
})
// #endregion deepdag-placements
// ── respond-to-visitor — single shared happy-path terminal ───────────────
// Every branch that successfully composes a response converges here.
// compose-loop (success) and both memory + empty-result paths all route
// through this one placement — fan-in policy: exactly ONE respond-to-visitor
// fires per run with the full converged state.draft in context.
.node('respond-to-visitor', respondToVisitor, { 'success': null })
// ── recall-memories branch ───────────────────────────────────────────────
// No search fanout needed — the memory store is queried directly.
// recallMemories → composeMemoryResponse → respond-to-visitor (shared terminal).
.node('memory-recall', recallMemories, { 'recalled': 'compose-memory-recall' })
.node('compose-memory-recall', composeMemoryResponse, { 'drafted': 'respond-to-visitor' })
// ── Terminal nodes ───────────────────────────────────────────────────────
.node('decline-off-topic', declineOffTopic, { 'success': null })
// decline-empty kept for checkpoint backward compatibility — new flows use
// compose-empty → respond-to-visitor for in-character failure responses.
.node('decline-empty', declineEmpty, { 'success': null })
.node('compose-empty', composeEmptyResponse, { 'drafted': 'respond-to-visitor' })
.build();
State
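The state listing that follows pairs `snapshotData()` with a defensive `restoreData()` so a checkpoint can round-trip through JSON. As a minimal sketch of that pattern in isolation, the hypothetical `MiniState` class here stands in for the real one: it does not extend `NodeStateBase`, and the `Json` type is a local assumption rather than the library's `JsonObject`.

```typescript
// Hypothetical stand-in for the real state class; it does not extend NodeStateBase.
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

class MiniState {
  query = '';
  terms: readonly string[] = [];
  attempts: Record<string, number> = {};

  // Emit plain JSON only, copying arrays and objects so the snapshot
  // cannot alias live state.
  snapshotData(): { [key: string]: Json } {
    return {
      query: this.query,
      terms: [...this.terms],
      attempts: { ...this.attempts },
    };
  }

  // Accept a field only when its runtime type matches; otherwise keep the default.
  restoreData(snap: { [key: string]: Json }): void {
    const query = snap['query'];
    if (typeof query === 'string') this.query = query;
    const terms = snap['terms'];
    if (Array.isArray(terms)) {
      this.terms = terms.filter((t): t is string => typeof t === 'string');
    }
    const attempts = snap['attempts'];
    if (attempts !== null && typeof attempts === 'object' && !Array.isArray(attempts)) {
      const next: Record<string, number> = {};
      for (const [key, value] of Object.entries(attempts)) {
        if (typeof value === 'number') next[key] = value;
      }
      this.attempts = next;
    }
  }
}

const original = new MiniState();
original.query = 'books like Solaris';
original.terms = ['solaris', 'lem'];
original.attempts = { compose: 2 };

// Serialise to a string and back, as a persisted checkpoint would be.
const wire = JSON.stringify(original.snapshotData());
const restored = new MiniState();
restored.restoreData(JSON.parse(wire));
```

Because each field is guarded by a runtime type check, a snapshot written by an older or partially corrupted checkpoint leaves the unmatched fields at their defaults instead of throwing.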
/**
* ArchivistState — the clipboard the Archivist's nodes mutate.
*
* Carries the visitor's question, the parsed intent, scout candidates,
* the merged shortlist, the draft response, and per-execution counters.
* Extends `NodeStateBase` so the dispatcher owns the lifecycle FSM and
* `snapshot()` round-trips for `Checkpoint.from` / `Checkpoint.restore`.
*/
import type { Candidate } from './entities/Book.ts';
import { NodeStateBase } from '@noocodex/dagonizer';
import type { JsonObject } from '@noocodex/dagonizer/types';
/**
* A roll-up of everything the Archivist has accumulated in its memory
* store across all prior runs — produced by `recallMemories` and consumed
* by `composeMemoryResponse`.
*/
export interface MemoryDigest {
/** Total distinct books recorded across all runs. */
readonly bookCount: number;
/** Total visitor queries issued across all runs. */
readonly queryCount: number;
/** Up to the last 10 distinct shortlisted books (most-recent first). */
readonly recentBooks: ReadonlyArray<{ readonly title: string; readonly author?: string }>;
/** Intent distribution: how many times each intent was classified. */
readonly intentBreakdown: ReadonlyArray<{ readonly intent: string; readonly count: number }>;
/** 1–2 sentence LLM-ready summary of the digest. */
readonly summary: string;
}
/**
* Prior-context facts recalled from the memory graph before classification.
* `summary` is an LLM-ready 1–2 sentence hint; the structured arrays are
* available directly on `state.recalledContext` for downstream nodes.
*/
export interface RecalledContext {
/** Intents the classifier returned for similar prior queries. */
readonly priorIntents: ReadonlyArray<{
readonly query: string;
readonly intent: string;
readonly ts: string;
}>;
/** Books seen in recent state graphs (shortlisted candidates). */
readonly recentCandidates: ReadonlyArray<Candidate>;
/** Prior queries that overlap with the current query text. */
readonly similarPriorQueries: ReadonlyArray<{
readonly query: string;
readonly ts: string;
}>;
/** 1–2 sentence LLM-ready hint; empty string when nothing was recalled. */
readonly summary: string;
}
/** What the visitor asked the Archivist to do. */
export type ArchivistIntent =
| 'lookup-author' // visitor named an author and wants their body of work
| 'find-reviews' // visitor wants opinions / reviews / what readers think
| 'describe-book' // visitor named a specific title and wants a description
| 'recommend-similar' // visitor wants something like a previous read
| 'recall-memories' // visitor asked what the agent has seen / remembered
| 'search' // visitor named a title / author / ISBN (generic search)
| 'describe' // visitor described a book without naming it
| 'recommend' // visitor asked for a generic recommendation
| 'off-topic'; // visitor wandered — not a book query and not memory-related
export class ArchivistState extends NodeStateBase {
/** Raw question the visitor submitted. */
query = '';
/** Parsed intent — set by `classifyIntent`. */
intent: ArchivistIntent = 'search';
/** Structured query terms — set by `extractQuery`. */
terms: readonly string[] = [];
/** Candidates returned by each scout — partitioned by source. */
candidates: readonly Candidate[] = [];
/** Final shortlist after merge + dedupe + rank. */
shortlist: readonly Candidate[] = [];
/** The Archivist's draft response. */
draft = '';
/** Validation outcome. `null` if not yet validated. */
approved: boolean | null = null;
/** Compose retry counter — `RetryPolicy` reads `attempts.compose`. */
attempts: Record<string, number> = {};
/**
* Tool plan emitted by the LLM via `decideTools`. The DAG inspects
* this to gate the optional scouts (web search runs only when the
* LLM asked for it). Empty = no tools needed.
*/
toolPlan: ReadonlyArray<{ readonly name: string; readonly arguments: Record<string, unknown> }> = [];
/**
* Per-run identifier. Used as the subject of every triple we write so the
* recall node can `SELECT` other runs' facts without re-reading the
* current run's findings.
*/
runId: string = '';
/**
* Sanitized one-liner description of why the search produced no
* results. Accumulated by scouts and gate nodes; consumed by
* `composeEmptyResponse` to craft an in-character failure message.
* Empty string when no failure has been recorded.
*/
failureCause = '';
/**
* Prior-context facts the recall node SELECTs out of memory before
* compose. Each entry has a `kind` (e.g. 'prior-query',
* 'prior-recommendation') and free-text content the LLM can cite.
*/
priorContext: ReadonlyArray<{ readonly kind: string; readonly text: string }> = [];
/**
* Structured context recalled from the unified memory graph by
* `recallContext` (runs before `classifyIntent`). The `summary` field
* is injected into the classifier prompt; all fields are available to
* downstream nodes (decideTools, composeResponse).
*/
recalledContext: RecalledContext = {
'priorIntents': [],
'recentCandidates': [],
'similarPriorQueries': [],
'summary': '',
};
/**
* Memory roll-up produced by `recallMemories` for the `recall-memories`
* intent. Empty/zero-valued when the intent is not `recall-memories`.
*/
memoryDigest: MemoryDigest = {
'bookCount': 0,
'queryCount': 0,
'recentBooks': [],
'intentBreakdown': [],
'summary': '',
};
override clone(): ArchivistState {
const copy = new ArchivistState();
copy.query = this.query;
copy.intent = this.intent;
copy.terms = [...this.terms];
copy.candidates = [...this.candidates];
copy.shortlist = [...this.shortlist];
copy.draft = this.draft;
copy.approved = this.approved;
copy.attempts = { ...this.attempts };
copy.toolPlan = [...this.toolPlan];
copy.runId = this.runId;
copy.failureCause = this.failureCause;
copy.priorContext = [...this.priorContext];
copy.recalledContext = {
'priorIntents': [...this.recalledContext.priorIntents],
'recentCandidates': [...this.recalledContext.recentCandidates],
'similarPriorQueries': [...this.recalledContext.similarPriorQueries],
'summary': this.recalledContext.summary,
};
copy.memoryDigest = {
'bookCount': this.memoryDigest.bookCount,
'queryCount': this.memoryDigest.queryCount,
'recentBooks': [...this.memoryDigest.recentBooks],
'intentBreakdown': [...this.memoryDigest.intentBreakdown],
'summary': this.memoryDigest.summary,
};
return copy;
}
// #region snapshot-restore
protected override snapshotData(): JsonObject {
return {
"query": this.query,
"intent": this.intent,
"terms": [...this.terms],
"candidates": this.candidates.map((candidate) => ({
"book": { ...candidate.book, "authors": [...candidate.book.authors] },
"score": candidate.score,
"source": candidate.source,
})) as unknown as JsonObject[],
"shortlist": this.shortlist.map((candidate) => ({
"book": { ...candidate.book, "authors": [...candidate.book.authors] },
"score": candidate.score,
"source": candidate.source,
})) as unknown as JsonObject[],
"draft": this.draft,
"approved": this.approved,
"attempts": { ...this.attempts },
"failureCause": this.failureCause,
"recalledContext": {
"priorIntents": this.recalledContext.priorIntents as unknown as JsonObject[],
"recentCandidates": this.recalledContext.recentCandidates.map((c) => ({
"book": { ...c.book, "authors": [...c.book.authors] },
"score": c.score,
"source": c.source,
})) as unknown as JsonObject[],
"similarPriorQueries": this.recalledContext.similarPriorQueries as unknown as JsonObject[],
"summary": this.recalledContext.summary,
},
"memoryDigest": {
"bookCount": this.memoryDigest.bookCount,
"queryCount": this.memoryDigest.queryCount,
"recentBooks": this.memoryDigest.recentBooks as unknown as JsonObject[],
"intentBreakdown": this.memoryDigest.intentBreakdown as unknown as JsonObject[],
"summary": this.memoryDigest.summary,
},
};
}
protected override restoreData(snap: JsonObject): void {
if (typeof snap['query'] === 'string') this.query = snap['query'];
if (typeof snap['intent'] === 'string') this.intent = snap['intent'] as ArchivistIntent;
if (typeof snap['draft'] === 'string') this.draft = snap['draft'];
if (typeof snap['approved'] === 'boolean') this.approved = snap['approved'];
if (typeof snap['failureCause'] === 'string') this.failureCause = snap['failureCause'];
if (Array.isArray(snap['terms'])) this.terms = snap['terms'] as string[];
if (Array.isArray(snap['candidates'])) this.candidates = snap['candidates'] as unknown as Candidate[];
if (Array.isArray(snap['shortlist'])) this.shortlist = snap['shortlist'] as unknown as Candidate[];
if (snap['attempts'] && typeof snap['attempts'] === 'object' && !Array.isArray(snap['attempts'])) {
this.attempts = { ...snap['attempts'] as Record<string, number> };
}
const rc = snap['recalledContext'];
if (rc !== null && rc !== undefined && typeof rc === 'object' && !Array.isArray(rc)) {
const rcObj = rc as Record<string, unknown>;
this.recalledContext = {
'priorIntents': Array.isArray(rcObj['priorIntents']) ? rcObj['priorIntents'] as RecalledContext['priorIntents'] : [],
'recentCandidates': Array.isArray(rcObj['recentCandidates']) ? rcObj['recentCandidates'] as RecalledContext['recentCandidates'] : [],
'similarPriorQueries': Array.isArray(rcObj['similarPriorQueries']) ? rcObj['similarPriorQueries'] as RecalledContext['similarPriorQueries'] : [],
'summary': typeof rcObj['summary'] === 'string' ? rcObj['summary'] : '',
};
}
const md = snap['memoryDigest'];
if (md !== null && md !== undefined && typeof md === 'object' && !Array.isArray(md)) {
const mdObj = md as Record<string, unknown>;
this.memoryDigest = {
'bookCount': typeof mdObj['bookCount'] === 'number' ? mdObj['bookCount'] : 0,
'queryCount': typeof mdObj['queryCount'] === 'number' ? mdObj['queryCount'] : 0,
'recentBooks': Array.isArray(mdObj['recentBooks']) ? mdObj['recentBooks'] as MemoryDigest['recentBooks'] : [],
'intentBreakdown': Array.isArray(mdObj['intentBreakdown']) ? mdObj['intentBreakdown'] as MemoryDigest['intentBreakdown'] : [],
'summary': typeof mdObj['summary'] === 'string' ? mdObj['summary'] : '',
};
}
}
// #endregion snapshot-restore
}
Prompts (composable directives)
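The prompt module listed next builds every prompt from short directive strings plus slots, joined in a fixed order. A minimal sketch of that composition idea, using a hypothetical two-entry `demoDirectives` map and a hypothetical `buildPrompt` helper (neither name comes from the Archivist source), might look like this:

```typescript
// Hypothetical directive set; the wording is illustrative only.
const demoDirectives = {
  persona: 'You are a shop assistant.',
  beTerse: 'Reply in 2-3 sentences.',
} as const;

// Join fixed directives and slots in a fixed order. An absent optional
// block collapses to an empty string, which leaves an empty final line.
function buildPrompt(question: string, context?: string): string {
  const contextBlock = context === undefined || context.length === 0
    ? ''
    : `Context: ${context}`;
  return [
    demoDirectives.persona,
    demoDirectives.beTerse,
    '',
    `Visitor question: ${question}`,
    contextBlock,
  ].join('\n');
}

const bare = buildPrompt('Any Le Guin in stock?');
const withContext = buildPrompt('Any Le Guin in stock?', 'asked about Lem earlier');
```

Since the builder is a pure function of its arguments, the same inputs always yield the same prompt text, which keeps prompt changes reviewable as ordinary diffs.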
/**
* prompts.ts — every prompt the Archivist sends, composed from small
* directive primitives.
*
* Directive = one short positive instruction (an "attractor")
* Prompt = a list of directives + slots, joined deterministically
* Schema = the data contract that pairs with a prompt
*
* Rules of the road:
* • Every prompt is built here. No other module assembles natural-language.
* • Directives state what to DO, not what to avoid (attractors beat repulsors).
* • Examples in schemas describe SHAPE, never real-world content,
* so models can't quote example data back into the conversation.
* • Persistent memory is INERT context; the directive only encourages
* citation when the visitor explicitly references their past.
*/
import type { MemoryDigest } from '../ArchivistState.ts';
import type { Candidate } from '../entities/Book.ts';
// ── Directive primitives ────────────────────────────────────────────────
/** Composable directive lines. Keep them positive, terse, and orthogonal. */
export const directives = {
"persona": 'You are the Archivist, a librarian at a small independent bookstore.',
"scope": 'Answer book-related questions: searches, descriptions, recommendations.',
"declineOffTopic": 'Decline off-topic questions politely and redirect to books.',
"beTerse": 'Reply in 2–3 sentences.',
"citeShortlist": 'Quote titles only from the shortlist supplied below.',
"groundInShortlist":'Ground every claim in the metadata of the supplied shortlist.',
"clarifyOnDoubt": 'If the shortlist is empty or the question is ambiguous, ask a single clarifying question.',
"memoryAsContext": 'Treat persistent memory as background only. Mention it when the visitor says "last time" / "earlier" / "I mentioned before".',
"emitJsonOnly": 'Return JSON that satisfies the supplied schema. No surrounding prose.',
"pickTerseQuery": 'Pick a terse search query: title, author, ISBN, or two-to-five topic keywords.',
"chronological": 'Present the works in publication order, oldest first.',
"weightRatings": 'Weight ratings (notes.rating + notes.ratingsCount) when scoring; high counts of high ratings boost score.',
"describeOnly": 'Describe the book in two sentences using the supplied metadata; do not recommend other titles.',
"authorSurvey": 'Treat the shortlist as one author\'s body of work; sketch its arc, not a single recommendation.',
"similarToPrior": 'Frame each suggestion as "similar to <prior title>" using the persistent-memory facts as the anchor.',
"weighOpinions": 'Quote average ratings and ratings counts when present; explain what readers seem to feel about each title.',
"continuityHint": 'Use the recent context if it suggests a likely intent or recurring interest.',
"recallMemories": 'When the visitor asks what you remember, what books you have seen, or what they have asked before, give a warm roll-up of your memory.',
"ownTheGap": 'Acknowledge which sources were searched. Explain in one sentence why nothing matched. Offer one concrete alternative angle the visitor could try.',
} as const;
// ── Shared system message — composed from persona directives ───────────
const SYSTEM = [
directives.persona,
directives.scope,
directives.declineOffTopic,
directives.beTerse,
directives.citeShortlist,
directives.groundInShortlist,
directives.clarifyOnDoubt,
directives.memoryAsContext,
].join(' ');
// ── Output schemas (the data contract — paired with prompts) ───────────
export const schemas = {
"rankCandidates": {
'type': 'object',
'description': 'Per-candidate ranking — score each ISBN against the visitor question.',
'additionalProperties': false,
'properties': {
'rankings': {
'type': 'array',
'description': 'One entry per candidate. Use the exact `isbn` shown in the input. Score in [0, 1].',
'items': {
'type': 'object',
'description': 'Required fields establish the contract; optional fields enrich it; additional key/value notes are welcome (vibe, themes, era, confidence).',
'additionalProperties': true,
'properties': {
'isbn': {
'type': 'string',
'description': 'Exact ISBN (or stable id) from the input candidate list.',
},
'score': {
'type': 'number',
'minimum': 0,
'maximum': 1,
'description': 'Relevance to the visitor question (1 = perfect, 0 = irrelevant).',
},
'reason': {
'type': 'string',
'description': 'One-sentence justification the Archivist may cite when composing.',
},
'confidence': {
'type': 'number',
'minimum': 0,
'maximum': 1,
'description': 'Confidence in the score itself; low when metadata is sparse.',
},
},
'required': ['isbn', 'score'],
},
},
},
'required': ['rankings'],
} as Record<string, unknown>,
};
// ── Prompt builders ────────────────────────────────────────────────────
/** Helpers expose only the builders; nodes never assemble prose themselves. */
export const prompts = {
classifyIntent(query: string, recalledSummary?: string): string {
const contextBlock = (recalledSummary === undefined || recalledSummary.length === 0)
? ''
: [
'',
`Recent context: ${recalledSummary} ${directives.continuityHint}`,
].join('\n');
return [
SYSTEM,
'',
'Classify the visitor question as exactly one of the following intents:',
' lookup-author — the visitor named an author and wants their body of work',
' find-reviews — the visitor wants opinions, reviews, or what readers think',
' describe-book — the visitor named a specific title and wants a description',
' recommend-similar — the visitor wants something like a previous read',
' recall-memories — the visitor asks about your own memory or history: what books you have looked up, what they have asked before, what has been recommended; any meta-question about your past activity',
' search — the visitor named a topic / title / ISBN (no clear sub-case)',
' describe — the visitor described a book without naming it',
' recommend — the visitor asked for a generic recommendation',
' off-topic — the visitor asked something unrelated to books and unrelated to your memory',
'Prefer the most specific intent. Use recall-memories for any question about your activity, history, or memory. Respond with the single token only.',
contextBlock,
'',
`Visitor question: ${query}`,
].join('\n');
},
extractTerms(query: string): string {
return [
SYSTEM,
'',
'Extract 3–6 short search terms (1–3 words each) from the visitor question.',
'Return ONLY a JSON array of strings.',
'',
`Visitor question: ${query}`,
].join('\n');
},
decideTools(query: string): string {
// Tool descriptions / schemas flow through the adapter's native
// tools channel (Gemini's `functionDeclarations`, Nano's
// `responseConstraint`). The prompt itself stays lean.
return [
SYSTEM,
directives.pickTerseQuery,
'For any visitor question that names an author or describes a book to find, call ALL of the available tools — do not omit any source.',
'Use a short, keyword-only query (no surrounding quotes, no filler phrases).',
'',
`Visitor question: ${query}`,
].join('\n');
},
rankCandidates(query: string, candidates: readonly Candidate[]): string {
const rows = candidates.map((c, i) => formatCandidateRow(i + 1, c)).join('\n');
return [
SYSTEM,
directives.emitJsonOnly,
'',
`Visitor question: ${query}`,
'',
'Candidates:',
rows,
].join('\n');
},
compose(
query: string,
shortlist: readonly Candidate[],
priorContext?: readonly { kind: string; text: string }[],
recalledSummary?: string,
): string {
const rows = shortlist.map((c, i) => formatCandidateRow(i + 1, c)).join('\n');
const contextBlock = (priorContext === undefined || priorContext.length === 0)
? ''
: [
'',
'PERSISTENT MEMORY (background only — cite only on explicit recall request):',
...priorContext.map((p) => `- [${p.kind}] ${p.text}`),
].join('\n');
const continuityBlock = (recalledSummary === undefined || recalledSummary.length === 0)
? ''
: `\nConversation context: ${recalledSummary}`;
return [
SYSTEM,
directives.beTerse,
directives.citeShortlist,
'',
`Visitor question: ${query}`,
continuityBlock,
contextBlock,
'',
'Shortlist (ranked, top first):',
rows,
].join('\n');
},
composeAuthor(
query: string,
shortlist: readonly Candidate[],
priorContext?: readonly { kind: string; text: string }[],
recalledSummary?: string,
): string {
const rows = shortlist.map((c, i) => formatCandidateRow(i + 1, c)).join('\n');
const contextBlock = (priorContext === undefined || priorContext.length === 0)
? ''
: [
'',
'PERSISTENT MEMORY (background only — cite only on explicit recall request):',
...priorContext.map((p) => `- [${p.kind}] ${p.text}`),
].join('\n');
const continuityBlock = (recalledSummary === undefined || recalledSummary.length === 0)
? ''
: `\nConversation context: ${recalledSummary}`;
return [
SYSTEM,
directives.beTerse,
directives.citeShortlist,
directives.chronological,
directives.authorSurvey,
'',
`Visitor question: ${query}`,
continuityBlock,
contextBlock,
'',
'Shortlist (chronological, oldest first):',
rows,
].join('\n');
},
composeReviews(
query: string,
shortlist: readonly Candidate[],
priorContext?: readonly { kind: string; text: string }[],
recalledSummary?: string,
): string {
const rows = shortlist.map((c, i) => formatCandidateRow(i + 1, c)).join('\n');
const contextBlock = (priorContext === undefined || priorContext.length === 0)
? ''
: [
'',
'PERSISTENT MEMORY (background only — cite only on explicit recall request):',
...priorContext.map((p) => `- [${p.kind}] ${p.text}`),
].join('\n');
const continuityBlock = (recalledSummary === undefined || recalledSummary.length === 0)
? ''
: `\nConversation context: ${recalledSummary}`;
return [
SYSTEM,
directives.beTerse,
directives.citeShortlist,
directives.weightRatings,
directives.weighOpinions,
'',
`Visitor question: ${query}`,
continuityBlock,
contextBlock,
'',
'Shortlist (ranked by rating signal):',
rows,
].join('\n');
},
describeBook(
query: string,
shortlist: readonly Candidate[],
priorContext?: readonly { kind: string; text: string }[],
recalledSummary?: string,
): string {
const rows = shortlist.map((c, i) => formatCandidateRow(i + 1, c)).join('\n');
const contextBlock = (priorContext === undefined || priorContext.length === 0)
? ''
: [
'',
'PERSISTENT MEMORY (background only — cite only on explicit recall request):',
...priorContext.map((p) => `- [${p.kind}] ${p.text}`),
].join('\n');
const continuityBlock = (recalledSummary === undefined || recalledSummary.length === 0)
? ''
: `\nConversation context: ${recalledSummary}`;
return [
SYSTEM,
directives.describeOnly,
directives.citeShortlist,
directives.groundInShortlist,
'',
`Visitor question: ${query}`,
continuityBlock,
contextBlock,
'',
'Matched book(s):',
rows,
].join('\n');
},
composeSimilar(
query: string,
shortlist: readonly Candidate[],
priorContext?: readonly { kind: string; text: string }[],
recalledSummary?: string,
): string {
const rows = shortlist.map((c, i) => formatCandidateRow(i + 1, c)).join('\n');
const contextBlock = (priorContext === undefined || priorContext.length === 0)
? ''
: [
'',
'PERSISTENT MEMORY (anchor — cite explicitly as the basis for similarity):',
...priorContext.map((p) => `- [${p.kind}] ${p.text}`),
].join('\n');
const continuityBlock = (recalledSummary === undefined || recalledSummary.length === 0)
? ''
: `\nConversation context: ${recalledSummary}`;
return [
SYSTEM,
directives.beTerse,
directives.citeShortlist,
directives.similarToPrior,
'',
`Visitor question: ${query}`,
continuityBlock,
contextBlock,
'',
'Shortlist (ranked, top first):',
rows,
].join('\n');
},
composeEmptyResponse(query: string, failureCause: string): string {
const causeBlock = failureCause.trim().length > 0
? `\nSearch notes: ${failureCause.trim()}`
: '';
return [
SYSTEM,
directives.ownTheGap,
directives.beTerse,
'',
`Visitor question: ${query}`,
causeBlock,
].join('\n');
},
validate(draft: string, shortlist: readonly Candidate[]): string {
const titles = shortlist.map((c) => c.book.title).join(' | ');
return [
SYSTEM,
'Approve if the draft (a) mentions a shortlisted title and (b) reads as a polite on-topic reply.',
'Reply with the single token "yes" or "no".',
'',
`Shortlisted titles: ${titles}`,
'',
`Draft: ${draft}`,
].join('\n');
},
suggestStarterQuery(): string {
return [
directives.persona,
'The shop specialises in science fiction and philosophy.',
'Pick one acclaimed work or author from science fiction or philosophy at random — examples of the genre frame: Liu Cixin\'s Three Body Problem, William Gibson\'s Neuromancer, Ursula K. Le Guin, Stanisław Lem, Ted Chiang, Jorge Luis Borges, Albert Camus, Michel Foucault, Gilles Deleuze, Ludwig Wittgenstein. Pick something in that vein but vary your selection.',
'Phrase ONE short curious question a first-time visitor to a bookstore might ask about it.',
'The question must be under 20 words.',
'Return just the question — no preamble, no quotation marks, no explanation.',
].join(' ');
},
suggestGreeting(): string {
return [
directives.persona,
'The shop specialises in science fiction and philosophy.',
'Write ONE fresh opening greeting for a new visitor walking into the shop.',
'The greeting must be warm, curious, and invite a book question.',
'Keep it under 30 words.',
'Return just the greeting — no preamble, no quotation marks, no explanation.',
].join(' ');
},
suggestVisitorReplyTo(greeting: string): string {
return [
'A bookshop visitor has just received this greeting from the Archivist:',
`"${greeting}"`,
'The visitor is interested in science fiction and philosophy.',
'Write ONE natural first message the visitor might send in reply.',
'The reply must be a book question or request that follows naturally from the greeting.',
'Keep it under 30 words.',
'Return just the visitor message — no preamble, no quotation marks, no explanation.',
].join(' ');
},
explainTool(name: string, context: string): string {
return [
'You are a librarian explaining a backend tool to a curious visitor.',
`The tool is called "${name}".`,
`Here is what it does: ${context}`,
'Explain in 2-3 plain-English sentences:',
'1. What the tool does',
'2. Why it matters',
'3. One concrete example use-case',
'Keep it warm and clear. No jargon. Under 80 words.',
'Return just the explanation, no preamble.',
].join('\n');
},
composeMemoryRecall(
query: string,
digest: MemoryDigest,
recalledSummary?: string,
): string {
const continuityBlock = (recalledSummary === undefined || recalledSummary.length === 0)
? ''
: `\nConversation context: ${recalledSummary}`;
const digestBlock = digest.bookCount === 0
? 'Memory status: my shelves are fresh — no books have been recorded yet this session.'
: [
`Memory status: ${String(digest.bookCount)} distinct book${digest.bookCount === 1 ? '' : 's'} recorded, ${String(digest.queryCount)} visitor ${digest.queryCount === 1 ? 'query' : 'queries'} seen.`,
digest.recentBooks.length > 0
? `Recent titles: ${digest.recentBooks.map((b) => `"${b.title}"${b.author !== undefined ? ` by ${b.author}` : ''}`).join('; ')}.`
: '',
digest.intentBreakdown.length > 0
? `Intent breakdown: ${digest.intentBreakdown.map((e) => `${e.intent} (${String(e.count)})`).join(', ')}.`
: '',
].filter(Boolean).join(' ');
return [
SYSTEM,
directives.recallMemories,
directives.beTerse,
'',
`Visitor question: ${query}`,
continuityBlock,
'',
digestBlock,
].join('\n');
},
};
// ── Internals ──────────────────────────────────────────────────────────
function formatCandidateRow(n: number, c: Candidate): string {
const parts: string[] = [];
parts.push(`${String(n)}. isbn=${c.book.isbn}`);
parts.push(`"${c.book.title}"`);
parts.push(`by ${c.book.authors.join(', ') || '<unknown author>'}`);
if (c.book.firstPublishYear !== undefined) parts.push(`(${String(c.book.firstPublishYear)})`);
if (c.book.subjects !== undefined && c.book.subjects.length > 0) {
parts.push(`subjects: ${c.book.subjects.slice(0, 5).join(', ')}`);
}
if (c.book.publishers !== undefined && c.book.publishers.length > 0) {
parts.push(`pub: ${c.book.publishers[0]}`);
}
if (c.book.summary !== undefined && c.book.summary.length > 0) {
parts.push(`— ${c.book.summary}`);
}
if (c.reason !== undefined && c.reason.length > 0) {
parts.push(`[rank-reason: ${c.reason}]`);
}
return parts.join(' | ');
}
Classification node
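The node listed next collapses a wide intent union into a narrower set of routing outputs with a `switch`. The sketch here shows the same mapping on hypothetical, shortened unions (`Intent`, `Route`, and `routeIntent` are illustrative names). It differs from the listing in one respect: instead of a catch-all `default`, it names every case and uses a `never` assignment so the compiler flags an unhandled intent.

```typescript
// Hypothetical unions mirroring the shape of a wide intent type and a
// narrower routing type; the real types have more members.
type Intent = 'lookup-author' | 'search' | 'describe' | 'recommend' | 'off-topic';
type Route = 'lookup-author' | 'on-topic' | 'off-topic';

function routeIntent(intent: Intent): Route {
  switch (intent) {
    case 'lookup-author': return 'lookup-author';
    case 'off-topic': return 'off-topic';
    case 'search':
    case 'describe':
    case 'recommend':
      return 'on-topic';
    default: {
      // Compile-time exhaustiveness check: adding a new Intent member
      // without a matching case makes this assignment a type error.
      const unreachable: never = intent;
      return unreachable;
    }
  }
}
```

A catch-all `default` is more forgiving when the classifier returns something unexpected at runtime, while the exhaustive form catches a forgotten branch at compile time; which trade-off suits a given flow is a judgment call.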
/**
* classifyIntent — entry node. Asks the LLM to classify the visitor's
* question, then routes to one of six on-topic branches or the
* off-topic exit:
*
* lookup-author → `lookup-author-web-search` (chronological author survey)
* find-reviews → `find-reviews` (ratings tool branch)
* describe-book → `describe-web-search` (one-hit description branch)
* recommend-similar → `recommend-similar` (prior-shortlist seeding branch)
* recall-memories → `memory-recall` (memory roll-up branch)
* search | describe | recommend → `extract-query` (legacy on-topic pipeline)
* off-topic → `decline-off-topic`
*
* Demonstrates: a wide narrowly-typed output union and dispatch into
* sub-DAG branches based on classifier output.
*/
import type { ArchivistState } from '../ArchivistState.ts';
import type { ArchivistServices } from '../services.ts';
import type { NodeInterface } from '@noocodex/dagonizer';
type IntentOutput =
| 'lookup-author'
| 'find-reviews'
| 'describe-book'
| 'recommend-similar'
| 'recall-memories'
| 'on-topic'
| 'off-topic';
export const classifyIntent: NodeInterface<ArchivistState, IntentOutput, ArchivistServices> = {
"name": 'classify-intent',
"outputs": ['lookup-author', 'find-reviews', 'describe-book', 'recommend-similar', 'recall-memories', 'on-topic', 'off-topic'],
async execute(state, context) {
const summary = state.recalledContext.summary.length > 0
? state.recalledContext.summary
: undefined;
const intent = await context.services.llm.classifyIntent(state.query, summary);
state.intent = intent;
context.services.logger.info(`intent=${intent}`);
switch (intent) {
case 'off-topic': return { "output": 'off-topic' };
case 'lookup-author': return { "output": 'lookup-author' };
case 'find-reviews': return { "output": 'find-reviews' };
case 'describe-book': return { "output": 'describe-book' };
case 'recommend-similar': return { "output": 'recommend-similar' };
case 'recall-memories': return { "output": 'recall-memories' };
default: return { "output": 'on-topic' };
}
},
};
Memory + ontology
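The store in this section answers single-pattern queries: a slot holding a `?name` string binds a variable, a concrete term filters, and an omitted slot matches anything. The sketch below illustrates those semantics over a plain array. It is a simplification under stated assumptions: every term is a plain string rather than an n3 `Term`, so concrete strings filter here, whereas the real store only filters on `Term` objects. `MiniQuad`, `MiniPattern`, and `selectRows` are hypothetical stand-ins, and the quads are sample data.

```typescript
// Simplified stand-ins: every term is a plain string (the real store uses n3 Term objects).
interface MiniQuad {
  readonly subject: string;
  readonly predicate: string;
  readonly object: string;
  readonly graph: string;
}
type Slot = 'subject' | 'predicate' | 'object' | 'graph';
type MiniPattern = Partial<Record<Slot, string>>;

const SLOTS: readonly Slot[] = ['subject', 'predicate', 'object', 'graph'];
const isVar = (slot: string | undefined): slot is string =>
  slot !== undefined && slot.startsWith('?');

// Return one binding row per matching quad, keyed by variable name without the '?'.
function selectRows(quads: readonly MiniQuad[], pattern: MiniPattern): Record<string, string>[] {
  return quads
    .filter((q) => SLOTS.every((slot) => {
      const want = pattern[slot];
      // Omitted slots and variables match anything; concrete values must be equal.
      return want === undefined || isVar(want) || q[slot] === want;
    }))
    .map((q) => {
      const row: Record<string, string> = {};
      for (const slot of SLOTS) {
        const want = pattern[slot];
        if (isVar(want)) row[want.slice(1)] = q[slot];
      }
      return row;
    });
}

// Sample data only.
const quads: MiniQuad[] = [
  { subject: 'book:1', predicate: 'dag:title', object: 'First Sample', graph: 'urn:dagonizer:memory' },
  { subject: 'book:1', predicate: 'dag:score', object: '0.9', graph: 'urn:dagonizer:memory' },
  { subject: 'book:2', predicate: 'dag:title', object: 'Second Sample', graph: 'urn:dagonizer:memory' },
];

// Two rows: book:1 / First Sample and book:2 / Second Sample.
const titles = selectRows(quads, { subject: '?book', predicate: 'dag:title', object: '?title' });
console.log(titles);
```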
/**
* MemoryStore — browser-runnable RDF quad store for the Archivist.
*
* Wraps `n3.Store` (pure JS, ~30KB gzipped, identical surface on Node
* and in the browser) and exposes a named-graph-aware surface:
*
* assert(s, p, o, graph?): write one quad
* ask({ subject?, predicate?, object?, graph? }): boolean existence check
* select({ subject?, predicate?, object?, graph? }): list bound rows (vars start with ?)
* triplesIn(graph): iterate quads in one graph
* triples(): iterate every quad
*
* Four named graphs are reserved by convention:
*
* urn:dagonizer:ontology — TBox schema (classes, properties, domains, ranges)
* loaded once on mount via loadOntology()
* urn:dagonizer:memory — persistent cross-run facts
* (books, sources, scores — survives reloads)
* urn:dagonizer:state:<runId> — per-run typed-state mirror
* (ArchivistState fields → triples on every node end)
* urn:dagonizer:prov:<runId> — PROV-O activity log
* (which node did what when, attributed to which agent)
*
* Pattern surface intentionally mirrors SPARQL's basic graph pattern
* (`{ ?s <pred> ?o }`) without a full SPARQL engine. For richer query
* shapes (UNION, FILTER, paths) swap in `@comunica/query-sparql`.
*/
import { DataFactory, Parser, Store, Writer } from 'n3';
import type { Quad, Quad_Graph, Quad_Object, Quad_Predicate, Quad_Subject, Term } from 'n3';
const { namedNode, literal, quad, defaultGraph } = DataFactory;
const LOCALSTORAGE_KEY = 'dagonizer-archivist-memory';
export const DAG_NS = 'https://noocodex.dev/ontology/dagonizer/';
export const BOOK_NS = 'urn:dagonizer:book:';
export const RUN_NS = 'urn:dagonizer:run:';
/** Named-graph IRIs reserved by the Archivist demo. */
export const GRAPH_ONTOLOGY = namedNode('urn:dagonizer:ontology');
export const GRAPH_MEMORY = namedNode('urn:dagonizer:memory');
export const STATE_GRAPH_PREFIX = 'urn:dagonizer:state:';
export const PROV_GRAPH_PREFIX = 'urn:dagonizer:prov:';
export const stateGraphIri = (runId: string): Term => namedNode(`${STATE_GRAPH_PREFIX}${runId}`);
export const provGraphIri = (runId: string): Term => namedNode(`${PROV_GRAPH_PREFIX}${runId}`);
/**
* One bound row from `select()`. Keys are pattern variable names without
* the leading `?`. Values are the raw n3 terms (NamedNode | Literal | …).
*/
export type Binding = Readonly<Record<string, Term>>;
interface SlotPattern {
readonly subject?: Term | string;
readonly predicate?: Term | string;
readonly object?: Term | string;
readonly graph?: Term | string;
}
export class MemoryStore {
readonly #store = new Store();
/** Auto-persist writes to localStorage when true (browser only). */
#persist = false;
/**
* Hydrate from localStorage and enable auto-persistence. Safe to call
* in Node (no-ops) since we check for `localStorage`.
*/
enablePersistence(): void {
if (typeof localStorage === 'undefined') return;
this.#persist = true;
const dump = localStorage.getItem(LOCALSTORAGE_KEY);
if (dump === null || dump.length === 0) return;
try {
const parser = new Parser({ 'format': 'N-Quads' });
const quads = parser.parse(dump);
for (const q of quads) this.#store.addQuad(q);
} catch {
localStorage.removeItem(LOCALSTORAGE_KEY);
}
}
/**
* Disable auto-persistence and remove the stored dump from localStorage.
* Subsequent writes are held only in memory until `enablePersistence()` is
* called again. Safe to call in Node (no-ops).
*/
disablePersistence(): void {
this.#persist = false;
if (typeof localStorage !== 'undefined') {
localStorage.removeItem(LOCALSTORAGE_KEY);
}
}
/** True when writes are being auto-persisted to localStorage. */
get isPersisted(): boolean { return this.#persist; }
/** Total quad count — useful for the live UI counter. */
get size(): number { return this.#store.size; }
/** Pre-bake a named-node IRI for the `dag:` vocabulary. */
static dagIri(local: string): Term { return namedNode(`${DAG_NS}${local}`); }
/** Pre-bake a named-node IRI for a candidate book by ISBN. */
static bookIri(isbn: string): Term { return namedNode(`${BOOK_NS}${isbn}`); }
/** Per-run subject IRI. */
static runIri(id: string): Term { return namedNode(`${RUN_NS}${id}`); }
/** Make any IRI. */
static iri(value: string): Term { return namedNode(value); }
/** Literal helpers — typed XSD where it matters for SPARQL FILTER. */
static lit = {
str(value: string): Term { return literal(value); },
num(value: number): Term { return literal(String(value), namedNode('http://www.w3.org/2001/XMLSchema#double')); },
int(value: number): Term { return literal(String(value), namedNode('http://www.w3.org/2001/XMLSchema#integer')); },
bool(value: boolean): Term { return literal(String(value), namedNode('http://www.w3.org/2001/XMLSchema#boolean')); },
dateTime(value: Date): Term { return literal(value.toISOString(), namedNode('http://www.w3.org/2001/XMLSchema#dateTime')); },
};
/**
* Load the TBox ontology into `urn:dagonizer:ontology`.
*
* Accepts the `ONTOLOGY_NTRIPLES` array from `ArchivistOntology.ts`, or
* any other array of N-Triples lines (tests can supply their own).
* Idempotent: clears the graph before writing, so repeated calls on
* mount are safe.
*/
loadOntology(ntriples: readonly string[]): void {
this.#store.removeQuads(this.#store.getQuads(null, null, null, GRAPH_ONTOLOGY));
const parser = new Parser({ 'format': 'N-Triples' });
const joined = ntriples.join('\n');
const parsed = parser.parse(joined);
for (const q of parsed) {
this.#store.addQuad(
quad(q.subject, q.predicate, q.object, GRAPH_ONTOLOGY),
);
}
this.#flush();
}
/** Write one quad. `graph` defaults to the default graph. */
assert(s: Term, p: Term, o: Term, graph?: Term): void {
this.#store.addQuad(quad(
s as Quad_Subject,
p as Quad_Predicate,
o as Quad_Object,
(graph ?? defaultGraph()) as Quad_Graph,
));
this.#flush();
}
/** Write many quads. Each quad carries its own graph. */
assertAll(quads: readonly Quad[]): void {
for (const q of quads) this.#store.addQuad(q);
this.#flush();
}
/** ASK — true when at least one quad matches the pattern. */
ask(pattern: SlotPattern): boolean {
return this.#store.getQuads(
asTerm(pattern.subject) ?? null,
asTerm(pattern.predicate) ?? null,
asTerm(pattern.object) ?? null,
asTerm(pattern.graph) ?? null,
).length > 0;
}
/**
* SELECT — list bound rows. Variables: pass a string `?name` in any
* slot and it becomes a binding key; concrete terms filter.
*/
select(pattern: SlotPattern): Binding[] {
const subject = asTerm(pattern.subject) ?? null;
const predicate = asTerm(pattern.predicate) ?? null;
const object = asTerm(pattern.object) ?? null;
const graph = asTerm(pattern.graph) ?? null;
const quads = this.#store.getQuads(subject, predicate, object, graph);
return quads.map((q) => {
const row: Record<string, Term> = {};
if (isVar(pattern.subject)) row[stripQuestion(pattern.subject)] = q.subject;
if (isVar(pattern.predicate)) row[stripQuestion(pattern.predicate)] = q.predicate;
if (isVar(pattern.object)) row[stripQuestion(pattern.object)] = q.object;
if (isVar(pattern.graph)) row[stripQuestion(pattern.graph)] = q.graph;
return row;
});
}
/** Count matching quads. */
count(pattern: SlotPattern): number {
return this.#store.getQuads(
asTerm(pattern.subject) ?? null,
asTerm(pattern.predicate) ?? null,
asTerm(pattern.object) ?? null,
asTerm(pattern.graph) ?? null,
).length;
}
/** Empty the entire store and the persisted dump. */
clear(): void {
this.#store.removeQuads(this.#store.getQuads(null, null, null, null));
if (this.#persist && typeof localStorage !== 'undefined') {
localStorage.removeItem(LOCALSTORAGE_KEY);
}
}
/** Drop every quad in one named graph (useful when a run resets). */
clearGraph(graph: Term): void {
this.#store.removeQuads(this.#store.getQuads(null, null, null, graph));
this.#flush();
}
/**
* Drop every quad in `urn:dagonizer:memory` whose subject is typed as
* `dag:Book` (i.e. has a `rdf:type dag:Book` triple). Safe to call
* before re-seeding so the library stays idempotent across reloads.
*/
clearBooks(): void {
const rdfType = namedNode('http://www.w3.org/1999/02/22-rdf-syntax-ns#type');
const dagBook = namedNode(`${DAG_NS}Book`);
// Collect all book subject IRIs in GRAPH_MEMORY.
const bookSubjects = this.#store
.getQuads(null, rdfType, dagBook, GRAPH_MEMORY)
.map((q) => q.subject.value);
// Remove every quad whose subject is one of those book IRIs.
for (const subjectValue of bookSubjects) {
const subject = namedNode(subjectValue);
this.#store.removeQuads(this.#store.getQuads(subject, null, null, GRAPH_MEMORY));
}
this.#flush();
}
/** Write the current store to localStorage as N-Quads. */
#flush(): void {
if (!this.#persist || typeof localStorage === 'undefined') return;
const writer = new Writer({ 'format': 'N-Quads' });
writer.addQuads(this.#store.getQuads(null, null, null, null));
writer.end((err, result) => {
if (err === null || err === undefined) {
localStorage.setItem(LOCALSTORAGE_KEY, result);
}
});
}
/** Iterate every quad in every graph. */
*triples(): IterableIterator<Quad> {
for (const q of this.#store.getQuads(null, null, null, null)) yield q;
}
/** Iterate every quad in a single named graph. */
*triplesIn(graph: Term): IterableIterator<Quad> {
for (const q of this.#store.getQuads(null, null, null, graph)) yield q;
}
/** Distinct graph IRIs the store currently knows about. */
graphs(): readonly Term[] {
const seen = new Map<string, Term>();
for (const q of this.#store.getQuads(null, null, null, null)) {
if (q.graph.termType === 'DefaultGraph') continue;
if (!seen.has(q.graph.value)) seen.set(q.graph.value, q.graph);
}
return [...seen.values()];
}
}
function isVar(slot: Term | string | undefined): slot is string {
return typeof slot === 'string' && slot.startsWith('?');
}
function stripQuestion(name: string): string {
return name.startsWith('?') ? name.slice(1) : name;
}
/**
* Convert a pattern slot into a filter term. Returns null (wildcard) for
* an omitted slot, a `?name` variable, and any other plain string; only
* Term objects filter.
*/
function asTerm(slot: Term | string | undefined): Term | null {
if (slot === undefined) return null;
if (isVar(slot)) return null;
if (typeof slot === 'string') return null;
return slot;
}
Ontology (TBox + ABox)
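One use of a co-loaded TBox is schema reflection: asking which class a predicate declares as its domain or range. A minimal sketch of that lookup follows, assuming the TBox is available as plain subject/predicate/object string triples. `SchemaTriple` and `describeProperty` are hypothetical names, and the sample triples mirror the `dag:hasAuthor` and `dag:publishedBy` definitions in this section.

```typescript
const DAG = 'https://noocodex.dev/ontology/dagonizer/';
const RDFS = 'http://www.w3.org/2000/01/rdf-schema#';

// Hypothetical plain-string triple; the demo itself stores n3 quads.
type SchemaTriple = readonly [string, string, string];

const tbox: readonly SchemaTriple[] = [
  [`${DAG}hasAuthor`, `${RDFS}domain`, `${DAG}Book`],
  [`${DAG}hasAuthor`, `${RDFS}range`, `${DAG}Author`],
  [`${DAG}publishedBy`, `${RDFS}domain`, `${DAG}Book`],
];

// Look up rdfs:domain and rdfs:range for one property; either may be absent.
function describeProperty(
  triples: readonly SchemaTriple[],
  property: string,
): { domain?: string; range?: string } {
  const out: { domain?: string; range?: string } = {};
  for (const [s, p, o] of triples) {
    if (s !== property) continue;
    if (p === `${RDFS}domain`) out.domain = o;
    if (p === `${RDFS}range`) out.range = o;
  }
  return out;
}

// hasAuthor declares both a domain and a range; publishedBy declares only a domain.
console.log(describeProperty(tbox, `${DAG}hasAuthor`));
console.log(describeProperty(tbox, `${DAG}publishedBy`));
```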
/**
* ArchivistOntology — TBox (schema) for the Archivist's RDF memory.
*
* Defines the class and property vocabulary under the `dag:` namespace
* (`https://noocodex.dev/ontology/dagonizer/`). Every ABox write in
* `recordFindings.ts` and `StateProjection.ts` uses these same IRIs so
* SPARQL queries span the TBox (`urn:dagonizer:ontology`) and ABox
* (`urn:dagonizer:memory`, `urn:dagonizer:state:<runId>`) uniformly.
*
* Exported surfaces:
* - `ArchivistOntologyJsonLd` — canonical JSON-LD document (docs / tooling)
* - `ONTOLOGY_NTRIPLES` — N-Triples ready to load via `MemoryStore.loadOntology()`
*
* Classes (7):
* dag:Book, dag:Author, dag:Subject, dag:Run, dag:Activity,
* dag:Source, dag:Score
*
* Object properties (8):
* dag:hasAuthor, dag:hasSubject, dag:fromSource, dag:queriedIn,
* dag:shortlisted, dag:about, dag:publishedBy, dag:candidate
*
* Datatype properties (13):
* dag:title, dag:isbn, dag:summary, dag:firstPublishYear,
* dag:rating, dag:score, dag:visitorQuery, dag:runTimestamp, dag:inShortlist,
* dag:source, dag:author, dag:subject, dag:shortlistedTitle
*
* Cross-source query surface — with TBox + ABox co-loaded:
* • JOIN on dag:title across catalog, web-search, wiki records (same predicate)
* • Enumerate all books from a Run: ?run dag:candidate ?book
* • Rank by score across sources: ?book dag:score ?s ORDER BY DESC(?s)
* • Trace lineage: ?book dag:fromSource / dag:queriedIn ?run (Book → Source → Run)
* • Schema reflection: ask what class/domain/range a predicate has
*/
/** @internal Namespace abbreviation. */
const DAG = 'https://noocodex.dev/ontology/dagonizer/';
const RDFS = 'http://www.w3.org/2000/01/rdf-schema#';
const OWL = 'http://www.w3.org/2002/07/owl#';
const XSD = 'http://www.w3.org/2001/XMLSchema#';
const PROV = 'http://www.w3.org/ns/prov#';
// ── JSON-LD context ─────────────────────────────────────────────────────────
const CONTEXT = {
'@vocab': DAG,
'dag': DAG,
'rdfs': RDFS,
'owl': OWL,
'xsd': XSD,
'prov': PROV,
'subClassOf': { '@id': `${RDFS}subClassOf`, '@type': '@id' },
'domain': { '@id': `${RDFS}domain`, '@type': '@id' },
'range': { '@id': `${RDFS}range`, '@type': '@id' },
'label': { '@id': `${RDFS}label`, '@language': 'en' },
'comment': { '@id': `${RDFS}comment`, '@language': 'en' },
'type': '@type',
'Class': `${OWL}Class`,
'ObjectProperty': `${OWL}ObjectProperty`,
'DatatypeProperty': `${OWL}DatatypeProperty`,
'Ontology': `${OWL}Ontology`,
};
// ── JSON-LD document ────────────────────────────────────────────────────────
/** Canonical JSON-LD ontology document. Use for tooling, docs, and exports. */
export const ArchivistOntologyJsonLd: Record<string, unknown> = {
'@context': CONTEXT,
'@graph': [
// Ontology header
{
'@id': `${DAG}`,
'type': 'Ontology',
'label': 'Dagonizer Archivist Ontology',
'comment': 'TBox vocabulary for the Archivist demo RDF memory store',
},
// ── Classes ────────────────────────────────────────────────────────────
{
'@id': `${DAG}Book`,
'type': 'Class',
'label': 'Book',
'comment': 'A bibliographic record — catalog entry, web-search result, or wiki article.',
},
{
'@id': `${DAG}Author`,
'type': 'Class',
'label': 'Author',
'comment': 'A person or organisation responsible for a Book.',
},
{
'@id': `${DAG}Subject`,
'type': 'Class',
'label': 'Subject',
'comment': 'A thematic topic or classification applied to a Book.',
},
{
'@id': `${DAG}Run`,
'type': 'Class',
'label': 'Run',
'comment': 'One top-level Archivist execution, keyed by runId.',
'subClassOf': `${PROV}Activity`,
},
{
'@id': `${DAG}Activity`,
'type': 'Class',
'label': 'Activity',
'comment': 'An Archivist-domain prov:Activity (node execution, tool call, LLM call).',
'subClassOf': `${PROV}Activity`,
},
{
'@id': `${DAG}Source`,
'type': 'Class',
'label': 'Source',
'comment': 'A data source from which Book records are fetched (catalog, web, wiki, reviews).',
},
{
'@id': `${DAG}Score`,
'type': 'Class',
'label': 'Score',
'comment': 'A ranked relevance score in [0, 1] assigned to a Book by the ranking node.',
},
// ── Object properties ──────────────────────────────────────────────────
{
'@id': `${DAG}hasAuthor`,
'type': 'ObjectProperty',
'label': 'hasAuthor',
'comment': 'Relates a Book to an Author.',
'domain': `${DAG}Book`,
'range': `${DAG}Author`,
},
{
'@id': `${DAG}hasSubject`,
'type': 'ObjectProperty',
'label': 'hasSubject',
'comment': 'Relates a Book to a Subject.',
'domain': `${DAG}Book`,
'range': `${DAG}Subject`,
},
{
'@id': `${DAG}fromSource`,
'type': 'ObjectProperty',
'label': 'fromSource',
'comment': 'Relates a Book record to the Source it was retrieved from.',
'domain': `${DAG}Book`,
'range': `${DAG}Source`,
},
{
'@id': `${DAG}queriedIn`,
'type': 'ObjectProperty',
'label': 'queriedIn',
'comment': 'Relates a Source to the Run it was consulted in.',
'domain': `${DAG}Source`,
'range': `${DAG}Run`,
},
{
'@id': `${DAG}shortlisted`,
'type': 'ObjectProperty',
'label': 'shortlisted',
'comment': 'Relates a Run to a Book that was placed on the shortlist.',
'domain': `${DAG}Run`,
'range': `${DAG}Book`,
},
{
'@id': `${DAG}about`,
'type': 'ObjectProperty',
'label': 'about',
'comment': 'Relates a Book to a Subject it is about.',
'domain': `${DAG}Book`,
'range': `${DAG}Subject`,
},
{
'@id': `${DAG}publishedBy`,
'type': 'ObjectProperty',
'label': 'publishedBy',
'comment': 'Relates a Book to its publisher (as a named node or literal).',
'domain': `${DAG}Book`,
},
// ── Datatype properties ────────────────────────────────────────────────
{
'@id': `${DAG}title`,
'type': 'DatatypeProperty',
'label': 'title',
'comment': 'Human-readable title of a Book.',
'domain': `${DAG}Book`,
'range': `${XSD}string`,
},
{
'@id': `${DAG}isbn`,
'type': 'DatatypeProperty',
'label': 'isbn',
'comment': 'ISBN-13, ISBN-10, or opaque source key identifying a Book.',
'domain': `${DAG}Book`,
'range': `${XSD}string`,
},
{
'@id': `${DAG}summary`,
'type': 'DatatypeProperty',
'label': 'summary',
'comment': 'Editorial description or summary of a Book.',
'domain': `${DAG}Book`,
'range': `${XSD}string`,
},
{
'@id': `${DAG}firstPublishYear`,
'type': 'DatatypeProperty',
'label': 'firstPublishYear',
'comment': 'Year the Book was first published.',
'domain': `${DAG}Book`,
'range': `${XSD}integer`,
},
{
'@id': `${DAG}rating`,
'type': 'DatatypeProperty',
'label': 'rating',
'comment': 'Reader rating of a Book in [0, 5].',
'domain': `${DAG}Book`,
'range': `${XSD}double`,
},
{
'@id': `${DAG}score`,
'type': 'DatatypeProperty',
'label': 'score',
'comment': 'Relevance score in [0, 1] assigned to a Book for a given query.',
'domain': `${DAG}Book`,
'range': `${XSD}double`,
},
{
'@id': `${DAG}visitorQuery`,
'type': 'DatatypeProperty',
'label': 'visitorQuery',
'comment': 'Raw question string submitted by the visitor in a Run.',
'domain': `${DAG}Run`,
'range': `${XSD}string`,
},
{
'@id': `${DAG}runTimestamp`,
'type': 'DatatypeProperty',
'label': 'runTimestamp',
'comment': 'Unix timestamp (ms) when the Run was recorded.',
'domain': `${DAG}Run`,
'range': `${XSD}double`,
},
{
'@id': `${DAG}inShortlist`,
'type': 'DatatypeProperty',
'label': 'inShortlist',
'comment': 'True when a Book was selected onto the shortlist for the current Run.',
'domain': `${DAG}Book`,
'range': `${XSD}boolean`,
},
{
'@id': `${DAG}source`,
'type': 'DatatypeProperty',
'label': 'source',
'comment': 'String identifier of the source a Book record was retrieved from (e.g. "web-search").',
'domain': `${DAG}Book`,
'range': `${XSD}string`,
},
{
'@id': `${DAG}author`,
'type': 'DatatypeProperty',
'label': 'author',
'comment': 'String name of an author of a Book (literal form of hasAuthor).',
'domain': `${DAG}Book`,
'range': `${XSD}string`,
},
{
'@id': `${DAG}subject`,
'type': 'DatatypeProperty',
'label': 'subject',
'comment': 'String label of a subject/topic of a Book (literal form of hasSubject).',
'domain': `${DAG}Book`,
'range': `${XSD}string`,
},
{
'@id': `${DAG}candidate`,
'type': 'ObjectProperty',
'label': 'candidate',
'comment': 'Relates a Run to a Book that was a candidate in that run.',
'domain': `${DAG}Run`,
'range': `${DAG}Book`,
},
{
'@id': `${DAG}shortlistedTitle`,
'type': 'DatatypeProperty',
'label': 'shortlistedTitle',
'comment': 'Title string of a Book shortlisted in a Run (literal convenience predicate).',
'domain': `${DAG}Run`,
'range': `${XSD}string`,
},
],
};
// ── N-Triples serialisation ─────────────────────────────────────────────────
//
// Pre-baked so `MemoryStore.loadOntology()` can parse them without a
// JSON-LD library. Generated once from the JSON-LD graph above and kept
// in sync manually (or via build tooling), since the ontology is stable.
// The ontology header and the property comments are not serialised.
function iri(s: string): string { return `<${s}>`; }
function lit(s: string): string { return `"${s.replace(/\\/g, '\\\\').replace(/"/g, '\\"').replace(/\n/g, '\\n')}"@en`; }
function triple(s: string, p: string, o: string): string {
return `${iri(s)} ${iri(p)} ${iri(o)} .`;
}
function tripleL(s: string, p: string, o: string): string {
return `${iri(s)} ${iri(p)} ${lit(o)} .`;
}
const RDF_TYPE = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type';
const OWL_CLASS = `${OWL}Class`;
const OWL_OP = `${OWL}ObjectProperty`;
const OWL_DP = `${OWL}DatatypeProperty`;
const RDFS_SUB = `${RDFS}subClassOf`;
const RDFS_DOMAIN = `${RDFS}domain`;
const RDFS_RANGE = `${RDFS}range`;
const RDFS_LABEL = `${RDFS}label`;
const RDFS_COMMENT = `${RDFS}comment`;
/** N-Triple strings ready to load into the ontology named graph. */
export const ONTOLOGY_NTRIPLES: readonly string[] = [
// ── Classes
triple(`${DAG}Book`, RDF_TYPE, OWL_CLASS),
tripleL(`${DAG}Book`, RDFS_LABEL, 'Book'),
tripleL(`${DAG}Book`, RDFS_COMMENT, 'A bibliographic record — catalog entry, web-search result, or wiki article.'),
triple(`${DAG}Author`, RDF_TYPE, OWL_CLASS),
tripleL(`${DAG}Author`, RDFS_LABEL, 'Author'),
tripleL(`${DAG}Author`, RDFS_COMMENT, 'A person or organisation responsible for a Book.'),
triple(`${DAG}Subject`, RDF_TYPE, OWL_CLASS),
tripleL(`${DAG}Subject`, RDFS_LABEL, 'Subject'),
tripleL(`${DAG}Subject`, RDFS_COMMENT, 'A thematic topic or classification applied to a Book.'),
triple(`${DAG}Run`, RDF_TYPE, OWL_CLASS),
triple(`${DAG}Run`, RDFS_SUB, `${PROV}Activity`),
tripleL(`${DAG}Run`, RDFS_LABEL, 'Run'),
tripleL(`${DAG}Run`, RDFS_COMMENT, 'One top-level Archivist execution, keyed by runId.'),
triple(`${DAG}Activity`, RDF_TYPE, OWL_CLASS),
triple(`${DAG}Activity`, RDFS_SUB, `${PROV}Activity`),
tripleL(`${DAG}Activity`, RDFS_LABEL, 'Activity'),
tripleL(`${DAG}Activity`, RDFS_COMMENT, 'An Archivist-domain prov:Activity (node execution, tool call, LLM call).'),
triple(`${DAG}Source`, RDF_TYPE, OWL_CLASS),
tripleL(`${DAG}Source`, RDFS_LABEL, 'Source'),
tripleL(`${DAG}Source`, RDFS_COMMENT, 'A data source from which Book records are fetched (catalog, web, wiki, reviews).'),
triple(`${DAG}Score`, RDF_TYPE, OWL_CLASS),
tripleL(`${DAG}Score`, RDFS_LABEL, 'Score'),
tripleL(`${DAG}Score`, RDFS_COMMENT, 'A ranked relevance score in [0, 1] assigned to a Book by the ranking node.'),
// ── Object properties
triple(`${DAG}hasAuthor`, RDF_TYPE, OWL_OP),
triple(`${DAG}hasAuthor`, RDFS_DOMAIN, `${DAG}Book`),
triple(`${DAG}hasAuthor`, RDFS_RANGE, `${DAG}Author`),
tripleL(`${DAG}hasAuthor`, RDFS_LABEL, 'hasAuthor'),
triple(`${DAG}hasSubject`, RDF_TYPE, OWL_OP),
triple(`${DAG}hasSubject`, RDFS_DOMAIN, `${DAG}Book`),
triple(`${DAG}hasSubject`, RDFS_RANGE, `${DAG}Subject`),
tripleL(`${DAG}hasSubject`, RDFS_LABEL, 'hasSubject'),
triple(`${DAG}fromSource`, RDF_TYPE, OWL_OP),
triple(`${DAG}fromSource`, RDFS_DOMAIN, `${DAG}Book`),
triple(`${DAG}fromSource`, RDFS_RANGE, `${DAG}Source`),
tripleL(`${DAG}fromSource`, RDFS_LABEL, 'fromSource'),
triple(`${DAG}queriedIn`, RDF_TYPE, OWL_OP),
triple(`${DAG}queriedIn`, RDFS_DOMAIN, `${DAG}Source`),
triple(`${DAG}queriedIn`, RDFS_RANGE, `${DAG}Run`),
tripleL(`${DAG}queriedIn`, RDFS_LABEL, 'queriedIn'),
triple(`${DAG}shortlisted`, RDF_TYPE, OWL_OP),
triple(`${DAG}shortlisted`, RDFS_DOMAIN, `${DAG}Run`),
triple(`${DAG}shortlisted`, RDFS_RANGE, `${DAG}Book`),
tripleL(`${DAG}shortlisted`, RDFS_LABEL, 'shortlisted'),
triple(`${DAG}about`, RDF_TYPE, OWL_OP),
triple(`${DAG}about`, RDFS_DOMAIN, `${DAG}Book`),
triple(`${DAG}about`, RDFS_RANGE, `${DAG}Subject`),
tripleL(`${DAG}about`, RDFS_LABEL, 'about'),
triple(`${DAG}publishedBy`, RDF_TYPE, OWL_OP),
triple(`${DAG}publishedBy`, RDFS_DOMAIN, `${DAG}Book`),
tripleL(`${DAG}publishedBy`, RDFS_LABEL, 'publishedBy'),
// ── Datatype properties
triple(`${DAG}title`, RDF_TYPE, OWL_DP),
triple(`${DAG}title`, RDFS_DOMAIN, `${DAG}Book`),
triple(`${DAG}title`, RDFS_RANGE, `${XSD}string`),
tripleL(`${DAG}title`, RDFS_LABEL, 'title'),
triple(`${DAG}isbn`, RDF_TYPE, OWL_DP),
triple(`${DAG}isbn`, RDFS_DOMAIN, `${DAG}Book`),
triple(`${DAG}isbn`, RDFS_RANGE, `${XSD}string`),
tripleL(`${DAG}isbn`, RDFS_LABEL, 'isbn'),
triple(`${DAG}summary`, RDF_TYPE, OWL_DP),
triple(`${DAG}summary`, RDFS_DOMAIN, `${DAG}Book`),
triple(`${DAG}summary`, RDFS_RANGE, `${XSD}string`),
tripleL(`${DAG}summary`, RDFS_LABEL, 'summary'),
triple(`${DAG}firstPublishYear`, RDF_TYPE, OWL_DP),
triple(`${DAG}firstPublishYear`, RDFS_DOMAIN, `${DAG}Book`),
triple(`${DAG}firstPublishYear`, RDFS_RANGE, `${XSD}integer`),
tripleL(`${DAG}firstPublishYear`, RDFS_LABEL, 'firstPublishYear'),
triple(`${DAG}rating`, RDF_TYPE, OWL_DP),
triple(`${DAG}rating`, RDFS_DOMAIN, `${DAG}Book`),
triple(`${DAG}rating`, RDFS_RANGE, `${XSD}double`),
tripleL(`${DAG}rating`, RDFS_LABEL, 'rating'),
triple(`${DAG}score`, RDF_TYPE, OWL_DP),
triple(`${DAG}score`, RDFS_DOMAIN, `${DAG}Book`),
triple(`${DAG}score`, RDFS_RANGE, `${XSD}double`),
tripleL(`${DAG}score`, RDFS_LABEL, 'score'),
triple(`${DAG}visitorQuery`, RDF_TYPE, OWL_DP),
triple(`${DAG}visitorQuery`, RDFS_DOMAIN, `${DAG}Run`),
triple(`${DAG}visitorQuery`, RDFS_RANGE, `${XSD}string`),
tripleL(`${DAG}visitorQuery`, RDFS_LABEL, 'visitorQuery'),
triple(`${DAG}runTimestamp`, RDF_TYPE, OWL_DP),
triple(`${DAG}runTimestamp`, RDFS_DOMAIN, `${DAG}Run`),
triple(`${DAG}runTimestamp`, RDFS_RANGE, `${XSD}double`),
tripleL(`${DAG}runTimestamp`, RDFS_LABEL, 'runTimestamp'),
triple(`${DAG}inShortlist`, RDF_TYPE, OWL_DP),
triple(`${DAG}inShortlist`, RDFS_DOMAIN, `${DAG}Book`),
triple(`${DAG}inShortlist`, RDFS_RANGE, `${XSD}boolean`),
tripleL(`${DAG}inShortlist`, RDFS_LABEL, 'inShortlist'),
triple(`${DAG}source`, RDF_TYPE, OWL_DP),
triple(`${DAG}source`, RDFS_DOMAIN, `${DAG}Book`),
triple(`${DAG}source`, RDFS_RANGE, `${XSD}string`),
tripleL(`${DAG}source`, RDFS_LABEL, 'source'),
triple(`${DAG}author`, RDF_TYPE, OWL_DP),
triple(`${DAG}author`, RDFS_DOMAIN, `${DAG}Book`),
triple(`${DAG}author`, RDFS_RANGE, `${XSD}string`),
tripleL(`${DAG}author`, RDFS_LABEL, 'author'),
triple(`${DAG}subject`, RDF_TYPE, OWL_DP),
triple(`${DAG}subject`, RDFS_DOMAIN, `${DAG}Book`),
triple(`${DAG}subject`, RDFS_RANGE, `${XSD}string`),
tripleL(`${DAG}subject`, RDFS_LABEL, 'subject'),
triple(`${DAG}candidate`, RDF_TYPE, OWL_OP),
triple(`${DAG}candidate`, RDFS_DOMAIN, `${DAG}Run`),
triple(`${DAG}candidate`, RDFS_RANGE, `${DAG}Book`),
tripleL(`${DAG}candidate`, RDFS_LABEL, 'candidate'),
triple(`${DAG}shortlistedTitle`, RDF_TYPE, OWL_DP),
triple(`${DAG}shortlistedTitle`, RDFS_DOMAIN, `${DAG}Run`),
triple(`${DAG}shortlistedTitle`, RDFS_RANGE, `${XSD}string`),
tripleL(`${DAG}shortlistedTitle`, RDFS_LABEL, 'shortlistedTitle'),
];
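The serialisation comment in this section notes that the N-Triples array is kept in sync with the JSON-LD graph manually or via build tooling. One possible shape for such tooling is sketched below: it derives the type, domain, range, and label lines from a single property record. `PropertyNode` and `propertyToNTriples` are hypothetical names, the record is assumed to hold full IRIs, and label escaping is reduced to backslashes and double quotes.

```typescript
const RDF_TYPE = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type';
const RDFS = 'http://www.w3.org/2000/01/rdf-schema#';
const DAG = 'https://noocodex.dev/ontology/dagonizer/';

// Hypothetical record shape: full IRIs, mirroring one JSON-LD property node.
interface PropertyNode {
  readonly id: string;
  readonly type: string;
  readonly label: string;
  readonly domain?: string;
  readonly range?: string;
}

const iri = (value: string): string => `<${value}>`;
// Escape backslashes and double quotes, then tag the literal as English.
const lit = (value: string): string =>
  `"${value.replace(/\\/g, '\\\\').replace(/"/g, '\\"')}"@en`;

// Emit type, optional domain, optional range, then label: one triple per line.
function propertyToNTriples(node: PropertyNode): string[] {
  const lines = [`${iri(node.id)} ${iri(RDF_TYPE)} ${iri(node.type)} .`];
  if (node.domain !== undefined) {
    lines.push(`${iri(node.id)} ${iri(`${RDFS}domain`)} ${iri(node.domain)} .`);
  }
  if (node.range !== undefined) {
    lines.push(`${iri(node.id)} ${iri(`${RDFS}range`)} ${iri(node.range)} .`);
  }
  lines.push(`${iri(node.id)} ${iri(`${RDFS}label`)} ${lit(node.label)} .`);
  return lines;
}

// dag:publishedBy has no declared range, so three lines are produced.
const publishedByLines = propertyToNTriples({
  id: `${DAG}publishedBy`,
  type: 'http://www.w3.org/2002/07/owl#ObjectProperty',
  label: 'publishedBy',
  domain: `${DAG}Book`,
});
console.log(publishedByLines.join('\n'));
```

Generating the lines from one record per property keeps the ordering (type, domain, range, label) uniform, which makes a diff against the hand-written array straightforward.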