Phase 08 · Checkpoint + resume
The compose / validate loop in The Archivist is the most expensive segment, with multiple LLM calls per attempt. If the run is cancelled or hits its deadline mid-loop, the dispatcher records the cursor (crl-compose-response or crl-validate-response), the partial draft, and the attempt counter. A later process recalls the checkpoint and finishes the response without paying for the upstream scouts again.
The ArchivistState makes this possible by overriding snapshotData() and restoreData() — the two methods NodeStateBase calls during Checkpoint.from and Checkpoint.recall.
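As a rough illustration of the two-hook pattern, the sketch below round-trips a state through JSON. It is not Dagonizer's implementation: SketchStateBase, SketchArchivistState, snapshot() and hydrate() are hypothetical stand-ins, and only three of the domain fields are modelled. It assumes the base class owns the lifecycle while the subclass contributes only its domain data.

```typescript
// Hypothetical stand-ins for illustration; these are not the Dagonizer classes.
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };
type JsonRecord = { [key: string]: Json };

abstract class SketchStateBase {
  lifecycle: 'pending' | 'running' | 'cancelled' | 'completed' = 'pending';

  // Subclasses contribute their domain fields through these two hooks.
  protected abstract snapshotData(): JsonRecord;
  protected abstract restoreData(snap: JsonRecord): void;

  // Serialize only the domain data; the lifecycle is deliberately left out.
  snapshot(): string {
    return JSON.stringify(this.snapshotData());
  }

  // Rehydrate domain data into this instance; the lifecycle stays as constructed.
  hydrate(serialized: string): this {
    this.restoreData(JSON.parse(serialized) as JsonRecord);
    return this;
  }
}

class SketchArchivistState extends SketchStateBase {
  query = '';
  draft = '';
  attempts: Record<string, number> = {};

  protected override snapshotData(): JsonRecord {
    // Copy the counter object so later mutations do not leak into the snapshot.
    return { query: this.query, draft: this.draft, attempts: { ...this.attempts } };
  }

  protected override restoreData(snap: JsonRecord): void {
    // Every field is type-checked before assignment; unknown shapes are ignored.
    if (typeof snap['query'] === 'string') this.query = snap['query'];
    if (typeof snap['draft'] === 'string') this.draft = snap['draft'];
    const attempts = snap['attempts'];
    if (attempts !== null && typeof attempts === 'object' && !Array.isArray(attempts)) {
      this.attempts = {};
      for (const [name, count] of Object.entries(attempts)) {
        if (typeof count === 'number') this.attempts[name] = count;
      }
    }
  }
}

// A run that was interrupted after two compose attempts.
const interrupted = new SketchArchivistState();
interrupted.query = 'labyrinth';
interrupted.draft = 'partial draft';
interrupted.attempts = { compose: 2 };
interrupted.lifecycle = 'cancelled';

// Later: a fresh instance picks up the domain data with a fresh lifecycle.
const serialized = interrupted.snapshot();
const revived = new SketchArchivistState().hydrate(serialized);
```

In this sketch the revived instance carries the draft and the attempt counter but starts with a new lifecycle, which mirrors the behaviour this page describes for a restored ArchivistState.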
Flow
Code
State snapshot round-trip
The #snapshot-restore region covers snapshotData() and restoreData(), the two methods that serialize and rehydrate the domain fields (query, intent, terms, candidates, shortlist, draft, approved, attempts, failureCause, recalledContext, memoryDigest):
protected override snapshotData(): JsonObject {
  return {
    "query": this.query,
    "intent": this.intent,
    "terms": [...this.terms],
    "candidates": this.candidates.map((candidate) => ({
      "book": { ...candidate.book, "authors": [...candidate.book.authors] },
      "score": candidate.score,
      "source": candidate.source,
    })) as unknown as JsonObject[],
    "shortlist": this.shortlist.map((candidate) => ({
      "book": { ...candidate.book, "authors": [...candidate.book.authors] },
      "score": candidate.score,
      "source": candidate.source,
    })) as unknown as JsonObject[],
    "draft": this.draft,
    "approved": this.approved,
    "attempts": { ...this.attempts },
    "failureCause": this.failureCause,
    "recalledContext": {
      "priorIntents": this.recalledContext.priorIntents as unknown as JsonObject[],
      "recentCandidates": this.recalledContext.recentCandidates.map((c) => ({
        "book": { ...c.book, "authors": [...c.book.authors] },
        "score": c.score,
        "source": c.source,
      })) as unknown as JsonObject[],
      "similarPriorQueries": this.recalledContext.similarPriorQueries as unknown as JsonObject[],
      "summary": this.recalledContext.summary,
    },
    "memoryDigest": {
      "bookCount": this.memoryDigest.bookCount,
      "queryCount": this.memoryDigest.queryCount,
      "recentBooks": this.memoryDigest.recentBooks as unknown as JsonObject[],
      "intentBreakdown": this.memoryDigest.intentBreakdown as unknown as JsonObject[],
      "summary": this.memoryDigest.summary,
    },
  };
}
protected override restoreData(snap: JsonObject): void {
  if (typeof snap['query'] === 'string') this.query = snap['query'];
  if (typeof snap['intent'] === 'string') this.intent = snap['intent'] as ArchivistIntent;
  if (typeof snap['draft'] === 'string') this.draft = snap['draft'];
  if (typeof snap['approved'] === 'boolean') this.approved = snap['approved'];
  if (typeof snap['failureCause'] === 'string') this.failureCause = snap['failureCause'];
  if (Array.isArray(snap['terms'])) this.terms = snap['terms'] as string[];
  if (Array.isArray(snap['candidates'])) this.candidates = snap['candidates'] as unknown as Candidate[];
  if (Array.isArray(snap['shortlist'])) this.shortlist = snap['shortlist'] as unknown as Candidate[];
  if (snap['attempts'] && typeof snap['attempts'] === 'object') {
    this.attempts = { ...snap['attempts'] as Record<string, number> };
  }
  const rc = snap['recalledContext'];
  if (rc !== null && rc !== undefined && typeof rc === 'object' && !Array.isArray(rc)) {
    const rcObj = rc as Record<string, unknown>;
    this.recalledContext = {
      'priorIntents': Array.isArray(rcObj['priorIntents']) ? rcObj['priorIntents'] as RecalledContext['priorIntents'] : [],
      'recentCandidates': Array.isArray(rcObj['recentCandidates']) ? rcObj['recentCandidates'] as RecalledContext['recentCandidates'] : [],
      'similarPriorQueries': Array.isArray(rcObj['similarPriorQueries']) ? rcObj['similarPriorQueries'] as RecalledContext['similarPriorQueries'] : [],
      'summary': typeof rcObj['summary'] === 'string' ? rcObj['summary'] : '',
    };
  }
  const md = snap['memoryDigest'];
  if (md !== null && md !== undefined && typeof md === 'object' && !Array.isArray(md)) {
    const mdObj = md as Record<string, unknown>;
    this.memoryDigest = {
      'bookCount': typeof mdObj['bookCount'] === 'number' ? mdObj['bookCount'] : 0,
      'queryCount': typeof mdObj['queryCount'] === 'number' ? mdObj['queryCount'] : 0,
      'recentBooks': Array.isArray(mdObj['recentBooks']) ? mdObj['recentBooks'] as MemoryDigest['recentBooks'] : [],
      'intentBreakdown': Array.isArray(mdObj['intentBreakdown']) ? mdObj['intentBreakdown'] as MemoryDigest['intentBreakdown'] : [],
      'summary': typeof mdObj['summary'] === 'string' ? mdObj['summary'] : '',
    };
  }
}
Cancellation → checkpoint → resume
The #cancellation-run region in the runner shows the execute call with signal and deadlineMs, the cursor check after cancellation, and how to read the lifecycle kind:
// Caller-driven cancellation: the visitor closes the page.
const controller = new AbortController();
// Simulate the visitor abandoning the page 800 ms in.
setTimeout(() => controller.abort('visitor closed page'), 800);
const cancelVisitor = new ArchivistState();
cancelVisitor.query = "What's a book about a labyrinth?";
const cancelResult = await dispatcher.execute('the-archivist', cancelVisitor, {
  'signal': controller.signal,
  'deadlineMs': 5000, // hard 5s ceiling regardless of signal
});
const lc = cancelResult.state.lifecycle;
switch (lc.kind) {
  case 'completed':
    logger.result(`responded: ${cancelResult.state.draft}`);
    break;
  case 'cancelled':
    logger.result(`visitor abandoned: ${lc.reason}`);
    break;
  case 'timed_out':
    logger.result(`hit deadline at: ${lc.finishedAt}`);
    break;
}
// cancelResult.cursor is the next node that would have run. Pass the
// result to Checkpoint.from to persist it and resume in a later process.
if (cancelResult.cursor !== null) {
  logger.result(`stopped at ${cancelResult.cursor} — resumable`);
}
Persist and resume (illustrative)
The persist and resume calls below use the standard Checkpoint API with MemoryCheckpointStore — swap to any CheckpointStore implementation (Postgres, Redis, S3) without changing the calling code:
// Illustrative; runtime equivalent in examples/the-archivist/runArchivist.ts
import { Checkpoint, MemoryCheckpointStore } from '@noocodex/dagonizer/checkpoint';
const store = new MemoryCheckpointStore();
// After a cancelled/timed-out execute call (`result` is the value it returned):
if (result.cursor !== null) {
  const data = Checkpoint.from('the-archivist', result);
  await Checkpoint.persist(store, `archivist:${result.state.query}`, data);
}
// In a later process (`visitor` carries the same query, so the key matches):
const recalled = await Checkpoint.recall(
  store,
  `archivist:${visitor.query}`,
  (snap) => ArchivistState.restore(snap), // rehydrates via restoreData()
);
if (recalled !== null) {
  const final = await dispatcher.resume(
    recalled.dagName,
    recalled.state,
    recalled.cursor, // e.g. 'crl-validate-response'
  );
  console.log(final.state.draft); // validated response
  console.log(final.state.lifecycle.kind); // 'completed'
}
What it demonstrates
ArchivistState.snapshotData() / restoreData(): domain-specific serialization. NodeStateBase calls snapshotData during Checkpoint.from and restoreData during Checkpoint.recall. The lifecycle resets to pending on restore; the resumed execution is a fresh lifecycle run on the recovered state data.
Checkpoint.from(dagName, result): produces a CheckpointData record only when result.cursor !== null (an in-progress flow). A completed flow produces no cursor.
CheckpointStore adapter contract: MemoryCheckpointStore is the test-time implementation. Swap to Postgres / Redis / S3 without touching the dispatcher or state.
Checkpoint.persist / Checkpoint.recall: codec + store in one call per side. Checkpoint.recall returns null when nothing is stored under the key.
dispatcher.resume(dagName, state, cursor): starts from the recalled cursor instead of the DAG's entrypoint. The compose/validate retry counter (state.attempts.compose) survives the round-trip so the loop is still bounded.
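The store contract and the bounded retry counter can be approximated without the library. The block below is a minimal sketch under stated assumptions: SketchCheckpoint, SketchStore, SketchMemoryStore, persist, recall, resumeLoop and MAX_ATTEMPTS are hypothetical names rather than Dagonizer's API, the limit of three attempts is invented for the example, and everything is synchronous for brevity even though the real Checkpoint.persist and Checkpoint.recall calls are awaited.

```typescript
// Hypothetical shapes for illustration; these are not the Dagonizer API.
interface SketchCheckpoint {
  dagName: string;
  cursor: string;   // next node that would have run
  attempts: number; // compose attempts already spent before the interruption
  draft: string;
}

// Any backend that can save and load a string by key satisfies the contract.
interface SketchStore {
  save(key: string, serialized: string): void;
  load(key: string): string | null;
}

class SketchMemoryStore implements SketchStore {
  private readonly entries = new Map<string, string>();

  save(key: string, serialized: string): void {
    this.entries.set(key, serialized);
  }

  load(key: string): string | null {
    return this.entries.get(key) ?? null;
  }
}

// Codec + store in one call per side.
function persist(store: SketchStore, key: string, data: SketchCheckpoint): void {
  store.save(key, JSON.stringify(data));
}

function recall(store: SketchStore, key: string): SketchCheckpoint | null {
  const raw = store.load(key);
  return raw === null ? null : (JSON.parse(raw) as SketchCheckpoint);
}

const MAX_ATTEMPTS = 3; // assumed limit, chosen for the example

// Resume the compose/validate loop at the recalled cursor. Because the
// attempt counter travels with the checkpoint, the bound still holds.
function resumeLoop(
  checkpoint: SketchCheckpoint,
  validate: (draft: string) => boolean,
): { approved: boolean; attempts: number; draft: string } {
  let { cursor, attempts, draft } = checkpoint;
  while (true) {
    if (cursor === 'crl-compose-response') {
      if (attempts >= MAX_ATTEMPTS) return { approved: false, attempts, draft };
      attempts += 1;
      draft = `draft v${attempts}`; // stand-in for the LLM compose call
      cursor = 'crl-validate-response';
    } else {
      if (validate(draft)) return { approved: true, attempts, draft };
      cursor = 'crl-compose-response';
    }
  }
}

const store = new SketchMemoryStore();
persist(store, 'archivist:labyrinth', {
  dagName: 'the-archivist',
  cursor: 'crl-validate-response',
  attempts: 2,
  draft: 'draft v2',
});

const recalled = recall(store, 'archivist:labyrinth');
const missing = recall(store, 'archivist:unknown'); // nothing stored under this key
const outcome = recalled === null ? null : resumeLoop(recalled, (d) => d === 'draft v3');
```

Because callers depend only on the two-method interface, a database-backed store could replace SketchMemoryStore in this sketch without any change to persist, recall, or resumeLoop.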
See this in action in the Archivist live demo.