How to checkpoint Claude Agent SDK sessions
To make a Claude Agent SDK task durable with Tidebase, wrap each query() turn in a run.step(), store the SDK session id in run state, and wire the SDK’s canUseTool callback to a durable Tidebase gate. A crashed task re-invoked with the same runId replays finished turns from Postgres and resumes the SDK session where it left off.
import { Tidebase } from '@tidebase/sdk'
import { query } from '@anthropic-ai/claude-agent-sdk'

const tide = new Tidebase()

// runId must be a stable id for this task: re-invoking with the same value resumes the run.
await tide.run('triage-task', { runId }, async (run, input) => {
  // One step per agent turn; the prompt is the step input.
  const turn = await run.step('turn-1', { input: { prompt: input.prompt } }, async () => {
    let text = ''
    let sessionId: string | undefined
    for await (const message of query({
      prompt: input.prompt,
      options: {
        canUseTool: async (toolName, toolInput) => {
          // Only Bash needs a human; every other tool is allowed as-is.
          if (toolName !== 'Bash') return { behavior: 'allow', updatedInput: toolInput }
          // Park the decision in a durable gate instead of answering in-process.
          const decision = await run.gate(`approve:${toolName}`, {
            prompt: `Agent wants to run: ${JSON.stringify(toolInput)}`,
            data: { toolName, toolInput }
          })
          return decision.decision === 'approved'
            ? { behavior: 'allow', updatedInput: toolInput }
            : { behavior: 'deny', message: 'Denied by operator' }
        }
      }
    })) {
      // The init message carries the SDK session id needed to resume later.
      if (message.type === 'system' && message.subtype === 'init') sessionId = message.session_id
      // Only a successful result message carries the final text.
      if (message.type === 'result' && message.subtype === 'success') text = message.result
    }
    return { text, sessionId }
  })
  // Persist the session id in run state so a later turn can resume the same conversation.
  await run.state.set({ sessionId: turn.sessionId, lastTurn: 'turn-1' })
  return turn.text
})
Tidebase is an open-source checkpoint layer for AI agents: wrap your steps, and failed runs resume from the last safe point — in your own Postgres, without moving execution into a new runtime.
The honest tradeoff: Tidebase does not execute your code — something (a Tidebase queue worker, a recovery webhook handler, your own retry) must re-invoke the workflow after a failure. And the Agent SDK already persists sessions locally; Tidebase does not replace its session files. What it adds is the durable run record in your own Postgres — which turns are done, what they returned, what was approved by whom, what it cost — plus safe replay of the steps around agent turns.
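Because something outside Tidebase has to call the workflow again, the simplest re-invoker is a retry loop that reuses one runId. The following is a minimal sketch, not a Tidebase feature: `reinvokeUntilDone` and `invoke` are hypothetical names, where `invoke` stands in for your own wrapper around `tide.run(...)`, and the attempt limit and backoff are illustrative defaults.

```typescript
// Hypothetical helper: re-invoke a workflow with the SAME runId until it succeeds.
// `invoke` is your own wrapper around tide.run(...); it is not a Tidebase API.
async function reinvokeUntilDone<T>(
  runId: string,
  invoke: (runId: string) => Promise<T>,
  maxAttempts = 3,
  backoffMs = 1000,
): Promise<T> {
  let lastError: unknown
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      // Same runId on every attempt, so finished steps can replay instead of re-running.
      return await invoke(runId)
    } catch (err) {
      lastError = err
      if (attempt < maxAttempts) {
        // Linear backoff between attempts.
        await new Promise((resolve) => setTimeout(resolve, backoffMs * attempt))
      }
    }
  }
  // Give up loudly with the last failure.
  throw lastError
}
```

The point of the design is that the runId is fixed outside the loop: a fresh id per attempt would start a new run instead of resuming the failed one.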
The headline pattern: durable approvals via canUseTool
The Agent SDK asks your code for permission before each tool call. By default that’s an ephemeral, in-process decision; if you answer from a terminal prompt, the answer dies with the process. Routing it through run.gate(...) makes it a durable, exactly-once approval gate: the decision parks in Postgres, can be resolved from Studio, a Slack webhook channel, or your product UI, survives restarts, and is recorded with the actor who approved it.
One honest caveat: run.gate() blocks until a human resolves it. The SDK is happy to wait, but if approvals routinely take a long time, set timeoutMs on the gate so an abandoned run fails loudly instead of hanging forever.
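A small sketch of what setting the timeout might look like, reusing the `prompt` and `data` fields from the example above and adding `timeoutMs`. The `approvalGateOptions` helper, the `ApprovalGateOptions` type, and the 15-minute default are assumptions made for illustration; how a timed-out gate surfaces to your code is not covered in this guide.

```typescript
// Illustrative shape of the gate options used in this guide, plus timeoutMs.
// This type is an assumption for the sketch, not a type exported by Tidebase.
interface ApprovalGateOptions {
  prompt: string
  data: { toolName: string; toolInput: Record<string, unknown> }
  timeoutMs: number
}

const FIFTEEN_MINUTES_MS = 15 * 60 * 1000

// Hypothetical helper: build gate options with an explicit approval deadline.
function approvalGateOptions(
  toolName: string,
  toolInput: Record<string, unknown>,
  timeoutMs: number = FIFTEEN_MINUTES_MS,
): ApprovalGateOptions {
  if (!Number.isFinite(timeoutMs) || timeoutMs <= 0) {
    throw new RangeError('timeoutMs must be a positive number of milliseconds')
  }
  return {
    prompt: `Agent wants to run: ${JSON.stringify(toolInput)}`,
    data: { toolName, toolInput },
    timeoutMs,
  }
}
```

Inside `canUseTool`, the result would be passed as the second argument to `run.gate(...)` in place of the inline options object.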
Resuming: two layers, one id each
There are two resumable things here, and they resume independently:
- The Tidebase run: re-invoke with the same runId. Completed turn-* steps replay from checkpoints without re-running the agent.
- The SDK session: the session id captured from the init message and stored with run.state.set(...). A resumed workflow reads it back and passes options.resume to query() so the next turn continues the same conversation instead of starting cold.
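The second layer can be sketched as a tiny helper that turns a stored session id into query options. This is an illustration under assumptions: `turnOptions` and `TurnOptions` are hypothetical names, and because this guide does not show how run state is read back, the sketch takes the id as a plain argument.

```typescript
// Minimal subset of the Agent SDK query options that this sketch touches.
interface TurnOptions {
  resume?: string
}

// Hypothetical helper: resume the stored SDK session if there is one,
// otherwise return empty options so the turn starts a fresh session.
function turnOptions(sessionId: string | undefined): TurnOptions {
  return sessionId ? { resume: sessionId } : {}
}
```

In a later turn the result would be spread into the `options` object passed to `query()`, next to `canUseTool`.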
For multi-turn tasks, give each turn its own step (turn-1, turn-2, …) with the turn’s prompt as step input — if you change a prompt, the stale checkpoint is rejected loudly per the replay contract instead of replaying an answer to a question you no longer asked.
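To make the replay rule concrete, here is a toy in-memory model of it. It is not Tidebase's implementation (Tidebase persists checkpoints in Postgres and its steps are async); `CheckpointStore` is a hypothetical class, and comparing inputs via `JSON.stringify` is a simplification that is sensitive to key order.

```typescript
// Toy model of checkpoint replay, for illustration only.
type Checkpoint = { inputKey: string; output: unknown }

class CheckpointStore {
  private checkpoints = new Map<string, Checkpoint>()

  // Run fn once per step name; later calls replay the saved output.
  step<I, O>(name: string, input: I, fn: (input: I) => O): O {
    const inputKey = JSON.stringify(input)
    const saved = this.checkpoints.get(name)
    if (saved) {
      // Same step name but different input: refuse to replay a stale answer.
      if (saved.inputKey !== inputKey) {
        throw new Error(`Stale checkpoint for "${name}": step input changed`)
      }
      return saved.output as O
    }
    const output = fn(input)
    this.checkpoints.set(name, { inputKey, output })
    return output
  }
}
```

Calling `step('turn-1', { prompt }, runAgent)` twice with the same prompt runs the agent once; calling it again with an edited prompt throws instead of returning the old answer.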
Record what each turn cost
The SDK’s result message carries usage and cost for the turn. Record it inside the step so a replayed turn doesn’t double-count:
if (message.type === 'result') {
  await run.usage.record({
    kind: 'llm',
    provider: 'anthropic',
    model: 'claude-sonnet-4-6',
    inputTokens: message.usage?.input_tokens ?? 0,
    outputTokens: message.usage?.output_tokens ?? 0,
    costUsd: message.total_cost_usd
  })
}
That gives you a per-run cost ledger across every agent task — see tracking LLM token costs per run.
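If several turns record usage, it can help to keep the mapping from result message to usage record in one place. A minimal sketch, using the field names from the snippet above: `ResultLike`, `UsageRecord`, `toUsageRecord`, and `totalCostUsd` are hypothetical names for this example, and `ResultLike` is a hand-written subset rather than the SDK's own message type.

```typescript
// Hand-written subset of the SDK result message that this sketch reads.
interface ResultLike {
  usage?: { input_tokens?: number; output_tokens?: number }
  total_cost_usd?: number
}

// Mirrors the fields passed to run.usage.record(...) in this guide.
interface UsageRecord {
  kind: 'llm'
  provider: 'anthropic'
  model: string
  inputTokens: number
  outputTokens: number
  costUsd: number
}

// Map a result message to a usage record, defaulting missing numbers to 0.
function toUsageRecord(message: ResultLike, model: string): UsageRecord {
  return {
    kind: 'llm',
    provider: 'anthropic',
    model,
    inputTokens: message.usage?.input_tokens ?? 0,
    outputTokens: message.usage?.output_tokens ?? 0,
    costUsd: message.total_cost_usd ?? 0,
  }
}

// Sum the cost of all recorded turns in a run.
function totalCostUsd(records: UsageRecord[]): number {
  return records.reduce((sum, record) => sum + record.costUsd, 0)
}
```

Passing the model name in as an argument, rather than hard-coding it, keeps the record accurate if different turns use different models.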
What Tidebase does not do here
- It does not run the agent. The Agent SDK owns the loop, the tools, and the session files; Tidebase checkpoints around turns.
- It does not proxy Anthropic calls. Your keys, your network path; Tidebase stores what you record.
- Alpha, opt-in auth. Self-hosted alpha: set TIDEBASE_API_KEY before exposing the server beyond localhost.
Repo: https://github.com/BlueprintLabIO/tidebase · See also: Human approval gates for AI agents