How to add durable checkpoints to Mastra agents
To make Mastra agent calls durable with Tidebase, wrap each agent.generate() call in a run.step() inside a Tidebase workflow. If the process dies mid-pipeline, re-invoking with the same runId replays completed steps from Postgres and continues from the first incomplete one — no agent call runs twice.
import { Tidebase } from '@tidebase/sdk'
import { Agent } from '@mastra/core/agent'
const tide = new Tidebase()
const researcher = new Agent({
name: 'researcher',
instructions: 'Research the topic and return key findings as bullets.',
model: 'openai/gpt-4o-mini', // any model id your Mastra version supports
})
const writer = new Agent({
name: 'writer',
instructions: 'Turn findings into a short publishable post.',
model: 'openai/gpt-4o-mini',
})
await tide.run('research-and-publish', { runId }, async (run, input) => {
const findings = await run.step('research', { input: { topic: input.topic } }, async () => {
const res = await researcher.generate(`Research: ${input.topic}`)
await run.usage.record({
kind: 'llm',
provider: 'openai',
model: 'gpt-4o-mini',
inputTokens: res.usage.inputTokens,
outputTokens: res.usage.outputTokens,
})
return res.text
})
const draft = await run.step('draft', { input: { findings } }, async () => {
const res = await writer.generate(`Write a post from these findings:\n${findings}`)
return res.text
})
await run.state.set({ phase: 'awaiting-approval', draftPreview: draft.slice(0, 200) })
const decision = await run.gate('approve-publish', {
prompt: 'Publish this draft to the blog?',
data: { draft },
})
if (decision.decision !== 'approved') return { published: false }
await run.step(
'publish',
{ input: { draft }, sideEffects: ['cms'], idempotencyKey: `publish-${runId}` },
() => publishToCms(draft)
)
return { published: true }
})
Tidebase is an open-source checkpoint layer for AI agents: wrap your steps, and failed runs resume from the last safe point — in your own Postgres, without moving execution into a new runtime.
Why one step per agent call
Each agent.generate() gets its own step, with the prompt material passed as the step’s input. Tidebase hashes that input: on replay, a completed step returns its checkpointed text without calling the model again, and if the prompt changed since the checkpoint was written, replay fails loudly instead of silently reusing a stale answer. Recording usage inside the step means token costs land in the per-run ledger exactly once — replayed steps don’t re-record.
Tool-using agents
If a Mastra agent has tools that touch the outside world (send email, write to a CRM, charge a card), the step wrapping that generate() call has external side effects. Declare them and pin an idempotencyKey, as in the publish step above. Per the replay contract, a failed step with undeclared side effects is parked for manual_review rather than blindly retried — which is what you want when the tool may have half-fired.
The approve-publish gate is a durable, exactly-once human approval gate: the run parks in Postgres until someone decides, surviving restarts and deploys in between.
Resuming a failed run
Re-invoke the same workflow with the same runId — from a Tidebase queue, a recovery webhook, a cron, or a retry button. research and draft replay from checkpoints; execution continues at the gate or the publish step. The honest tradeoff: Tidebase does not execute your code. Something — Tidebase’s queues, or your own infrastructure — must re-invoke the workflow function; Tidebase guarantees that doing so is safe and that completed steps never repeat.
Do you even need this? Mastra has workflows
Be honest with yourself here. Mastra ships its own workflow engine with suspend/resume and pluggable storage. If your pipeline is a pure Mastra workflow, persisted with Mastra’s storage, running entirely inside Mastra — that may be enough, and adding Tidebase would be a second source of truth you don’t need.
Reach for Tidebase when:
- You want the run record in your own Postgres — run state, approval gates, the replay contract, and the token/cost ledger as queryable rows next to your app’s data, not inside a framework’s storage abstraction.
- You’re framework-agnostic — the same checkpoint layer wraps Mastra agents today and plain SDK calls, LangChain chains, or hand-rolled steps tomorrow, with one operational surface.
- The agent call is one step in a larger pipeline — you’re calling
agent.generate()inside ingestion jobs, ETL, or API handlers that are not Mastra workflows, and you want durability around the whole pipeline, not just the agent part.
Repo: https://github.com/BlueprintLabIO/tidebase · See also: How to resume a failed AI agent run