{"slug":"durable-workflows-are-here","url":"https://tako.sh/blog/durable-workflows-are-here/","canonical":"https://tako.sh/blog/durable-workflows-are-here/","title":"Durable Workflows Are Here","date":"2026-04-16T00:29","description":"Tako now ships a durable workflow engine — step checkpoints, retries, cron, sleep for days, and signal/waitFor — on your own VPS, no external queue service.","author":null,"image":"1d73331964b0","imageAlt":null,"headings":[{"depth":2,"slug":"step-checkpoints-that-survive-crashes","text":"Step checkpoints that survive crashes"},{"depth":2,"slug":"sleep-for-days-wait-for-signals","text":"Sleep for days, wait for signals"},{"depth":2,"slug":"cron-without-the-cron-box","text":"Cron, without the cron box"},{"depth":2,"slug":"how-its-wired","text":"How it’s wired"},{"depth":2,"slug":"same-server-same-deploy","text":"Same server, same deploy"}],"markdown":"Every app eventually needs background work. Send an email after signup. Reindex a document when it changes. Charge a card, notify a webhook, fan out to three services, wait for a human to approve. That work doesn't belong in the HTTP path — it needs retries, scheduling, and progress that survives the process restarting mid-flight.\n\nThe usual answer is another service. Inngest, Temporal, BullMQ on top of Redis, SQS and a Lambda, a cron entry on some random box. One more vendor, one more bill, one more thing to keep alive at 3am.\n\nTako now ships this as a built-in primitive. A full durable workflow engine runs next to your app — same server, same config, same deploy — and the SDK gives you `step.run`, `step.sleep`, `step.waitFor`, and cron out of the box.\n\n## Step checkpoints that survive crashes\n\nThe core idea is `step.run` — wrap a side effect, give it a name, and Tako persists its return value. 
If the worker crashes or restarts, the next attempt skips completed steps and resumes at the first unfinished one:\n\n```ts\n// workflows/fulfill-order.ts\nimport { defineWorkflow } from \"tako.sh\";\n\nexport default defineWorkflow(\"fulfill-order\", {\n  retries: 4,\n  handler: async (payload, step) => {\n    const charge = await step.run(\"charge\", () =>\n      stripe.charges.create({ amount: payload.total, source: payload.token }),\n    );\n    const label = await step.run(\"ship\", () => easypost.shipments.create({ to: payload.address }));\n    await step.run(\"email\", () => mailer.send(payload.email, { charge, tracking: label.id }));\n  },\n});\n```\n\nEach step is one row in a per-app SQLite queue at `{tako_data_dir}/apps/<app>/runs.db` with first-write-wins semantics. Retries are automatic — exponential backoff with jitter, capped at an hour, overridable per workflow (`retries: 4` means retry 4 times = 5 total attempts). The same contract every durable engine gives you: at-least-once, so make step bodies idempotent. See [the SPEC](/docs) for the full details.\n\n## Sleep for days, wait for signals\n\nTwo primitives turn \"workflow\" into \"long-running business process.\"\n\n`step.sleep(3 * 24 * 3600 * 1000)` pauses the run for three days. Short waits run inline; longer ones park the run — the worker exits, the row goes back to `pending` with a wake-up time, and the supervisor resumes on schedule. 
Crash-safe across reboots.\n\n`step.waitFor(name, { timeout })` parks the run until a named event arrives. Anywhere else in your code, `signal(name, payload)` wakes it:\n\n```ts\n// Worker — block the run until approval arrives\nexport default defineWorkflow(\"approve-order\", {\n  handler: async (payload, step) => {\n    const decision = await step.waitFor(`approval:order-${payload.id}`, {\n      timeout: 7 * 24 * 3600 * 1000,\n    });\n    if (decision === null) step.bail(\"approval timed out\");\n  },\n});\n\n// Elsewhere — an HTTP handler, webhook, or another workflow\nimport { signal } from \"tako.sh\";\nawait signal(`approval:order-abc`, { approved: true });\n```\n\nHuman approvals, webhook callbacks, multi-day onboarding nudges — all expressed as ordinary async code.\n\n## Cron, without the cron box\n\nPass `schedule` to `defineWorkflow`. Tako runs a leader-elected ticker that enqueues on schedule, deduplicated so a brief outage doesn't double-fire:\n\n```ts\nexport default defineWorkflow(\"daily-job\", {\n  schedule: \"0 9 * * *\", // 9am daily\n  handler: async (payload, step) => {\n    // daily job body\n  },\n});\n```\n\n## How it's wired\n\n```d2\ndirection: right\n\nenq: Enqueue {style.fill: \"#9BC4B6\"; style.font-size: 16}\nserver: tako-server {style.fill: \"#E88783\"; style.font-size: 16}\ndb: runs.db {style.fill: \"#FFF9F4\"; style.stroke: \"#2F2A44\"; style.font-size: 16}\nworker: Worker process {style.fill: \"#E88783\"; style.font-size: 16}\n\nenq -> server: \"unix socket\"\nserver -> db: \"insert run\"\nserver -> worker: \"supervise\"\nworker -> server: \"claim / save step / complete\"\nserver -> db: \"persist\"\n```\n\nThe worker is a separate process so heavy deps — image libs, ML bindings — don't bloat your HTTP instances. Workers default to scale-to-zero, the [same idea as your app](/blog/scale-to-zero-without-containers): the first enqueue or cron tick spins one up, and an idle worker exits after five minutes. 
One knob in `tako.toml` pins them up:\n\n```toml\n[workflows]\nworkers = 1\nconcurrency = 10\n```\n\n## Same server, same deploy\n\nWorkflows ship with your app. No external queue to provision, no extra auth tokens, no network hop to a SaaS. Your handlers live in `workflows/*.ts`, they get [secrets on fd 3](/blog/secrets-without-env-files) like your HTTP code, they [deploy](/blog/what-happens-when-you-run-tako-deploy) with everything else, and they keep running across rolling updates.\n\nRun `tako init` and drop a file into `workflows/`. `tako dev` boots the worker in-process for unified logs, and `tako deploy` sends the whole thing to your servers. Check [the docs](/docs/tako-toml) for the full config surface, or the [CLI reference](/docs/cli) for the commands.\n\nDurable is finally just a keyword."}