# Durable Execution
Durable execution is a programming model where a long-running process persists its state and progress so it can survive crashes, restarts, and infrastructure failures, then resume from where it left off rather than starting over. Originally popularized for workflow engines (e.g. Temporal-style systems), it has become a core requirement for [[AI Agents]] that run for minutes or hours across many steps.
## Why it matters for agents
Agentic work is long-horizon and failure-prone: an agent may run thousands of tool calls, hit rate limits, or have its server restart mid-task. Without durability, any interruption loses the entire run. Durable execution makes each step checkpointed and replayable, so an agent can recover its session, tool results, and intermediate reasoning and continue, essential for reliability and for not paying twice for the same tokens.
## Common mechanisms
- Checkpointing state to a database after each step
- Durable streams that automatically resume interrupted work
- Deterministic replay of completed steps so only unfinished work re-runs
- Idempotent tool calls to avoid duplicate side effects
## Where it shows up
Agent frameworks increasingly build durability in as a first-class feature, for example [[Flue]] (durable streams) and [[Mastra Harness]]. It is one of the dividing lines between a toy agent loop and a production-grade agent.
## Related
- [[AI Agents]]
- [[Agentic Engineering]]
- [[Flue]]
- [[Mastra Harness]]
- [[Artificial Intelligence (AI)]]