# Loop Engineering

## What it is

Loop engineering is the discipline of designing the system that drives an autonomous agent, instead of prompting it step by step. Simon Willison's definition of an LLM agent is sharp: "something that runs tools in a loop to achieve a goal." The art, he says, is carefully designing the tools and the loop. Loop engineering is where that art lives.

The shift is about what you spend your time on. Prompt engineering asks: what do I say to the model right now? Loop engineering asks: what is the trigger, what are the tools, what counts as done, and what stops the agent if it isn't? You author the system once. The agent runs.

This sits at the top of a layered stack. Prompt engineering sits inside context engineering, which sits inside harness engineering, which sits inside loop engineering. You still write prompts; you still curate context. Loop engineering is the layer that puts it all in motion.

## Why it matters now

The bottleneck moved. For years, the constraint in AI-assisted work was the model itself: too short a context window, too many reasoning failures, too much recovery needed from the human. That changed around 2025-2026. METR measured it concretely: Claude Opus 4.6 now succeeds about 50% of the time on tasks that take a human roughly 12 hours, up from roughly 1 hour 40 minutes a year earlier. The model can run long. It can recover from its own mistakes. The question is no longer "can the model do this?" but "have I designed the loop well enough that it will?"

The most provocative data point on this comes from Terminal-Bench 2.0. The same model swings 30-50 percentage points on benchmark performance depending on which harness is running it. Claude Code vs. OpenHands vs. a homegrown loop: same underlying model, radically different results. When someone tells you "model X is best for agents," the right question is: which harness?

Boris Cherny, who built Claude Code: "I don't prompt Claude anymore. My job is to create loops."
Jensen Huang said something similar: "Nobody writes prompts anymore. The new job is to write and handle loops." That is not hype. That is where the leverage is.

## How it works

A well-designed loop has five parts:

**Trigger**: what starts the loop. This can be a file change, a cron job, a human command, an API event, or the completion of another loop. The trigger determines scope and timing.

**Tools**: what the agent can call. Shell commands, file operations, APIs, search, sub-agents. Tool design matters enormously. Willison's framing: it's not just about designing the loop, it's about carefully designing the tools the loop runs. Dangerous or overly broad tools in the loop are a safety problem, not just a design smell.

**Goal**: a clear, verifiable target state. "Make the tests pass and the type-checker clean" is a goal. "Help me with the codebase" is not. The goal needs to be testable by something other than the agent's own judgment.

**Verifier**: a deterministic check that runs when the agent reports completion. This is the most important part of the loop and the most skipped. The agent stops when it *feels* done. The verifier checks whether it actually is. Anthropic's Claude Code Hooks give you exactly this: a hook that intercepts the agent's exit signal, runs real completion criteria (tests green, coverage threshold met, type-check clean), and reinjects the goal if the criteria are not met. Trust the verifier, never the agent's self-report.

**Stopping condition**: separate from the verifier. This fires when the loop should stop regardless of completion: max iterations reached, cost ceiling hit, a specific error type detected, a human-review flag raised. Without an explicit stopping condition, a loop that hits a bad state can spiral expensively.

The agent then runs: act, observe, decide, repeat. ReAct (Yao et al., 2022) formalized this structure. Reflexion (Shinn et al., 2023) added self-correction via verbal feedback. The lineage is well-established.

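The five parts above can be sketched as a small driver. This is a minimal illustration, not any harness's actual implementation: `run_loop`, `LoopResult`, `agent_step`, `verifier`, and `step_cost` are hypothetical names, the trigger is simply whatever calls `run_loop`, and the cost meter is assumed to be supplied by the caller.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class LoopResult:
    done: bool        # True only when the verifier passed
    reason: str       # "verified", "cost_ceiling", or "max_iterations"
    iterations: int
    cost: float


def run_loop(
    goal: str,
    agent_step: Callable[[str, List[str], Dict[str, Callable]], str],  # hypothetical agent call
    verifier: Callable[[], Optional[str]],  # returns None when done, else a failure message
    step_cost: Callable[[], float],         # hypothetical meter: cost of the last step
    tools: Dict[str, Callable],
    max_iterations: int = 10,
    cost_ceiling: float = 5.0,
) -> LoopResult:
    feedback: List[str] = []
    cost = 0.0
    for i in range(1, max_iterations + 1):
        # Act and observe. The agent's own report is deliberately ignored.
        agent_step(goal, feedback, tools)
        cost += step_cost()

        # Verifier: a deterministic check, independent of the self-report.
        failure = verifier()
        if failure is None:
            return LoopResult(True, "verified", i, cost)

        # Stopping condition: fires regardless of completion.
        if cost >= cost_ceiling:
            return LoopResult(False, "cost_ceiling", i, cost)

        # Reinject the goal together with what the verifier rejected.
        feedback.append(failure)
    return LoopResult(False, "max_iterations", max_iterations, cost)
```

The verifier and the stopping condition are kept as separate checks on purpose: one decides whether the work is finished, the other decides whether the loop is still allowed to try.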
## Recommendations

**Write the verifier before the loop.** If you cannot write a deterministic check for "done," your goal is not specific enough. Tighten the goal first.

**Read about premature stopping before you ship anything.** This is the central failure mode: the agent halts when its subjective confidence is high, not when the task is actually finished. Every serious loop needs a verifier that can say no.

**Scale with subagents in isolated worktrees.** The main loop decomposes the task, spawns subagents in isolated worktrees (each with its own context window, model tier, and permissions), collects results, and decides what to do next. This protects your main context window and lets you route cheap subtasks to cheaper models.

**Scope permissions and sandbox before you run.** The warning Willison quotes from Solomon Hykes is worth taking seriously: "An AI agent is an LLM wrecking its environment in a loop." YOLO mode (auto-approve on all shell commands) is where the real productivity is and also where the real danger is: bad shell commands, secret exfiltration, the machine used as a proxy for attacks. Define what the loop can touch before you start, not after something goes wrong.

**Route models by subtask, not uniformly.** A frontier model for planning and reasoning; a cheaper, faster model for mechanical subtasks. The harness manages this. The goal is accuracy per dollar, not the best model everywhere.

**Watch cost.** Errors compound in loops. A bad state in iteration 3 becomes a worse state in iteration 8 if the verifier doesn't catch it. Set cost guards. Log everything.

## Tips and tricks

**Stop hooks are the most underused primitive.** Claude Code Hooks let you intercept exits deterministically. A stop hook that rejects agent self-reports and runs your own checks is not a nice-to-have; it's what makes the loop trustworthy.

**The harness outweighs the model.** Terminal-Bench 2.0 showed 30-50 point swings. Design the harness first; choose the model second.

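A stop hook that runs your own checks, as described above, reduces to a small gate function. The sketch below is harness-agnostic and makes several assumptions: `run_checks` and `gate` are hypothetical names, the returned dictionary is an illustrative shape rather than any harness's hook contract, and wiring the verdict into a real stop hook depends on the harness's own documentation.

```python
import subprocess
from typing import Dict, List, Sequence


def run_checks(checks: Dict[str, Sequence[str]], timeout_s: float = 600.0) -> List[str]:
    """Run each check command and return the names of the checks that failed."""
    failed: List[str] = []
    for name, cmd in checks.items():
        try:
            proc = subprocess.run(list(cmd), capture_output=True, timeout=timeout_s)
            ok = proc.returncode == 0
        except (subprocess.TimeoutExpired, OSError):
            # A check that hangs or cannot be started counts as a failure.
            ok = False
        if not ok:
            failed.append(name)
    return failed


def gate(checks: Dict[str, Sequence[str]], timeout_s: float = 600.0) -> dict:
    """Decide whether the agent may stop, based only on the checks' exit codes."""
    failed = run_checks(checks, timeout_s)
    if failed:
        return {
            "allow_stop": False,
            "reason": "Not done. Failing checks: " + ", ".join(failed),
        }
    return {"allow_stop": True, "reason": "All checks passed."}
```

A call such as `gate({"tests": ["pytest", "-q"], "types": ["mypy", "."]})` would then block the exit until both commands return zero, and the `reason` string is what gets fed back to the agent.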
**Loops only pay off with a strict validation gate.** Without one, you get an agent agreeing with itself on repeat. That is not autonomous work; that is expensive noise.

**If human review is your bottleneck, a loop just floods the queue.** Measure where the actual constraint is before adding automation. Loops move throughput; they do not improve quality gates.

**Do not let permission creep happen.** Scope at design time. Once an agent has broad shell access, you are trusting every tool call it makes, forever.

**Context discipline matters inside the loop.** Longer context windows per subagent are not free. Context engineering (what you put in, what you leave out, what you refresh) is still a skill, just applied at the loop level instead of the prompt level.

**The "Brute Squad" framing is useful.** Sourcegraph described agentic coding as brute-force autonomous agents. That is an honest description. Loops are not elegant; they are persistent. The value is iteration speed, not elegance.

**plentysun's pattern (Claude Code features):** context discipline plus hooks that force steps. The hook is not optional scaffolding; it is the loop's backbone.

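Scoping at design time, as the permission-creep tip suggests, can start as an explicit allowlist that every proposed shell command must pass. This is a minimal sketch under stated assumptions: `is_allowed`, the allowlist contents, and the rejected metacharacters are all illustrative choices, and a check like this complements a sandbox rather than replacing one.

```python
import shlex
from typing import Tuple

# Hypothetical scope for one loop: the programs it may run at all.
ALLOWED_PROGRAMS = frozenset({"pytest", "mypy", "git", "ls"})
# Read-only git subcommands only; anything that writes or pushes is out of scope.
ALLOWED_GIT_SUBCOMMANDS = frozenset({"status", "diff", "log"})
# Characters that would let one command smuggle in another.
SHELL_METACHARS = frozenset(";|&$`><\n")


def is_allowed(command: str) -> Tuple[bool, str]:
    """Return (allowed, reason) for a shell command proposed by the agent."""
    if any(ch in SHELL_METACHARS for ch in command):
        return False, "shell metacharacters are not permitted"
    try:
        argv = shlex.split(command)
    except ValueError:
        return False, "unparseable command"
    if not argv:
        return False, "empty command"
    if argv[0] not in ALLOWED_PROGRAMS:
        return False, "program not in allowlist: " + argv[0]
    if argv[0] == "git" and (len(argv) < 2 or argv[1] not in ALLOWED_GIT_SUBCOMMANDS):
        return False, "git subcommand not in allowlist"
    return True, "ok"
```

Because the list is written before the loop runs, widening it becomes a deliberate edit to the loop's definition instead of an approval clicked through mid-run.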
## References

### Foundational

- Simon Willison, "Designing agentic loops" (2025-09-30): https://simonwillison.net/2025/Sep/30/designing-agentic-loops/
- Anthropic, "Building Effective Agents" (2024-12): https://www.anthropic.com/news/building-effective-agents
- Anthropic, "Effective harnesses for long-running agents": https://www.anthropic.com/engineering/effective-harnesses-for-long-running-agents
- Anthropic, "Effective context engineering for AI agents" (HN discussion): https://news.ycombinator.com/item?id=45418251
- ReAct (Yao et al., 2022): https://arxiv.org/abs/2210.03629
- Reflexion (Shinn et al., 2023): https://arxiv.org/abs/2303.11366

### 2026 commentary

- Data Science Dojo, "Agentic Loops: From ReAct to Loop Engineering (2026 Guide)": https://datasciencedojo.com/blog/agentic-loops-explained-from-react-to-loop-engineering-2026-guide/
- bdtechtalks, "Demystifying loop engineering" (2026-06-22): https://bdtechtalks.com/2026/06/22/ai-loop-engineering/
- Requesty, "Loop Engineering: How to Build AI Agent Loops That Run Themselves": https://www.requesty.ai/blog/loop-engineering-how-to-build-ai-agent-loops-that-run-themselves
- Augment Code, "Agentic Design Patterns (2026 Pattern Catalog)": https://www.augmentcode.com/guides/agentic-design-patterns
- Sourcegraph, "The Brute Squad": https://sourcegraph.com (Readwise highlight)

### Curated lists

- serenakeyitan/awesome-agent-loops: https://github.com/serenakeyitan/awesome-agent-loops
- Picrew/awesome-agent-harness: https://github.com/Picrew/awesome-agent-harness
- RyanAlberts/best-of-Agent-Harnesses: https://github.com/RyanAlberts/best-of-Agent-Harnesses
- ai-boost/awesome-harness-engineering: https://github.com/ai-boost/awesome-harness-engineering

### Tools and repos

- earendil-works/pi: https://github.com/earendil-works/pi
- snarktank/ralph (the "Ralph loop"): https://github.com/snarktank/ralph
- the-open-engine/zeroshot: https://github.com/the-open-engine/zeroshot

### Talks and threads

- Louis Bouchard, "Loop Engineering Explained" (YouTube): https://www.youtube.com/watch?v=NjXIIH9vcv0
- HN: "The unreasonable effectiveness of an LLM agent loop with tool use": https://news.ycombinator.com/item?id=43998472
- HN: "Designing agentic loops": https://news.ycombinator.com/item?id=45426680
- Paweł Huryn on X (Cherny / Huang quotes): https://x.com/PawelHuryn/status/2069315068664197315
- Graham Neubig on X (his agent loop): https://x.com/gneubig/status/2064011013637234728

## Related

[[Harness Engineering]] · [[Agentic loops]] · [[How Coding Agents Work]] · [[AI Agent Harnesses (MoC)]] · [[Agentic Engineering]] · [[AI Guardrails]] · [[Claude Code]] · [[Feedback Loop]] · [[Levels of AI use]]