Claude Managed Agents Outcomes

# Claude Managed Agents Outcomes

**Research preview.** An **outcome** elevates a [[Claude Managed Agents|managed agent]] session from *conversation* to *work*. You declare what the deliverable must look like via a markdown rubric; the harness provisions a separate **grader** with its own context window that scores the artifact against each criterion; the agent iterates until the grader is satisfied or iterations run out.

## Why it's different

Normal sessions stop whenever the agent decides it's done. Outcome sessions have an **external judge** that blocks the session from exiting until rubric criteria are met. The grader's separate context window prevents the main agent's reasoning from contaminating the evaluation, which makes it closer to a real QA pass than to self-review.

## Rubric shape

Markdown with explicit, independently-gradeable criteria. Vague criteria produce noisy scores, so write testable bullets, not prose.

```markdown
# DCF Model Rubric

## Revenue Projections
- Uses historical revenue from last 5 fiscal years
- Projects revenue at least 5 years forward
- Growth assumptions explicitly stated and reasonable

## Output Quality
- Single .xlsx file with clearly labeled sheets
- Assumptions on a separate "Assumptions" sheet
- Sensitivity analysis on WACC and terminal growth
```

Pro tip: feed Claude a known-good artifact and ask it to derive the rubric rather than writing criteria from scratch.

Rubrics can be inline (`{"type": "text", "content": "..."}`) or uploaded via the Files API for reuse (`{"type": "file", "file_id": "file_..."}`; needs `files-api-2025-04-14` beta header too).

## Kicking off an outcome

Create the session, then send a single `user.define_outcome` event. No additional `user.message` is needed; the agent starts immediately.
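The event payload can be assembled and sanity-checked before any API call is made. The following is a minimal sketch, not part of any SDK: `build_define_outcome_event` is a hypothetical helper name, and the lower bound of 1 on `max_iterations` is an assumption (this note only states the default of 3 and the cap of 20). The two rubric shapes are the inline and file forms described under "Rubric shape".

```python
from typing import Any, Optional

DEFAULT_MAX_ITERATIONS = 3  # default stated in this note
MAX_ITERATIONS_CAP = 20     # cap stated in this note


def build_define_outcome_event(
    description: str,
    *,
    rubric_text: Optional[str] = None,
    rubric_file_id: Optional[str] = None,
    max_iterations: int = DEFAULT_MAX_ITERATIONS,
) -> dict[str, Any]:
    """Hypothetical helper: assemble a `user.define_outcome` event payload."""
    # Exactly one rubric source must be given: inline text or a Files API id.
    if (rubric_text is None) == (rubric_file_id is None):
        raise ValueError("pass exactly one of rubric_text or rubric_file_id")
    # The lower bound of 1 is an assumption; the note only states the cap.
    if not 1 <= max_iterations <= MAX_ITERATIONS_CAP:
        raise ValueError(
            f"max_iterations must be between 1 and {MAX_ITERATIONS_CAP}"
        )

    if rubric_text is not None:
        rubric = {"type": "text", "content": rubric_text}
    else:
        rubric = {"type": "file", "file_id": rubric_file_id}

    return {
        "type": "user.define_outcome",
        "description": description,
        "rubric": rubric,
        "max_iterations": max_iterations,
    }


event = build_define_outcome_event(
    "Build a DCF model for Costco in .xlsx",
    rubric_text="# DCF Model Rubric\n- Single .xlsx file with clearly labeled sheets",
    max_iterations=5,
)
print(event["rubric"]["type"])  # text
```

The returned dictionary has the same shape as an entry in the `events` list passed to `client.beta.sessions.events.send`, so validation failures surface locally rather than as a rejected request.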
```python
# Assumes `client`, `agent`, `environment`, and the `RUBRIC` string are already defined.
session = client.beta.sessions.create(
    agent=agent.id,
    environment_id=environment.id,
    title="Financial analysis on Costco",
)

# A single define_outcome event starts the agent; no user.message is needed.
client.beta.sessions.events.send(
    session_id=session.id,
    events=[{
        "type": "user.define_outcome",
        "description": "Build a DCF model for Costco in .xlsx",
        "rubric": {"type": "text", "content": RUBRIC},
        "max_iterations": 5,  # default 3, max 20
    }],
)
```

## The loop

1. Agent works on the artifact
2. Grader evaluates against rubric → `span.outcome_evaluation_end`
3. Outcome is one of:
   - `satisfied`: session goes idle, done
   - `needs_revision`: grader hands back specific gaps; agent starts next iteration
   - `max_iterations_reached`: one final revision, then idle
   - `failed`: rubric contradicts description; session goes idle
   - `interrupted`: you fired `user.interrupt` mid-evaluation

## Monitoring

- `span.outcome_evaluation_start`: grader started
- `span.outcome_evaluation_ongoing`: heartbeat (internal reasoning is opaque)
- `span.outcome_evaluation_end`: result + explanation + per-criterion breakdown

Alternatively, poll `GET /v1/sessions/:id` and inspect `outcome_evaluations[].result`.

## Chaining outcomes

Only one outcome can be active at a time; when it terminates, send another `user.define_outcome` to queue the next one. Session history carries across.

## Retrieving deliverables

The agent writes outputs to `/mnt/session/outputs/`. Once the session is idle, fetch them via the Files API scoped to the session:

```bash
curl ".../v1/files?scope_id=$session_id" \
  -H "anthropic-beta: files-api-2025-04-14,managed-agents-2026-04-01-research-preview"
```

## Access

Research preview: requires the `managed-agents-2026-04-01-research-preview` beta header and [[Claude Managed Agents|form access]].

## References

- https://platform.claude.com/docs/en/managed-agents/define-outcomes

## Related

- [[Claude Managed Agents]]
- [[Claude Managed Agents Events]]