# Claude Managed Agents Outcomes
**Research preview.** An **outcome** elevates a [[Claude Managed Agents|managed agent]] session from *conversation* to *work*. You declare what the deliverable must look like via a markdown rubric; the harness provisions a separate **grader** with its own context window that scores the artifact against each criterion; the agent iterates until the grader is satisfied or iterations run out.
## Why it's different
Normal sessions stop whenever the agent decides it's done. Outcome sessions add an **external judge**: the session cannot go idle until the grader accepts the artifact or the iteration budget runs out. Because the grader has its own context window, the main agent's reasoning cannot contaminate the evaluation, which makes it closer to a real QA pass than to self-review.
## Rubric shape
Markdown with explicit, independently-gradeable criteria. Vague criteria produce noisy scores — write testable bullets, not prose.
```markdown
# DCF Model Rubric
## Revenue Projections
- Uses historical revenue from last 5 fiscal years
- Projects revenue at least 5 years forward
- Growth assumptions explicitly stated and reasonable
## Output Quality
- Single .xlsx file with clearly labeled sheets
- Assumptions on a separate "Assumptions" sheet
- Sensitivity analysis on WACC and terminal growth
```
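Because the grader scores each bullet on its own, it can help to lint a rubric before sending it. The following is a minimal sketch. `parse_rubric` and `lint_rubric` are hypothetical local helpers, not part of the API. The heuristic (one `- ` bullet per criterion, grouped under `## ` headings) only mirrors the example above.

```python
def parse_rubric(markdown: str) -> dict:
    """Group '- ' bullet criteria under their '## ' section headings."""
    sections = {}
    current = None
    for raw in markdown.splitlines():
        line = raw.strip()
        if line.startswith("## "):
            current = line[3:].strip()
            sections[current] = []
        elif line.startswith("- ") and current is not None:
            sections[current].append(line[2:].strip())
    return sections


def lint_rubric(markdown: str) -> list:
    """Return human-readable problems; an empty list means the rubric passes."""
    problems = []
    current = None
    for raw in markdown.splitlines():
        line = raw.strip()
        if not line or line.startswith("# "):
            continue  # skip blank lines and the top-level title
        if line.startswith("## "):
            current = line[3:].strip()
        elif not line.startswith("- "):
            # Free-form prose is hard to score as a single pass/fail criterion.
            problems.append(f"prose line in section {current!r}: {line}")
    for name, criteria in parse_rubric(markdown).items():
        if not criteria:
            problems.append(f"section {name!r} has no criteria")
    return problems


print(lint_rubric("## Style\nThe model should look professional.\n"))
```

In this sample, the prose-only "Style" section produces two problems: a prose line, and a section with no gradeable bullets.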
Pro tip: feed Claude a known-good artifact and ask it to derive the rubric rather than writing criteria from scratch.
Rubrics can be passed inline (`{"type": "text", "content": "..."}`) or uploaded once via the Files API and referenced for reuse (`{"type": "file", "file_id": "file_..."}`). File-based rubrics also require the `files-api-2025-04-14` beta header.
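Here is a small sketch of choosing between the two rubric forms and the beta headers each one needs. `rubric_param` and `required_betas` are hypothetical helpers. The header values are the ones named on this page.

```python
from typing import Optional

MANAGED_AGENTS_BETA = "managed-agents-2026-04-01-research-preview"
FILES_BETA = "files-api-2025-04-14"


def rubric_param(content: Optional[str] = None, file_id: Optional[str] = None) -> dict:
    """Build the rubric object; exactly one of content / file_id must be given."""
    if (content is None) == (file_id is None):
        raise ValueError("pass exactly one of content= or file_id=")
    if content is not None:
        return {"type": "text", "content": content}
    return {"type": "file", "file_id": file_id}


def required_betas(rubric: dict) -> list:
    """Managed agents always need their beta; file rubrics add the Files API beta."""
    betas = [MANAGED_AGENTS_BETA]
    if rubric["type"] == "file":
        betas.append(FILES_BETA)
    return betas
```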
## Kicking off an outcome
Create the session, then send a single `user.define_outcome` event. No additional `user.message` needed — the agent starts immediately.
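```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# `agent` and `environment` are assumed to exist already;
# RUBRIC holds the markdown rubric text (e.g. the DCF rubric above).
session = client.beta.sessions.create(
    agent=agent.id,
    environment_id=environment.id,
    title="Financial analysis on Costco",
)

# A single define_outcome event starts the agent; no user.message is needed.
client.beta.sessions.events.send(
    session_id=session.id,
    events=[{
        "type": "user.define_outcome",
        "description": "Build a DCF model for Costco in .xlsx",
        "rubric": {"type": "text", "content": RUBRIC},
        "max_iterations": 5,  # default 3, max 20
    }],
)
```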
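If you build outcome events in several places, a helper can check the documented iteration limits (default 3, maximum 20) before the request goes out. This is a minimal sketch. `define_outcome_event` is a hypothetical local helper, and the lower bound of 1 is an assumption that the docs do not state.

```python
DEFAULT_MAX_ITERATIONS = 3  # documented default
MAX_ITERATIONS_CAP = 20     # documented maximum
MIN_ITERATIONS = 1          # assumption: not stated in the docs


def define_outcome_event(description: str, rubric: dict,
                         max_iterations: int = DEFAULT_MAX_ITERATIONS) -> dict:
    """Return a user.define_outcome event, rejecting out-of-range iteration counts."""
    if not MIN_ITERATIONS <= max_iterations <= MAX_ITERATIONS_CAP:
        raise ValueError(
            f"max_iterations must be in [{MIN_ITERATIONS}, {MAX_ITERATIONS_CAP}], "
            f"got {max_iterations}"
        )
    return {
        "type": "user.define_outcome",
        "description": description,
        "rubric": rubric,
        "max_iterations": max_iterations,
    }
```

The result can go straight into the `events=[...]` list of `client.beta.sessions.events.send`.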
## The loop
1. Agent works on the artifact
2. Grader evaluates against rubric → `span.outcome_evaluation_end`
3. Outcome is one of:
   - `satisfied` — session goes idle, done
   - `needs_revision` — grader hands back specific gaps; agent starts next iteration
   - `max_iterations_reached` — one final revision then idle
   - `failed` — rubric contradicts description; session goes idle
   - `interrupted` — you fired `user.interrupt` mid-evaluation
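The loop can be summarized as a lookup from evaluation result to what happens next. The following is a client-side sketch. `NextStep`, `next_step`, and `is_terminal` are hypothetical names. Treating `interrupted` as ending the outcome is an assumption.

```python
from enum import Enum


class NextStep(Enum):
    DONE = "done"                      # session goes idle
    ITERATE = "iterate"                # agent starts another iteration
    FINAL_REVISION = "final_revision"  # one last revision, then idle
    STOPPED = "stopped"                # outcome ends without satisfaction


_TRANSITIONS = {
    "satisfied": NextStep.DONE,
    "needs_revision": NextStep.ITERATE,
    "max_iterations_reached": NextStep.FINAL_REVISION,
    "failed": NextStep.STOPPED,
    "interrupted": NextStep.STOPPED,  # assumption: an interrupt ends the outcome
}


def next_step(result: str) -> NextStep:
    """Map a grader result string to the loop's next step."""
    try:
        return _TRANSITIONS[result]
    except KeyError:
        raise ValueError(f"unknown outcome result: {result!r}") from None


def is_terminal(result: str) -> bool:
    """Every result except needs_revision ends the current outcome."""
    return next_step(result) is not NextStep.ITERATE
```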
## Monitoring
- `span.outcome_evaluation_start` — grader started
- `span.outcome_evaluation_ongoing` — heartbeat (internal reasoning is opaque)
- `span.outcome_evaluation_end` — result + explanation + per-criterion breakdown
- Or poll `GET /v1/sessions/:id` and inspect `outcome_evaluations[].result`
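This sketch shows how a client might track grader state from a list of already-received events, or from a polled session. It assumes events arrive as dicts with a `type` key, that the end event carries a `result` field, and that `outcome_evaluations` is ordered oldest first. Those field layouts and the helper names are assumptions, not confirmed API shapes.

```python
def grader_running(events: list) -> bool:
    """True if the most recent evaluation span has started but not ended."""
    running = False
    for event in events:
        kind = event.get("type")
        if kind == "span.outcome_evaluation_start":
            running = True
        elif kind == "span.outcome_evaluation_end":
            running = False
        # span.outcome_evaluation_ongoing is only a heartbeat; state is unchanged
    return running


def latest_evaluation_end(events: list):
    """Return the last span.outcome_evaluation_end event, or None."""
    last = None
    for event in events:
        if event.get("type") == "span.outcome_evaluation_end":
            last = event
    return last


def latest_result(session: dict):
    """Read the newest result from a polled GET /v1/sessions/:id body."""
    evaluations = session.get("outcome_evaluations") or []
    return evaluations[-1].get("result") if evaluations else None
```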
## Chaining outcomes
Only one outcome can be active at a time; when it terminates, send another `user.define_outcome` to queue the next one. Session history carries across.
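Because only one outcome can be active at a time, chaining reduces to sending an event, waiting for a terminal result, and then sending the next event. The following is a minimal sketch with the transport injected as callables, so it makes no assumptions about SDK method names. `chain_outcomes` and the stop-on-failure policy are hypothetical.

```python
from typing import Callable, Iterable, List

STOP_RESULTS = {"failed", "interrupted"}


def chain_outcomes(
    outcome_events: Iterable[dict],
    send_event: Callable[[dict], None],
    wait_for_terminal_result: Callable[[], str],
    stop_on_failure: bool = True,
) -> List[str]:
    """Send define_outcome events one at a time, each after the previous one terminates."""
    results = []
    for event in outcome_events:
        send_event(event)                    # only one outcome may be active
        result = wait_for_terminal_result()  # e.g. poll outcome_evaluations
        results.append(result)
        if stop_on_failure and result in STOP_RESULTS:
            break  # hypothetical policy: don't queue follow-ups after a failure
    return results
```

Since session history carries across, each later outcome can build on files the earlier ones produced.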
## Retrieving deliverables
Agent writes outputs to `/mnt/session/outputs/`. Once idle, fetch them via the Files API scoped to the session:
```bash
curl ".../v1/files?scope_id=$session_id" \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "anthropic-beta: files-api-2025-04-14,managed-agents-2026-04-01-research-preview"
```
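The same request can be built in Python with only the standard library. The API host is left as a `base_url` parameter because the curl example elides it. `build_session_files_request` is a hypothetical helper. The request is constructed but not sent.

```python
import urllib.parse
import urllib.request

BETAS = "files-api-2025-04-14,managed-agents-2026-04-01-research-preview"


def build_session_files_request(base_url: str, session_id: str, api_key: str) -> urllib.request.Request:
    """Build a GET request listing files scoped to one session."""
    query = urllib.parse.urlencode({"scope_id": session_id})
    url = f"{base_url.rstrip('/')}/v1/files?{query}"
    return urllib.request.Request(url, headers={
        "x-api-key": api_key,
        "anthropic-version": "2023-06-01",
        "anthropic-beta": BETAS,
    })
```

To send it, pass the result to `urllib.request.urlopen` once the session is idle and parse the JSON body.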
## Access
Research preview — `managed-agents-2026-04-01-research-preview` beta header and [[Claude Managed Agents|form access]] required.
## References
- https://platform.claude.com/docs/en/managed-agents/define-outcomes
## Related
- [[Claude Managed Agents]]
- [[Claude Managed Agents Events]]