# GEPA
GEPA (Genetic-Pareto / Reflective Prompt Evolution) is a [[DSPy]] optimizer published in July 2025. It optimizes LLM-driven programs by reading execution traces, reflecting on what went wrong, proposing targeted text mutations to prompts and program parts, and selecting winners on a Pareto frontier of objectives.
The headline result: on prompt-shaped problems, GEPA reportedly outperforms reinforcement-learning-style optimizers (including GRPO-class methods) at a fraction of the compute, because text mutations evaluated through API calls are cheap compared to backpropagation through model weights.
## The loop
1. Run the program; collect execution traces (inputs, intermediate steps, final outputs, scores).
2. **Reflect**: an LLM analyzes the traces to identify likely failure causes.
3. **Mutate**: propose targeted text edits to prompts, instructions, or signatures.
4. **Evaluate** the mutated variants on a held-out set.
5. Keep the variants that no other candidate Pareto-dominates on the objectives; these form the new frontier.
6. Repeat.
No gradient descent. No GPUs. Just structured reflection plus evaluation, repeated.
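
The loop above can be sketched in a few lines of plain Python. This is a toy illustration, not GEPA's implementation or the DSPy API: `evaluate`, `reflect`, and `mutate` are hypothetical stand-ins for what would be program runs and LLM calls, and the two objectives (rule coverage and brevity) are invented for the example. The only parts meant to carry over are the shape of the loop and the Pareto bookkeeping.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    prompt: str
    scores: tuple  # one score per objective, higher is better


def dominates(a, b):
    """True if a is at least as good as b everywhere and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_frontier(candidates):
    """Keep only the candidates that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(o.scores, c.scores) for o in candidates)]


# Hypothetical behaviours we pretend the evaluation set rewards.
RULES = ("cite sources", "answer briefly", "show steps")


def evaluate(prompt):
    """Stand-in for running the program and scoring it (assumption: two toy objectives)."""
    coverage = sum(rule in prompt for rule in RULES)
    brevity = -len(prompt)
    return (coverage, brevity)


def reflect(prompt):
    """Stand-in for an LLM reading traces: report which rules the prompt misses."""
    return [rule for rule in RULES if rule not in prompt]


def mutate(prompt, diagnosis, rng):
    """Stand-in for an LLM proposing a targeted text edit based on the diagnosis."""
    if not diagnosis:
        return prompt
    return prompt + " Always " + rng.choice(diagnosis) + "."


def optimize(seed_prompt, iterations=10, seed=0):
    rng = random.Random(seed)
    frontier = [Candidate(seed_prompt, evaluate(seed_prompt))]
    for _ in range(iterations):
        parent = rng.choice(frontier)                 # pick a frontier member to improve
        diagnosis = reflect(parent.prompt)            # step 2: reflect
        child_prompt = mutate(parent.prompt, diagnosis, rng)  # step 3: mutate
        child = Candidate(child_prompt, evaluate(child_prompt))  # step 4: evaluate
        if child not in frontier:
            frontier = pareto_frontier(frontier + [child])    # step 5: keep non-dominated
    return frontier


if __name__ == "__main__":
    for candidate in optimize("Answer the question."):
        print(candidate.scores, candidate.prompt)
```

Sampling the parent from the whole frontier, rather than always taking a single best candidate, is what keeps trade-offs alive: a short prompt and a thorough prompt can both survive as long as neither dominates the other.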
## Why it matters
- **Compute decoupling**: agent behavior improves through text changes, not weight changes. That makes self-improvement accessible without training infrastructure.
- **Auditability**: mutations are diffs over text. They can be reviewed, gated, and rolled back line by line; weight updates offer no comparably readable diff.
- **Composability with harnesses**: any agent that emits structured execution traces can pipe them into GEPA-driven optimization. See [[Hermes Agent Self-Evolution]] for the production reference implementation.
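
To make the auditability point concrete, here is a minimal sketch of treating a prompt mutation as a reviewable diff with rollback, using only Python's standard `difflib`. The `PromptHistory` class and its method names are hypothetical, invented for this illustration; they are not part of GEPA or DSPy.

```python
import difflib


def prompt_diff(old, new, old_label="current", new_label="proposed"):
    """Render a proposed prompt mutation as a unified diff for human review."""
    return "".join(difflib.unified_diff(
        old.splitlines(keepends=True),
        new.splitlines(keepends=True),
        fromfile=old_label,
        tofile=new_label,
    ))


class PromptHistory:
    """Hypothetical helper: an append-only version list with review and rollback."""

    def __init__(self, initial):
        self.versions = [initial]

    @property
    def current(self):
        return self.versions[-1]

    def propose(self, new):
        # Return the diff for review; nothing changes until accept() is called.
        return prompt_diff(self.current, new)

    def accept(self, new):
        self.versions.append(new)

    def rollback(self):
        # Drop the latest version, but never the initial prompt.
        if len(self.versions) > 1:
            self.versions.pop()
        return self.current


if __name__ == "__main__":
    history = PromptHistory("Answer the question.\n")
    mutated = "Answer the question.\nCite sources.\n"
    print(history.propose(mutated))  # a reviewer or gate inspects this diff
    history.accept(mutated)
    history.rollback()               # revert if the mutation regresses later
```

A gate can sit between `propose` and `accept`, for example a human approval step or an automated check that rejects diffs touching protected instructions.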
## Where it sits
GEPA is one optimizer among several in [[DSPy]] (alongside BootstrapFewShot, MIPRO). It is the right choice when:
- The program is prompt-shaped (instructions and demonstrations dominate behavior).
- Execution traces are available.
- Compute budget is API-call-limited rather than GPU-limited.
It is the wrong choice when the bottleneck is in the model weights (for example, a small model that needs fine-tuning to reach a capability) or when the objective is hard to score automatically, whether by a programmatic metric or an LLM judge.
## References
- DSPy framework: https://github.com/stanfordnlp/dspy
- Reference implementation in agent self-improvement: https://github.com/NousResearch/hermes-agent-self-evolution
## Related
- [[DSPy]]
- [[Hermes Agent Self-Evolution]]
- [[Hermes Agent]]
- [[Prompt Engineering]]
- [[AI Agent Skills]]
- [[Reinforcement Learning From Human Feedback (RLHF)]]