# GEPA

GEPA (Genetic-Pareto / Reflective Prompt Evolution) is a [[DSPy]] optimizer published in July 2025. It optimizes LLM-driven programs by reading execution traces, reflecting on what went wrong, proposing targeted text mutations to prompts and program parts, and selecting winners on a Pareto frontier of objectives.

The headline result: on prompt-shaped problems, GEPA reportedly outperforms reinforcement-learning-style optimizers (including GRPO-class methods) at a fraction of the compute, because text mutations evaluated through API calls are cheap compared to backpropagation through model weights.

## The loop

1. Run the program; collect execution traces (inputs, intermediate steps, final outputs, scores).
2. **Reflect**: an LLM analyzes the traces to identify likely failure causes.
3. **Mutate**: propose targeted text edits to prompts, instructions, or signatures.
4. **Evaluate** the mutated variants on a held-out set.
5. Keep the variants that earn a place on the Pareto frontier, that is, variants that no other candidate dominates on the objectives.
6. Repeat.

No gradient descent. No GPUs. Just structured reflection plus evaluation, repeated.

## Why it matters

- **Compute decoupling**: agent behavior improves through text changes, not weight changes. That makes self-improvement accessible without training infrastructure.
- **Auditability**: mutations are diffs over text. They can be reviewed, gated, and rolled back; weight updates cannot.
- **Composability with harnesses**: any agent that emits structured execution traces can pipe them into GEPA-driven optimization. See [[Hermes Agent Self-Evolution]] for the production reference implementation.

## Where it sits

GEPA is one optimizer among several in [[DSPy]] (alongside BootstrapFewShot and MIPRO). It is the right choice when:

- The program is prompt-shaped (instructions and demonstrations dominate behavior).
- Execution traces are available.
- Compute budget is API-call-limited rather than GPU-limited.

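
For orientation, the reflect, mutate, evaluate, and select steps from "The loop" can be sketched in plain Python. This is a toy under stated assumptions, not GEPA's or DSPy's actual implementation: every name here (`optimize`, `toy_reflect`, `toy_mutate`, `toy_evaluate`, `HINTS`) is hypothetical, the "LLM" reflection is a deterministic stand-in, the two objectives (hint coverage and brevity) are invented for illustration, and there is no held-out dataset. It only shows how text mutations plus Pareto selection can run without gradients. With the toy stand-ins, three rounds leave three prompts on the frontier, each trading hint coverage against length.

```python
from typing import Callable, Dict, Tuple

Scores = Tuple[float, ...]  # one score per objective; higher is better


def dominates(a: Scores, b: Scores) -> bool:
    """True if a is at least as good as b everywhere and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_frontier(pool: Dict[str, Scores]) -> Dict[str, Scores]:
    """Keep only the candidates that no other candidate dominates."""
    return {
        prompt: scores
        for prompt, scores in pool.items()
        if not any(dominates(other, scores)
                   for q, other in pool.items() if q != prompt)
    }


def optimize(
    seed: str,
    evaluate: Callable[[str], Scores],
    reflect: Callable[[str, Scores], str],
    mutate: Callable[[str, str], str],
    rounds: int,
) -> Dict[str, Scores]:
    """Reflect -> mutate -> evaluate -> Pareto-select, repeated `rounds` times."""
    frontier = {seed: evaluate(seed)}
    for _ in range(rounds):
        pool = dict(frontier)
        for prompt, scores in frontier.items():
            feedback = reflect(prompt, scores)   # in GEPA, an LLM reads traces
            child = mutate(prompt, feedback)     # targeted text edit
            if child not in pool:
                pool[child] = evaluate(child)    # score the new variant
        frontier = pareto_frontier(pool)
    return frontier


# --- Hypothetical stand-ins so the sketch runs without any LLM calls ---
HINTS = ("cite sources", "answer in JSON")


def toy_evaluate(prompt: str) -> Scores:
    coverage = sum(h in prompt for h in HINTS) / len(HINTS)
    brevity = -float(len(prompt))  # shorter prompts score higher
    return (coverage, brevity)


def toy_reflect(prompt: str, scores: Scores) -> str:
    # Pretend diagnosis: name the first instruction the prompt is missing.
    return next((h for h in HINTS if h not in prompt), "")


def toy_mutate(prompt: str, feedback: str) -> str:
    return f"{prompt} Always {feedback}." if feedback else prompt


frontier = optimize("Answer the question.", toy_evaluate, toy_reflect,
                    toy_mutate, rounds=3)
for prompt, scores in frontier.items():
    print(scores, prompt)
```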
GEPA is the wrong choice when the bottleneck is in model weights (small models that need fine-tuning to reach a capability) or when the objective is hard to score with LLM judges.

## References

- DSPy framework: https://github.com/stanfordnlp/dspy
- Reference implementation in agent self-improvement: https://github.com/NousResearch/hermes-agent-self-evolution

## Related

- [[DSPy]]
- [[Hermes Agent Self-Evolution]]
- [[Hermes Agent]]
- [[Prompt Engineering]]
- [[AI Agent Skills]]
- [[Reinforcement Learning From Human Feedback (RLHF)]]