# Hermes Agent Self-Evolution
Hermes Agent Self-Evolution is the optimization companion to [[Hermes Agent]]: an open-source system that automatically improves the agent's skills, tool descriptions, system prompts, and code using [[DSPy]] and [[GEPA]]. It is one of the few production examples of an agent harness that ships with a documented self-improvement loop rather than just a tool registry.
It matters because it codifies a reference architecture: how to take execution traces from a deployed agent, mutate parts of the agent definition (prompts, skills, descriptions), evaluate the variants, and merge the winners back via guarded pull requests. The loop runs on API calls, not GPU training.
## The loop
```
1. Read execution traces from real Hermes Agent sessions (or synthetic data)
2. Generate evaluation datasets from the current skill set
3. Propose candidate variants (mutated prompts, descriptions, code)
4. Evaluate variants against held-out execution traces
5. Apply constraint gates (tests, size limits, benchmarks, semantics)
6. Select winners; raise PR for human review
7. After merge, the improved Hermes Agent generates new traces; loop continues
```
The whole thing is text mutation plus evaluation, with no model retraining. That is what makes it reproducible by anyone running an agent, not just by frontier labs.
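The numbered loop above can be sketched in a few lines of Python. This is a minimal illustration, not the project's code: `Trace`, `evaluate`, `evolve`, and the `propose`, `run_skill`, and `gate` callables are hypothetical stand-ins, and exact-match scoring is an assumed metric. What it shows is that only the skill text changes between iterations.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Trace:
    """One held-out example: a task and the output a good skill should produce."""
    task: str
    expected: str


def evaluate(skill_text: str, traces: list[Trace],
             run_skill: Callable[[str, str], str]) -> float:
    """Fraction of held-out traces where the skill reproduces the expected output."""
    if not traces:
        return 0.0
    hits = sum(run_skill(skill_text, t.task) == t.expected for t in traces)
    return hits / len(traces)


def evolve(skill_text: str,
           traces: list[Trace],
           propose: Callable[[str], list[str]],
           run_skill: Callable[[str, str], str],
           gate: Callable[[str, str], bool],
           iterations: int = 10) -> str:
    """Mutate the skill text, score variants, keep the best. No weights change."""
    best, best_score = skill_text, evaluate(skill_text, traces, run_skill)
    for _ in range(iterations):
        for candidate in propose(best):
            # Gates compare each candidate against the ORIGINAL skill text,
            # so a size limit cannot be eroded a little per iteration.
            if not gate(skill_text, candidate):
                continue
            score = evaluate(candidate, traces, run_skill)
            if score > best_score:
                best, best_score = candidate, score
    # The winner is returned for a human-reviewed PR, not applied automatically.
    return best
```

In a real setup `propose` is where an LLM-backed optimizer such as GEPA would sit, and `run_skill` would replay a task through the agent.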
## Why DSPy + GEPA
- **[[DSPy]]** provides the framework: agents and skills are expressed as compositional Python programs whose prompts and parameters are *automatically* optimized rather than hand-tuned.
- **[[GEPA]]** is the optimizer: it reads execution traces, identifies the causes of failures, proposes targeted text mutations, and selects winners through reflective evolution. It is reported to outperform reinforcement-learning-style optimizers on prompt-shaped problems while needing far fewer rollouts, which makes it much cheaper to run.
Together, one framework defines the program and the other evolves it. Without both halves, the loop does not close.
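GEPA's selection step is what separates it from greedy hill-climbing. As the GEPA authors describe it, the optimizer keeps a Pareto frontier of candidates: every variant that scores best on at least one evaluation instance, not only the one with the best average. The sketch below is a simplified, hypothetical version of that idea. `instance_wins` and `sample_parent` are not `gepa` or `dspy` APIs, and any dominance pruning a full implementation performs is omitted.

```python
import random


def instance_wins(scores: dict[str, list[float]]) -> dict[str, int]:
    """Count the evaluation instances on which each candidate ties for the top score.

    `scores` maps a candidate id to its per-instance scores, in the same instance
    order for every candidate. Candidates that win nothing are left out, so the
    keys of the result are the (unpruned) Pareto frontier.
    """
    wins: dict[str, int] = {}
    if not scores:
        return wins
    n_instances = len(next(iter(scores.values())))
    for i in range(n_instances):
        best = max(s[i] for s in scores.values())
        for cand, s in scores.items():
            if s[i] == best:
                wins[cand] = wins.get(cand, 0) + 1
    return wins


def sample_parent(scores: dict[str, list[float]], rng: random.Random) -> str:
    """Choose the next candidate to mutate, weighted by instances won.

    Assumes at least one candidate and one evaluation instance.
    """
    wins = instance_wins(scores)
    candidates = list(wins)
    return rng.choices(candidates, weights=[wins[c] for c in candidates], k=1)[0]
```

With `{"base": [0.2, 0.5, 0.1], "a": [0.9, 0.1, 0.1], "b": [0.3, 0.6, 0.1], "c": [0.1, 0.1, 0.0]}`, candidate `b` has a lower mean than `a` but wins the second instance, so it stays eligible as a parent; `c` wins nothing and drops out.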
## Usage
Install the package, point it at a Hermes Agent repository, and run the optimizer once per skill:
```sh
python -m evolution.skills.evolve_skill \
  --skill github-code-review \
  --iterations 10
```
Synthetic data can drive the evaluation set, but real session history (including from [[Claude Code]]) produces the most useful variants. The optimization scope is per skill; the system does not mutate the whole agent at once.
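Because optimization is scoped per skill, covering several skills just means one invocation per skill. This is a minimal driver sketch that assumes the package is importable from the current interpreter. It uses only the module path and the `--skill`/`--iterations` flags from the command above, and `evolve_command` and `evolve_skills` are hypothetical helper names. By default it performs a dry run that prints the commands.

```python
import shlex
import subprocess
import sys


def evolve_command(skill: str, iterations: int = 10) -> list[str]:
    """Build the per-skill invocation, using only the documented flags."""
    return [sys.executable, "-m", "evolution.skills.evolve_skill",
            "--skill", skill, "--iterations", str(iterations)]


def evolve_skills(skills: list[str], iterations: int = 10,
                  dry_run: bool = True) -> list[list[str]]:
    """Run the optimizer once per skill, never as one run over the whole agent."""
    commands = [evolve_command(skill, iterations) for skill in skills]
    for cmd in commands:
        if dry_run:
            print(shlex.join(cmd))  # show what would run
        else:
            subprocess.run(cmd, check=True)  # stop at the first failing skill
    return commands
```

For example, `evolve_skills(["github-code-review", "weekly-summary"])` prints two commands; `weekly-summary` is a hypothetical skill name. Pass `dry_run=False` to actually run them.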
## Guardrails
- All evolved variants must pass the **full test suite**.
- **Size limits** prevent prompt bloat masquerading as improvement.
- **Caching compatibility** is preserved: mutations must not break prompt-cache hits.
- A **semantic preservation** check ensures mutations keep the intent of the original.
- **Human review** is required before deployment; the PR gate is non-negotiable.
These exist because an agent that rewrites its own definition can drift in behavior unless something constrains it. The loop is opt-in for a reason.
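The guardrails can be read as a set of automated gates followed by a human one. This sketch is hypothetical throughout. The thresholds in `GateConfig` are illustrative. The cache check assumes that the cached part of the prompt is a fixed-length leading prefix. The keyword-overlap test is a deliberately crude stand-in for whatever semantic check the real project uses. Passing every gate only makes a candidate eligible for a PR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateConfig:
    """Illustrative thresholds; the real project's limits are not assumed here."""
    max_growth: float = 1.2           # candidate may be at most 20% longer
    cached_prefix_chars: int = 200    # leading chars assumed to be prompt-cached
    min_keyword_overlap: float = 0.5  # crude proxy for semantic preservation


def _keywords(text: str) -> set[str]:
    """Raw words longer than three characters, lowercased and punctuation-trimmed."""
    return {w.lower().strip(".,:;!?") for w in text.split() if len(w) > 3}


def gate_failures(original: str, candidate: str, tests_passed: bool,
                  cfg: GateConfig = GateConfig()) -> list[str]:
    """Return the names of failed automated gates; empty means PR-eligible.

    Human review is the final gate and is deliberately not automated here.
    """
    failures: list[str] = []
    if not tests_passed:
        failures.append("tests")
    if len(candidate) > cfg.max_growth * len(original):
        failures.append("size")
    # Cache compatibility: the candidate must keep the original's cached prefix.
    n = min(cfg.cached_prefix_chars, len(original))
    if candidate[:n] != original[:n]:
        failures.append("cache")
    orig_kw = _keywords(original)
    if orig_kw:
        kept = len(orig_kw & _keywords(candidate)) / len(orig_kw)
        if kept < cfg.min_keyword_overlap:
            failures.append("semantics")
    return failures
```

A candidate that only appends text keeps both the cached prefix and the original keywords. It can therefore fail only the size gate, and only if it grows too much.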
## Why it matters beyond Hermes
The pattern is portable. Any agent with a defined skill or capability surface, execution traces, and an evaluation harness can run the same loop. For a personal AI assistant stack like the one this vault is built around (skills + agents + memory), the mapping is direct:
- Vault skills → DSPy modules.
- Daily-note traces → execution data.
- Skill audits / panel reviews → evaluation harness.
- The PR step → manual review of proposed skill changes.
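The mapping can be made concrete as an interface. This is a minimal sketch that assumes a vault exposes skills as text, traces as (input, outcome) pairs, and a numeric score derived from audits or reviews. `EvolvableSystem`, `ToyVault`, `review_patch`, and the `skills/<name>.md` path are hypothetical, and `ToyVault`'s scoring rule is a toy. `review_patch` stands in for the PR step: it produces a unified diff for a human to read instead of writing the change.

```python
import difflib
from typing import Iterable, Optional, Protocol


class EvolvableSystem(Protocol):
    """What the loop needs from any agent stack (hypothetical interface)."""

    def skills(self) -> dict[str, str]:
        """Skill name -> editable text (vault skills -> DSPy-style modules)."""
        ...

    def traces(self, skill: str) -> Iterable[tuple[str, str]]:
        """(input, outcome) pairs for one skill (daily-note traces -> execution data)."""
        ...

    def score(self, skill: str, text: str) -> float:
        """Evaluation harness (skill audits / panel reviews -> a number)."""
        ...


class ToyVault:
    """In-memory stand-in; scores text by the share of trace inputs it mentions."""

    def __init__(self, skills: dict[str, str],
                 traces: dict[str, list[tuple[str, str]]]) -> None:
        self._skills = skills
        self._traces = traces

    def skills(self) -> dict[str, str]:
        return dict(self._skills)

    def traces(self, skill: str) -> list[tuple[str, str]]:
        return list(self._traces.get(skill, []))

    def score(self, skill: str, text: str) -> float:
        pairs = self.traces(skill)
        if not pairs:
            return 0.0
        return sum(inp in text for inp, _ in pairs) / len(pairs)


def review_patch(system: EvolvableSystem, skill: str,
                 candidate: str) -> Optional[str]:
    """The PR step: a unified diff for manual review if the candidate scores higher."""
    current = system.skills()[skill]
    if system.score(skill, candidate) <= system.score(skill, current):
        return None
    diff = difflib.unified_diff(current.splitlines(), candidate.splitlines(),
                                fromfile=f"skills/{skill}.md",
                                tofile=f"skills/{skill}.md (proposed)",
                                lineterm="")
    return "\n".join(diff)
```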
Hermes Agent Self-Evolution is the closest thing to a reference architecture for **operationalized continuous improvement of an agent system** that does not require owning a training cluster.
## References
- Repository: https://github.com/NousResearch/hermes-agent-self-evolution
- DSPy framework: https://github.com/stanfordnlp/dspy
## Related
- [[Hermes Agent]]
- [[Nous Research]]
- [[DSPy]]
- [[GEPA]]
- [[AI Agent Skills]]
- [[Atropos]]
- [[AI Agent Harness]]
- [[Claude Code]]
- [[AI Agent Skills in Chrome]]