# DSPy
DSPy is Stanford NLP's open-source framework for building [[Large Language Models (LLMs)|LLM]] applications as **compositional Python programs** rather than hand-tuned prompts. The acronym stands for *Declarative Self-improving Python*.
The thesis: prompt engineering is brittle, so declare what you want as code and let an optimizer find the prompts. DSPy is the substrate that makes "self-improving agent" a concrete engineering pattern instead of a marketing claim.
## Core abstractions
- **Signatures**: declarative, type-like contracts that name the inputs and outputs of an LLM call (`question -> answer`, `document -> summary`).
- **Modules**: compositional building blocks that wrap a signature with a strategy (Chain-of-Thought, ReAct, etc.). Modules compose into pipelines.
- **Optimizers**: algorithms that improve modules automatically by tuning prompts, demonstrations, or weights against a metric. This is the most interesting layer.
- **Metrics**: the objective the optimizer scores candidates against: exact match, LLM-as-judge, or any custom Python function.
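To make the signature and metric contracts concrete, here is a stdlib-only sketch. `parse_signature` is a hypothetical helper, not DSPy's parser; it only shows how the `inputs -> outputs` shorthand separates input field names from output field names (optional `name: type` annotations are dropped). `exact_match` follows the `(example, pred, trace=None)` shape that DSPy metrics use; `SimpleNamespace` stands in for `dspy.Example` and `dspy.Prediction` so the sketch runs without DSPy installed.

```python
from types import SimpleNamespace


def parse_signature(signature: str) -> tuple[list[str], list[str]]:
    """Split a shorthand like 'context, question -> answer' into field names.

    Hypothetical illustration only; DSPy's own signature handling does more
    (types, field descriptions, validation).
    """
    if signature.count("->") != 1:
        raise ValueError(f"expected exactly one '->' in {signature!r}")
    left, right = signature.split("->")

    def names(side: str) -> list[str]:
        # Keep only the field name, dropping annotations such as 'answer: float'.
        return [f.split(":")[0].strip() for f in side.split(",") if f.strip()]

    return names(left), names(right)


def exact_match(example, pred, trace=None) -> bool:
    """Metric: case- and whitespace-insensitive match on the answer field.

    This sketch ignores `trace`; DSPy passes it through during optimization.
    """
    return example.answer.strip().lower() == pred.answer.strip().lower()


gold = SimpleNamespace(question="Capital of France?", answer="Paris")
print(parse_signature("context, question -> answer"))  # (['context', 'question'], ['answer'])
print(exact_match(gold, SimpleNamespace(answer=" paris ")))  # True
```

In a real DSPy program, a function with this shape is what gets passed as `metric=` to an optimizer or to `dspy.Evaluate`; whatever it rewards is what the optimizer will chase.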
## Optimizers worth knowing
- **BootstrapFewShot**: auto-generates in-context demonstrations by running the program on a small training set and keeping the outputs that pass the metric. The standard starting point.
- **MIPRO** (`MIPROv2` in current releases): more advanced; jointly optimizes instructions and demonstrations.
- **[[GEPA]]** (July 2025): reflective prompt evolution. It reads execution traces and mutates programs through targeted text changes, and reportedly outperforms RL-style approaches on prompt-shaped problems.
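The bootstrapping idea behind BootstrapFewShot can be sketched in plain Python: run a teacher program over the training set, keep only the runs the metric accepts, and stop once enough demonstrations are collected. This is a simplified illustration, not DSPy's implementation; `bootstrap_demos`, `toy_teacher`, and the dictionary-based "program" are hypothetical stand-ins for an LM-backed module.

```python
from types import SimpleNamespace
from typing import Callable, Iterable


def exact_match(example, pred, trace=None) -> bool:
    """Metric with the (example, pred, trace=None) shape DSPy metrics use."""
    return example.answer.strip().lower() == pred.answer.strip().lower()


def bootstrap_demos(
    teacher: Callable[[str], str],
    trainset: Iterable[SimpleNamespace],
    metric: Callable[..., bool],
    max_demos: int = 4,
) -> list[dict]:
    """Keep teacher runs that pass the metric as few-shot demonstrations."""
    demos: list[dict] = []
    for example in trainset:
        if len(demos) >= max_demos:
            break
        pred = SimpleNamespace(answer=teacher(example.question))
        # Only successful runs become demonstrations; failures are discarded.
        if metric(example, pred):
            demos.append({"question": example.question, "answer": pred.answer})
    return demos


# Hypothetical stand-in for an LM-backed program; wrong on the last item.
KNOWN = {"2+2?": "4", "Capital of France?": "Paris", "Largest planet?": "Saturn"}


def toy_teacher(question: str) -> str:
    return KNOWN.get(question, "unknown")


trainset = [
    SimpleNamespace(question="2+2?", answer="4"),
    SimpleNamespace(question="Capital of France?", answer="Paris"),
    SimpleNamespace(question="Largest planet?", answer="Jupiter"),
]
# Keeps the two passing runs; the wrong "Saturn" answer is filtered out.
print(bootstrap_demos(toy_teacher, trainset, exact_match))
```

With DSPy itself the corresponding step is roughly `dspy.BootstrapFewShot(metric=exact_match, max_bootstrapped_demos=4).compile(program, trainset=trainset)`, where `trainset` holds `dspy.Example` objects marked with `.with_inputs("question")`. The real optimizer also handles multi-step programs by collecting demonstrations for every predictor that appears in a successful trace.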
## What it is good for
- **RAG pipelines**: declare retrieval and grounded generation as composable modules, then let the optimizer tune them.
- **Agent loops with self-improvement**: the sweet spot. Combine DSPy modules with an optimizer like GEPA and an execution-trace pipeline (see [[Hermes Agent Self-Evolution]]) and you have a closed improvement loop without GPU training, since only the prompts change.
- **Pipelines beyond manual tuning capacity**: once a pipeline has more than ~3 prompts, hand-tuning all of them stops working, because a change to one prompt shifts the inputs every downstream prompt sees. This is where DSPy starts to dominate.
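One way to see why hand-tuning stops scaling (an illustrative argument, not a figure from the DSPy documentation): if each of `n` chained prompts has `k` candidate variants worth trying, and the prompts interact, the joint space has `k ** n` configurations. The candidate count of 5 below is a hypothetical number chosen only to show the growth.

```python
def joint_configurations(candidates_per_prompt: int, num_prompts: int) -> int:
    """Size of the joint search space when every prompt is tuned together."""
    return candidates_per_prompt ** num_prompts


for n in (1, 3, 6):
    print(n, joint_configurations(5, n))
# 1 5
# 3 125
# 6 15625
```

A person can plausibly try 5 or even 125 combinations; at 15,625 a metric-driven search is the only realistic option. MIPROv2, for example, proposes candidate instructions and demonstration sets and then searches over their combinations with Bayesian optimization.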
## What it is not
- A drop-in replacement for LangChain or LlamaIndex: the philosophy is different. DSPy treats the LLM call as the optimization target, not the orchestration target.
- A no-code tool: you write Python.
- Free of the metric tax: the metric is a Python function you have to write, and a bad metric produces a badly optimized program. The metric is the spec.
## Position in the agent stack
DSPy sits below the harness and above the model. A harness like [[Claude Code]] or [[Hermes Agent]] runs the loop; DSPy specifies *what each LLM call should be*. The optimizer iterates on the latter without changing the former.
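The harness/DSPy split can be sketched as an interface boundary: the harness loop only calls a step object, and an optimizer hands back a new step with different instructions or demonstrations. Everything here (`AnswerStep`, `run_harness`, `fake_model`) is hypothetical and stdlib-only; `AnswerStep` stands in for a DSPy module and `fake_model` for a real LM call.

```python
from dataclasses import dataclass, replace
from typing import Callable


@dataclass(frozen=True)
class AnswerStep:
    """Stand-in for a DSPy module: holds the parameters an optimizer may change."""

    instructions: str
    demos: tuple[tuple[str, str], ...] = ()

    def build_prompt(self, question: str) -> str:
        shots = "".join(f"Q: {q}\nA: {a}\n" for q, a in self.demos)
        return f"{self.instructions}\n{shots}Q: {question}\nA:"


def run_harness(
    step: AnswerStep, tasks: list[str], call_model: Callable[[str], str]
) -> list[str]:
    """Stand-in for the harness loop; it never inspects what the prompt says."""
    return [call_model(step.build_prompt(task)) for task in tasks]


def fake_model(prompt: str) -> str:
    # Echoes the instruction line so the effect of swapping steps is visible.
    return prompt.splitlines()[0]


baseline = AnswerStep(instructions="Answer the question.")
# What an optimizer produces: same interface, new parameters.
optimized = replace(baseline, instructions="Answer in one word.", demos=(("2+2?", "4"),))

print(run_harness(baseline, ["Capital of France?"], fake_model))
print(run_harness(optimized, ["Capital of France?"], fake_model))
```

The design point is that `run_harness` is identical in both calls; only the step object changes, which is the property that lets an optimizer iterate on LLM calls without touching the agent loop.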
The Nous Research [[Hermes Agent Self-Evolution]] project is the most public example of DSPy + GEPA running over real agent execution traces, producing improvements to skills and prompts that can be submitted as pull requests.
## License
MIT.
## References
- Repository: https://github.com/stanfordnlp/dspy
- Documentation: https://dspy.ai/
## Related
- [[GEPA]]
- [[Hermes Agent Self-Evolution]]
- [[Hermes Agent]]
- [[Large Language Models (LLMs)]]
- [[Prompt Engineering]]
- [[AI Agent Skills]]
- [[Python]]
- [[Atropos]]