# DSPy

DSPy is Stanford NLP's open-source framework for building [[Large Language Models (LLMs)|LLM]] applications as **compositional Python programs** rather than hand-tuned prompts. The name stands for *Declarative Self-improving Python*. The thesis: prompt engineering is brittle, so declare what you want as code and let an optimizer work out the prompts. DSPy is the substrate that makes "self-improving agent" a concrete engineering pattern instead of a marketing claim.

## Core abstractions

- **Signatures**: declarative, type-like contracts for an LLM call (`question -> answer`, `document -> summary`).
- **Modules**: compositional building blocks that wrap a signature with a prompting strategy (Chain-of-Thought, ReAct, etc.). Modules compose into pipelines.
- **Optimizers**: algorithms that improve modules automatically by tuning prompts, demonstrations, or weights against a metric. This is the interesting layer.
- **Metrics**: the objective the optimizer optimizes against. It can be exact match, an LLM-as-judge, or any custom Python function.

## Optimizers worth knowing

- **BootstrapFewShot**: auto-generates in-context demonstrations from a small training set; the standard starting point.
- **MIPRO**: more advanced; jointly optimizes instructions and demonstrations.
- **[[GEPA]]** (July 2025): reflective prompt evolution. It reads execution traces and mutates programs through targeted text changes, and reportedly outperforms RL-style approaches on prompt-shaped problems.

## What it is good for

- **RAG pipelines**: declare retrieval and grounded generation as composable modules, then let the optimizer tune them.
- **Agent loops with self-improvement**: the sweet spot. Combine DSPy modules with an optimizer like GEPA and an execution-trace pipeline (see [[Hermes Agent Self-Evolution]]) and you get a closed improvement loop without GPU training.
- **Pipelines beyond manual tuning capacity**: once a pipeline has more than ~3 prompts, hand-tuning them jointly stops scaling, because a change to one prompt shifts what every downstream prompt receives. This is where DSPy starts to dominate.
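The core abstractions, and the RAG idea of composing retrieval with grounded generation, can be modeled in a few lines of plain Python. This is a toy, stdlib-only sketch that assumes an LM is just a `str -> str` function; `Signature`, `Predict`, `RetrieveThenAnswer`, and `stub_lm` are hypothetical names, not DSPy's API (DSPy's own counterparts include `dspy.Signature`, `dspy.Predict`, and `dspy.ChainOfThought`). What it shows is the division of labor: the signature says what goes in and out, the module decides how the prompt is built, and the `instructions` and `demos` fields are the state an optimizer would tune.

```python
import re
from dataclasses import dataclass, field
from typing import Callable

LM = Callable[[str], str]  # assumption: an LM is "prompt text in, completion text out"


@dataclass(frozen=True)
class Signature:
    """Declarative contract: named input fields -> named output fields."""
    inputs: tuple[str, ...]
    outputs: tuple[str, ...]

    @classmethod
    def parse(cls, spec: str) -> "Signature":
        # "context, question -> answer" -> inputs=("context", "question"), outputs=("answer",)
        left, right = spec.split("->")

        def names(side: str) -> tuple[str, ...]:
            return tuple(n.strip() for n in side.split(",") if n.strip())

        return cls(names(left), names(right))


@dataclass
class Predict:
    """Toy module: renders a prompt from the signature, calls the LM, parses the reply."""
    signature: Signature
    lm: LM
    instructions: str = ""
    demos: list[dict] = field(default_factory=list)  # state an optimizer would tune

    def __call__(self, **kwargs: str) -> dict:
        missing = set(self.signature.inputs) - set(kwargs)
        if missing:
            raise ValueError(f"missing inputs: {sorted(missing)}")
        fields = self.signature.inputs + self.signature.outputs
        lines = [self.instructions] if self.instructions else []
        for demo in self.demos:  # demonstrations precede the real input
            lines += [f"{k}: {demo[k]}" for k in fields]
        lines += [f"{k}: {kwargs[k]}" for k in self.signature.inputs]
        completion = self.lm("\n".join(lines))
        # Toy parsing: assumes a single output field that receives the whole completion.
        return {self.signature.outputs[0]: completion.strip()}


def tokens(text: str) -> set[str]:
    return set(re.findall(r"\w+", text.lower()))


class RetrieveThenAnswer:
    """Two steps composed into one pipeline: keyword retrieval, then grounded answering."""

    def __init__(self, corpus: list[str], lm: LM) -> None:
        self.corpus = corpus
        self.answer = Predict(Signature.parse("context, question -> answer"), lm,
                              instructions="Answer using only the context.")

    def retrieve(self, question: str, k: int = 1) -> list[str]:
        # Rank passages by word overlap with the question (stand-in for a real retriever).
        q = tokens(question)
        return sorted(self.corpus, key=lambda p: -len(q & tokens(p)))[:k]

    def __call__(self, question: str) -> dict:
        context = " ".join(self.retrieve(question))
        return self.answer(context=context, question=question)


def stub_lm(prompt: str) -> str:
    """Hypothetical LM stand-in: returns the first sentence of the last context line."""
    for line in reversed(prompt.splitlines()):
        if line.startswith("context: "):
            return line.removeprefix("context: ").split(".")[0] + "."
    return "unknown"


corpus = [
    "DSPy is developed by Stanford NLP. It is MIT licensed.",
    "GEPA reads execution traces.",
]
rag = RetrieveThenAnswer(corpus, stub_lm)

if __name__ == "__main__":
    print(rag(question="Who develops DSPy?"))  # {'answer': 'DSPy is developed by Stanford NLP.'}
```

Swapping `stub_lm` for a real model changes nothing else in the pipeline, which is the property that lets an optimizer rewrite `instructions` or `demos` without touching the program's structure.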
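Metrics are ordinary Python functions, and the choice of metric decides what the optimizer rewards. DSPy metrics conventionally take `(example, pred, trace=None)` and read fields as attributes; the sketch below follows that shape, with `types.SimpleNamespace` standing in for DSPy's example and prediction objects. `normalize`, `exact_match`, and the SQuAD-style `token_f1` are written here for illustration and are assumptions, not DSPy's built-ins.

```python
import re
from collections import Counter
from types import SimpleNamespace


def normalize(text: str) -> list[str]:
    # Lowercase, drop punctuation, and drop English articles before comparing.
    words = re.findall(r"\w+", text.lower())
    return [w for w in words if w not in {"a", "an", "the"}]


def exact_match(example, pred, trace=None) -> bool:
    """Strict metric: normalized answers must be identical (trace is accepted but unused)."""
    return normalize(example.answer) == normalize(pred.answer)


def token_f1(example, pred, trace=None) -> float:
    """Softer metric: harmonic mean of token-level precision and recall."""
    gold, guess = normalize(example.answer), normalize(pred.answer)
    overlap = sum((Counter(gold) & Counter(guess)).values())  # shared tokens, with multiplicity
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(guess), overlap / len(gold)
    return 2 * precision * recall / (precision + recall)


gold = SimpleNamespace(answer="The Stanford NLP group")
pred = SimpleNamespace(answer="Stanford NLP")
```

On this pair, `exact_match` returns `False` while `token_f1` returns 0.8 (precision 1, recall 2/3), so an optimizer driven by F1 would keep a partial answer that an exact-match optimizer discards. Picking between them is a decision about the spec, not a tuning detail.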
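The idea behind BootstrapFewShot can be sketched as follows: run a teacher program over a small training set, keep the input/output pairs whose outputs pass the metric, and attach up to a fixed number of them to the program as in-context demonstrations. This is a simplified, stdlib-only sketch under that assumption, not DSPy's implementation. The real optimizer is invoked through `compile(program, trainset=...)`, collects demonstrations for every predictor in a multi-step program, and exposes knobs such as `max_bootstrapped_demos` and `max_labeled_demos`. `Program`, `bootstrap_few_shot`, `exact`, and the lookup-table `teacher` are hypothetical.

```python
from dataclasses import dataclass, field, replace
from typing import Callable


@dataclass
class Program:
    """Hypothetical program whose tunable state is its instructions plus demonstrations."""
    instructions: str
    demos: list[tuple[str, str]] = field(default_factory=list)

    def render_prompt(self, question: str) -> str:
        # Demonstrations become in-context examples ahead of the real input.
        parts = [self.instructions]
        parts += [f"Q: {q}\nA: {a}" for q, a in self.demos]
        parts.append(f"Q: {question}\nA:")
        return "\n\n".join(parts)


def bootstrap_few_shot(
    program: Program,
    teacher: Callable[[str], str],
    trainset: list[tuple[str, str]],
    metric: Callable[[str, str], bool],
    max_demos: int = 4,
) -> Program:
    """Keep teacher outputs that pass the metric and attach them as demos."""
    demos: list[tuple[str, str]] = []
    for question, gold in trainset:
        if len(demos) >= max_demos:
            break
        output = teacher(question)
        if metric(gold, output):  # failing traces are discarded, not repaired
            demos.append((question, output))
    return replace(program, demos=demos)  # a new Program; the input program is unchanged


def exact(gold: str, output: str) -> bool:
    return gold.strip() == output.strip()


# Stub teacher: a lookup table with one deliberately wrong answer ("3+5" -> "9").
_teacher_answers = {"2+2": "4", "3+5": "9", "10-7": "3"}


def teacher(question: str) -> str:
    return _teacher_answers.get(question, "")


trainset = [("2+2", "4"), ("3+5", "8"), ("10-7", "3")]
program = Program(instructions="Answer the arithmetic question.")
compiled = bootstrap_few_shot(program, teacher, trainset, exact)
```

`compiled.demos` ends up holding only the two verified pairs; `3+5` is dropped because the teacher's answer fails the metric. Returning a new program rather than mutating the old one makes it cheap to compare candidates from several optimizer runs.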
## What it is not

- A drop-in replacement for LangChain or LlamaIndex; the philosophy is different. DSPy treats the LLM call as the optimization target, not the orchestration target.
- A no-code tool; you write Python.
- Free of the "metric is a Python function" tax: bad metrics give bad optimizers, because the metric is the spec.

## Position in the agent stack

DSPy sits below the harness and above the model. A harness like [[Claude Code]] or [[Hermes Agent]] runs the loop; DSPy specifies *what each LLM call should be*. The optimizer iterates on the latter without changing the former.

The Nous Research [[Hermes Agent Self-Evolution]] project is the most visible public example of DSPy + GEPA running over real agent execution traces, producing pull-request-ready improvements to skills and prompts.

## License

MIT.

## References

- Repository: https://github.com/stanfordnlp/dspy
- Documentation: https://dspy.ai/

## Related

- [[GEPA]]
- [[Hermes Agent Self-Evolution]]
- [[Hermes Agent]]
- [[Large Language Models (LLMs)]]
- [[Prompt Engineering]]
- [[AI Agent Skills]]
- [[Python]]
- [[Atropos]]