# LLM Structured Outputs

Constraining a [[Large Language Models (LLMs)|LLM]]'s output to conform to a specific format, typically a JSON Schema, regular expression, or grammar. This eliminates parsing ambiguity, prevents malformed responses, and makes LLMs reliable as components in larger systems.

## How It Works

The runtime intercepts the model's token sampling step. At each step:

1. The model proposes a probability distribution over the next token.
2. The constrained-decoding layer masks out tokens that would violate the schema.
3. The next token is chosen from the remaining allowed tokens, either greedily or by sampling from the renormalized distribution.
4. The process repeats until the output is complete and satisfies the constraint.

This is sometimes called constrained decoding or guided generation. Common implementations: `outlines`, `guidance`, `lm-format-enforcer`, and OpenAI's structured outputs mode.

## What You Can Constrain

- **JSON Schema**: the most common; types, required fields, enums, regex patterns (many runtimes support only a subset of the spec)
- **Regex**: simple format constraints (phone numbers, dates, IDs)
- **Grammar (BNF/EBNF)**: arbitrary formal languages (SQL, custom DSLs)
- **Choice**: pick from a fixed set of strings

## Why It Matters

Without structured outputs:

- The LLM may wrap the JSON in Markdown
- Property names may be misspelled
- Required fields may be omitted
- Numeric values may be quoted as strings
- Apps need defensive parsing and fallbacks

With structured outputs, you can parse with confidence: the decoder cannot emit output that violates the constraint, by construction. Two caveats remain: generation can still be cut off by a token limit, and schema-valid output can still be factually wrong.
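The decoding loop described under How It Works can be illustrated with a toy example. This is a minimal sketch, not how `outlines` or `guidance` are implemented: the vocabulary, the random `fake_logits` stand-in for a model, and the helper names are all hypothetical, and the constraint is the simple Choice type. Real runtimes typically compile the constraint into a state machine or parser over token IDs rather than rescanning the allowed strings at every step.

```python
import math
import random

# Hypothetical toy vocabulary; real tokenizers have tens of thousands of tokens.
VOCAB = ["yes", "no", "maybe", "y", "es", "n", "o", "<eos>"]
# A "Choice" constraint: the output must be exactly one of these strings.
CHOICES = ["yes", "no"]


def fake_logits(prefix: str) -> list[float]:
    """Hypothetical stand-in for the model (step 1): one arbitrary score per token."""
    rng = random.Random(prefix)  # seeded by the prefix so runs are reproducible
    return [rng.uniform(-2.0, 2.0) for _ in VOCAB]


def allowed(prefix: str, token: str) -> bool:
    """True if appending `token` keeps the output on a path to a valid choice."""
    if token == "<eos>":
        return prefix in CHOICES  # stopping is allowed only once the constraint is met
    return any(choice.startswith(prefix + token) for choice in CHOICES)


def constrained_generate(max_steps: int = 10) -> str:
    prefix = ""
    for _ in range(max_steps):
        scores = fake_logits(prefix)
        # Step 2: mask disallowed tokens by setting their score to -inf.
        masked = [s if allowed(prefix, t) else -math.inf for t, s in zip(VOCAB, scores)]
        # Step 3: greedy pick among the allowed tokens.
        best = max(range(len(VOCAB)), key=lambda i: masked[i])
        if masked[best] == -math.inf:
            raise RuntimeError("no allowed token: constraint is unsatisfiable")
        token = VOCAB[best]
        # Step 4: stop at end-of-sequence, otherwise extend and repeat.
        if token == "<eos>":
            return prefix
        prefix += token
    raise RuntimeError("constraint not satisfied within max_steps")


print(constrained_generate())  # "yes" or "no", depending on the fake scores
```

Because `<eos>` stays masked until the prefix is a complete choice, and every other token must keep the prefix on a path to some choice, the loop can only ever return `"yes"` or `"no"`: valid by construction, whatever scores the model produces.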
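The defensive parsing that apps need without structured outputs might look like the following minimal sketch, using only the Python standard library. `parse_llm_json`, `REQUIRED`, and the `name`/`age` fields are hypothetical. It covers only some of the failure modes listed under Why It Matters (Markdown fences, missing fields, quoted integers), and each additional failure mode would need another special case.

```python
import json
import re

# Hypothetical expected fields and their Python types.
REQUIRED = {"name": str, "age": int}

FENCE = "`" * 3  # a Markdown code fence, built this way to keep the example readable
FENCE_RE = re.compile("^" + FENCE + r"(?:json)?\s*(.*?)\s*" + FENCE + "$", re.DOTALL)


def parse_llm_json(raw: str) -> dict:
    text = raw.strip()
    # Failure mode: the model wrapped the JSON in a Markdown code fence.
    match = FENCE_RE.match(text)
    if match:
        text = match.group(1)
    data = json.loads(text)  # raises json.JSONDecodeError (a ValueError) if malformed
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for field, expected_type in REQUIRED.items():
        # Failure mode: a required field was omitted (or its name misspelled).
        if field not in data:
            raise ValueError(f"missing required field: {field}")
        value = data[field]
        # Failure mode: a number arrived quoted, e.g. "age": "36".
        # Only non-negative integer strings are coerced here.
        if expected_type is int and isinstance(value, str) and value.isdigit():
            value = int(value)
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected_type):
            raise ValueError(f"field {field!r} has the wrong type")
        data[field] = value
    return data
```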
## Where It Shows Up

| API | Mechanism |
|---|---|
| OpenAI API | `response_format: { type: "json_schema", json_schema: { name, schema, strict } }` |
| Anthropic Claude API | Tool use with strict schemas |
| W3C [[Prompt API]] | `responseConstraint` (JSON Schema or regex) |
| [[Gemini Nano]] | Supported via Prompt API |
| Local runtimes (llama.cpp, vLLM) | Grammar-based constrained decoding |

## Trade-offs

- **Latency**: slight per-token overhead from masking, plus a possible one-time cost to compile the schema or grammar
- **Quality**: very tight constraints can degrade reasoning ("schema-pushed" outputs)
- **Schema complexity**: deeply nested or recursive schemas may not be supported by all runtimes

## Relationship to Tool Calling

[[LLM Tool Calling]] is a special case of structured outputs: the schema is a union over the registered tools, where each branch pairs a tool name with that tool's parameter schema. Many runtimes share the same constrained-decoding implementation for both.

## References

- https://github.com/webmachinelearning/prompt-api
- https://platform.openai.com/docs/guides/structured-outputs

## Related

- [[Large Language Models (LLMs)]]
- [[LLM Tool Calling]]
- [[LLM Streaming]]
- [[Prompt API]]
- [[Gemini Nano]]
- [[Browser-Provided Language Models]]
- [[Constrained Decoding]]