# Semantic chunking
Semantic chunking is the process of splitting documents into meaningful segments for [[Retrieval-Augmented Generation (RAG)]] and [[Semantic Search]]. Instead of splitting text at arbitrary boundaries (every 500 tokens, every paragraph), semantic chunking splits at natural meaning boundaries so each chunk is a coherent unit of information.
The quality of chunks directly determines retrieval quality, which determines output quality. Bad chunks (mid-sentence splits, mixed topics in one chunk) produce bad retrieval, which produces bad answers. This is a practical instance of [[AI context is finite with diminishing returns]]: wasting context on poorly chunked, irrelevant fragments degrades everything downstream.
Chunking strategies:
- **Fixed-size**: split every N tokens. Simple, but often breaks meaning mid-sentence or mid-topic.
- **Recursive**: split on the largest separator first (paragraphs), then fall back to sentences and finally characters only when a piece still exceeds the size limit.
- **Semantic**: use embedding similarity to detect topic boundaries. Low similarity between adjacent sentences signals a natural break point.
- **Document-aware**: respect document structure (headers, sections, code blocks) as chunk boundaries.
- **Agentic**: let an LLM decide where to split based on its understanding of the content.
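The semantic strategy from the list above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: `embed` is a toy bag-of-words stand-in for a real embedding model, and the function names and the `threshold` value of 0.2 are assumptions chosen for the example.

```python
import math
import re
from collections import Counter


def embed(sentence: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", sentence.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def semantic_chunks(sentences: list[str], threshold: float = 0.2) -> list[list[str]]:
    """Start a new chunk wherever adjacent sentences fall below the threshold."""
    if not sentences:
        return []
    chunks = [[sentences[0]]]
    for prev, cur in zip(sentences, sentences[1:]):
        if cosine(embed(prev), embed(cur)) < threshold:
            chunks.append([cur])      # low similarity: topic boundary
        else:
            chunks[-1].append(cur)    # same topic: extend the current chunk
    return chunks


sentences = [
    "Cats sleep most of the day.",
    "Cats also sleep at night.",
    "Python lists are mutable.",
    "Python tuples are not mutable.",
]
chunks = semantic_chunks(sentences)
for chunk in chunks:
    print(chunk)
```

With these four sentences the split lands between the second and third sentence, where the two topics share no words. In practice the toy `embed` would be replaced by a real embedding model, and the threshold would need tuning against the corpus.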
The trade-off is chunk size. Small chunks improve retrieval precision (find exactly the right passage) but lose surrounding context. Large chunks preserve context but may include irrelevant information. The right size depends on the use case and the [[Context Window]] budget available for retrieved content.
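The size trade-off can be made concrete with some simple arithmetic. The sketch below assumes a hypothetical retrieval budget of 2,000 tokens and three hypothetical chunk sizes; the function name and the numbers are illustrative only.

```python
def chunks_that_fit(budget_tokens: int, chunk_tokens: int) -> int:
    """How many whole chunks of a given size fit in the retrieval budget."""
    if chunk_tokens <= 0:
        raise ValueError("chunk_tokens must be positive")
    return budget_tokens // chunk_tokens


BUDGET = 2000  # hypothetical share of the context window reserved for retrieved text

fits = {size: chunks_that_fit(BUDGET, size) for size in (128, 512, 1024)}
for size, count in fits.items():
    print(f"{size}-token chunks: {count} fit in {BUDGET} tokens")
```

Under these assumptions, 128-token chunks allow 15 distinct passages, 512-token chunks allow 3, and 1024-token chunks allow only 1. Smaller chunks let retrieval cover more sources within the same budget, while each larger chunk carries more of its own surrounding context.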
## Related
- [[Retrieval-Augmented Generation (RAG)]]
- [[RAG Pipelines]]
- [[Semantic Search]]
- [[Embeddings]]
- [[Vector Store]]
- [[AI Retrieval Patterns]]
- [[AI context is finite with diminishing returns]]
- [[Context Engineering]]