# AI Observability
AI observability is the ability to understand what an AI system is doing, why it's doing it, and how well it's performing. It extends traditional software observability (logs, metrics, traces) to AI-specific concerns: prompt inputs, model outputs, token usage, latency, hallucination rates, context quality, and tool call patterns.
Without observability, AI systems are black boxes. At best you know what went in and what came out, but not why the output was good, bad, or subtly wrong. That makes debugging, optimization, and trust-building far harder.
Key dimensions:
- **Input observability**: what prompts, context, and instructions reached the model
- **Output quality**: how often outputs hallucinate, satisfy users, and complete the task
- **Cost tracking**: token consumption, API costs, cache hit rates per task type
- **Latency profiling**: where time is spent (model inference, tool execution, retrieval)
- **Context health**: detecting [[Context Drift]], [[Context Bloat]], and [[AI Context Rot]] through automated checks
- **Tool call analysis**: which tools agents use, how often, and failure rates
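As a rough illustration of the cost, latency, and tool-call dimensions above, the sketch below records one entry per model or tool call and aggregates them per task type. Everything in it is an assumption made for illustration: the `CallRecord` fields, the `summarize` and `tool_failure_rates` helpers, and the prices in `PRICE_PER_1K` are hypothetical and not taken from any particular provider or tool.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class CallRecord:
    """One model or tool call. Field names are illustrative, not a standard schema."""
    task_type: str
    prompt_tokens: int
    completion_tokens: int
    latency_s: float
    cache_hit: bool = False
    tool_name: Optional[str] = None
    tool_failed: bool = False


# Hypothetical prices in USD per 1,000 tokens; substitute your provider's rates.
PRICE_PER_1K = {"prompt": 0.003, "completion": 0.015}


def cost_usd(rec: CallRecord) -> float:
    """Estimate the cost of a single call from its token counts."""
    prompt = rec.prompt_tokens * PRICE_PER_1K["prompt"]
    completion = rec.completion_tokens * PRICE_PER_1K["completion"]
    return (prompt + completion) / 1000


def summarize(records):
    """Aggregate calls, tokens, cost, cache hit rate, and mean latency per task type."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec.task_type].append(rec)
    summary = {}
    for task_type, recs in groups.items():
        n = len(recs)
        summary[task_type] = {
            "calls": n,
            "tokens": sum(r.prompt_tokens + r.completion_tokens for r in recs),
            "cost_usd": sum(cost_usd(r) for r in recs),
            "cache_hit_rate": sum(r.cache_hit for r in recs) / n,
            "mean_latency_s": sum(r.latency_s for r in recs) / n,
        }
    return summary


def tool_failure_rates(records):
    """Return the fraction of failed calls per tool, ignoring non-tool records."""
    calls = defaultdict(int)
    failures = defaultdict(int)
    for rec in records:
        if rec.tool_name is not None:
            calls[rec.tool_name] += 1
            failures[rec.tool_name] += rec.tool_failed
    return {tool: failures[tool] / calls[tool] for tool in calls}
```

In practice these records would likely be emitted to a tracing or metrics backend rather than held in memory; the in-memory version only shows which fields are worth capturing.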
AI observability connects directly to [[Context Hygiene]]: if you can measure context quality over time, you can detect degradation before it impacts output. It's also the foundation for meaningful [[AI Evaluation]], because you can't evaluate what you can't measure.
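One way to make "measuring context quality" concrete is a small automated check that runs before each model call. The sketch below is a simplification under stated assumptions: it treats bloat as exceeding a token budget and uses the share of duplicated chunks as a crude redundancy signal. The `context_health` function, its thresholds, and the whitespace-based token estimate are all hypothetical, and they are not the definitions used in the linked notes.

```python
def context_health(chunks, max_tokens=8000, max_duplicate_ratio=0.2):
    """Return size and duplication metrics for a context given as a list of text chunks."""
    # Whitespace word count is a crude stand-in for a real tokenizer.
    tokens = sum(len(chunk.split()) for chunk in chunks)

    # Share of chunks that repeat an earlier chunk (case- and whitespace-insensitive).
    unique = len({chunk.strip().lower() for chunk in chunks})
    duplicate_ratio = 1 - unique / len(chunks) if chunks else 0.0

    warnings = []
    if tokens > max_tokens:
        warnings.append("bloat")
    if duplicate_ratio > max_duplicate_ratio:
        warnings.append("duplication")
    return {"tokens": tokens, "duplicate_ratio": duplicate_ratio, "warnings": warnings}
```

Logging these metrics on every call, rather than only acting on the warnings, is what lets you see a trend before it crosses a threshold.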
In agentic systems, observability becomes more complex because [[AI Subagents]] and [[AI Agent Orchestration]] create multi-step, multi-model workflows where failures can cascade silently.
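A common way to make such workflows inspectable is to record each step as a span that points to its parent, as distributed tracing does. The sketch below assumes a hypothetical `Span` record and a `silent_failures` helper; it flags spans that failed while their parent still reported success, which is one simple signature of a failure that was swallowed rather than surfaced.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    """One step in a workflow. Field names are illustrative, not a tracing standard."""
    span_id: str
    parent_id: Optional[str]  # None for the root step
    name: str
    ok: bool


def silent_failures(spans):
    """Return failed spans whose parent span still reported success."""
    by_id = {span.span_id: span for span in spans}
    flagged = []
    for span in spans:
        parent = by_id.get(span.parent_id) if span.parent_id is not None else None
        if not span.ok and parent is not None and parent.ok:
            flagged.append(span)
    return flagged


# Example: a tool call fails, but the subagent and orchestrator both report success.
trace = [
    Span("1", None, "orchestrator", ok=True),
    Span("2", "1", "research-subagent", ok=True),
    Span("3", "2", "search-tool", ok=False),
]
flagged_names = [span.name for span in silent_failures(trace)]
```

Here `flagged_names` contains only `"search-tool"`: the failure never propagated upward, so output-level checks alone would not reveal it.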
## Related
- [[AI Agents]]
- [[AI Agent Harness]]
- [[Context Hygiene]]
- [[Context Drift]]
- [[AI Context Rot]]
- [[AI Evaluation]]
- [[AI Subagents]]
- [[AI Agent Orchestration]]
- [[Feedback Loop]]
- [[AI Safety]]
- [[Sentry]]
- [[LLM Monitoring]]
- [[LangSmith]]
- [[Langfuse]]
- [[Helicone]]
- [[MLflow]]
- [[Edgee]]