# MuZero
MuZero (2019, published 2020) is the [[Google DeepMind]] system that extended [[AlphaZero]] to domains where **the rules of the environment are not provided**. It learns an internal model of the dynamics and reward structure directly from interaction, then plans inside that learned model.
## What was new
- **No perfect simulator required**: AlphaZero needed the rules of Go/chess/shogi to search; MuZero learns them
- **Unified across game and non-game domains**: matched AlphaZero's superhuman play in Go, chess, and shogi, and also achieved state-of-the-art on Atari (where the "rules" are opaque pixels)
- **Model learned in a latent / abstract space**: MuZero does not try to reconstruct raw observations, only to predict the quantities relevant for planning (value, policy, reward)
## Why it matters
- Closes the gap between classical planning (requires a model) and model-free deep RL (scales but can't plan long-horizon) — you can have both
- Planning over learned world models has since become a mainstay design pattern in AI agent research
- Direct conceptual ancestor of later "Dreamer"-style world-model agents and of reasoning systems that combine LLM-style priors with learned-model lookahead
## References
- Nature paper (2020): https://www.nature.com/articles/s41586-020-03051-4
- Preprint (2019): https://arxiv.org/abs/1911.08265
- DeepMind blog: https://deepmind.google/discover/blog/muzero-mastering-go-chess-shogi-and-atari-without-rules/
## Related
- [[AlphaZero]]
- [[AlphaGo]]
- [[Google DeepMind]]
- [[Artificial Intelligence (AI)]]