
# MuZero

MuZero (2019 preprint, published in Nature in 2020) is the [[Google DeepMind]] system that extended [[AlphaZero]] to domains where **the rules of the environment are not provided**. It learns an internal model of the dynamics and reward structure directly from interaction, then plans inside that learned model.

## What was new

- **No perfect simulator required**: AlphaZero needed the rules of Go/chess/shogi in order to search; MuZero searches with a model it has learned instead
- **Unified across game and non-game domains**: matched AlphaZero's superhuman play in Go, chess, and shogi, and also achieved state-of-the-art results on Atari (where the agent sees only raw pixels and the game's rules are never exposed)
- **Model learned in a latent / abstract space**: MuZero does not try to reconstruct raw observations, only to predict the quantities relevant for planning (value, policy, reward)

## Why it matters

- Narrows the gap between classical planning (which requires a model) and model-free deep RL (which scales but cannot look ahead), showing that one agent can have both
- Planning over learned world models has since become a common design pattern in AI agent research
- Closely related to "Dreamer"-style world-model agents, which appeared around the same time, and a conceptual precursor of reasoning systems that combine LLM-style priors with learned-model lookahead

## References

- Nature paper (2020): https://www.nature.com/articles/s41586-020-03051-4
- Preprint (2019): https://arxiv.org/abs/1911.08265
- DeepMind blog: https://deepmind.google/discover/blog/muzero-mastering-go-chess-shogi-and-atari-without-rules/

## Related

- [[AlphaZero]]
- [[AlphaGo]]
- [[Google DeepMind]]
- [[Artificial Intelligence (AI)]]
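
As a rough illustration of the latent-space model described under "What was new", the sketch below wires together the three learned functions the MuZero paper names: representation, dynamics, and prediction. Everything concrete here is an assumption made for illustration: the dimensions, the random linear weights standing in for trained networks, and the helper names (`random_matrix`, `matvec`, `unroll`). The tree search that MuZero runs on top of these functions is omitted; the code only shows how a sequence of actions can be rolled forward without querying the environment and without reconstructing observations.

```python
import math
import random

random.seed(0)

# Hypothetical sizes, chosen only to keep the example small.
OBS_DIM = 6
LATENT_DIM = 4
NUM_ACTIONS = 3


def random_matrix(rows, cols):
    """Small random weights standing in for a trained network layer."""
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(logits):
    """Turn logits into a probability distribution."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


W_REPR = random_matrix(LATENT_DIM, OBS_DIM)
W_DYN = random_matrix(LATENT_DIM, LATENT_DIM + NUM_ACTIONS)
W_REWARD = random_matrix(1, LATENT_DIM + NUM_ACTIONS)
W_POLICY = random_matrix(NUM_ACTIONS, LATENT_DIM)
W_VALUE = random_matrix(1, LATENT_DIM)


def representation(observation):
    """Encode a raw observation into a latent state."""
    return [math.tanh(x) for x in matvec(W_REPR, observation)]


def dynamics(state, action):
    """Predict the next latent state and the immediate reward for an action."""
    one_hot = [1.0 if i == action else 0.0 for i in range(NUM_ACTIONS)]
    joint = state + one_hot
    next_state = [math.tanh(x) for x in matvec(W_DYN, joint)]
    reward = matvec(W_REWARD, joint)[0]
    return next_state, reward


def prediction(state):
    """Predict a policy distribution and a scalar value from a latent state."""
    policy = softmax(matvec(W_POLICY, state))
    value = matvec(W_VALUE, state)[0]
    return policy, value


def unroll(observation, actions):
    """Roll the model forward through `actions` entirely in latent space."""
    state = representation(observation)
    steps = []
    for action in actions:
        state, reward = dynamics(state, action)
        policy, value = prediction(state)
        steps.append({"reward": reward, "policy": policy, "value": value})
    return steps


trajectory = unroll([0.1] * OBS_DIM, actions=[0, 2, 1])
for step in trajectory:
    print(step)
```

Note that the observation is used only once, at the root: every later step works on latent states, which is why the model never needs to predict pixels or board positions. In a trained system the weights would be learned so that the predicted reward, policy, and value match targets gathered from real interaction.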