
# AlphaZero

AlphaZero (2017) is the [[Google DeepMind]] system that generalized the [[AlphaGo]] Zero approach to play Go, chess, and shogi at a superhuman level using a single algorithm, learning each game entirely from self-play with no human knowledge beyond the rules.

## What was new

- **Game-agnostic architecture**: the same network architecture, search algorithm, and hyperparameters were applied to all three games, with a separate network trained for each
- **No human data**: trained purely via self-play reinforcement learning, starting from random play
- **Fast convergence**: in chess, AlphaZero reached superhuman strength within hours and defeated Stockfish (the strongest open-source chess engine at the time) in a 100-game match
- **Qualitatively different play**: in chess especially, commentators described AlphaZero's style as more "positional" and sacrifice-prone than the calculation-heavy style of traditional engines

## Why it matters

- Showed that a single self-play + deep-RL recipe generalizes across distinct games, not just Go
- Established the "self-play bootstrap to superhuman" pattern, later extended by [[MuZero]] to domains where the rules are not given to the agent
- Strengthened the case for RL + search as a route to strong narrow-AI systems, complementary to the later LLM wave

## References

- Science paper (2018): https://www.science.org/doi/10.1126/science.aar6404
- Preprint (2017): https://arxiv.org/abs/1712.01815
- DeepMind blog: https://deepmind.google/discover/blog/alphazero-shedding-new-light-on-chess-shogi-and-go/

## Related

- [[AlphaGo]]
- [[MuZero]]
- [[Google DeepMind]]
- [[Demis Hassabis]]
- [[Artificial Intelligence (AI)]]