# AI Scaling Laws
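AI scaling laws are empirical relationships between model performance (usually measured as test loss) and three variables: model size (parameters), dataset size (tokens), and compute budget (FLOPs). Loss falls as an approximate power law in each variable, so results from small training runs can be extrapolated to estimate the loss of a larger model before it is trained.
The laws were formalized by Kaplan et al. (2020) at OpenAI and refined by Hoffmann et al. (2022) at DeepMind, whose results are known as the "Chinchilla scaling laws."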
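
To make the extrapolation idea concrete, here is a minimal sketch that fits a power law of the form `loss = a * size**(-alpha)` by linear regression in log-log space and then predicts the loss at a larger size. The data is synthetic and noise-free, and the constants `TRUE_A` and `TRUE_ALPHA` are hypothetical values chosen for illustration, not published fits.

```python
import math

def fit_power_law(sizes, losses):
    """Fit loss = a * size**(-alpha) by least squares on log(size), log(loss)."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(l) for l in losses]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    # In log space the model is log(loss) = log(a) - alpha * log(size).
    return math.exp(intercept), -slope

def predict_loss(a, alpha, size):
    """Evaluate the fitted power law at a given model size."""
    return a * size ** (-alpha)

# Hypothetical constants used only to generate synthetic "measurements".
TRUE_A, TRUE_ALPHA = 10.0, 0.08

# Pretend these are losses measured on four small training runs.
small_sizes = [1e6, 1e7, 1e8, 1e9]
small_losses = [TRUE_A * n ** (-TRUE_ALPHA) for n in small_sizes]

a_hat, alpha_hat = fit_power_law(small_sizes, small_losses)

# Extrapolate two orders of magnitude beyond the largest measured run.
predicted = predict_loss(a_hat, alpha_hat, 1e11)
print(f"fitted a={a_hat:.3f}, alpha={alpha_hat:.3f}, predicted loss at 1e11 params={predicted:.3f}")
```

With real measurements the points are noisy and the fit only holds over a limited range, so an extrapolation like this is an estimate rather than a guarantee.
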
Key Chinchilla finding: most large language models of the time were under-trained relative to their size. For a fixed compute budget, the optimal allocation scales parameters and training tokens in roughly equal proportion, which works out to about 20 tokens per parameter. A smaller model trained on more data can outperform a larger model trained on less. This shifted the industry from "bigger model = better" toward "right-sized model with enough data."
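
The claim that a smaller model trained on more data can beat a larger one can be checked with a rough sketch. It assumes the parametric loss form `L(N, D) = E + A / N**alpha + B / D**beta` from Hoffmann et al. (2022), with constants that are approximately the fitted values reported there (treat them as illustrative rather than exact), plus the common approximation that training costs about `6 * N * D` FLOPs. The two model configurations are hypothetical and chosen to use the same compute.

```python
import math

def chinchilla_loss(n_params, n_tokens,
                    E=1.69, A=406.4, B=410.7, alpha=0.34, beta=0.28):
    """Parametric loss estimate; default constants are approximate, illustrative values."""
    return E + A / n_params ** alpha + B / n_tokens ** beta

def train_flops(n_params, n_tokens):
    """Rule-of-thumb training cost: about 6 FLOPs per parameter per token (assumption)."""
    return 6 * n_params * n_tokens

# Two hypothetical configurations with the same training compute.
large_undertrained = (280e9, 350e9)   # 280B parameters, 350B tokens
small_well_trained = (70e9, 1.4e12)   # 70B parameters, 1.4T tokens

same_budget = math.isclose(train_flops(*large_undertrained),
                           train_flops(*small_well_trained), rel_tol=1e-9)

loss_large = chinchilla_loss(*large_undertrained)
loss_small = chinchilla_loss(*small_well_trained)

print(f"same compute budget: {same_budget}")
print(f"large, less data:  estimated loss {loss_large:.3f}")
print(f"small, more data:  estimated loss {loss_small:.3f}")
```

Under these assumed constants the smaller model comes out with the lower estimated loss, even though it has a quarter of the parameters.
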
Implications: larger models are not always better if undertrained. Scaling laws guide how to allocate a fixed compute budget between model size and training data. They also explain why training data is becoming the bottleneck; the compute and parameter axes can be scaled with money, but high-quality data is finite.
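
As a sketch of how a fixed compute budget might be split, the function below combines two rules of thumb, both assumptions rather than exact laws: training compute is about `6 * N * D` FLOPs, and the compute-optimal ratio is about 20 tokens per parameter. The names `FLOPS_PER_PARAM_TOKEN`, `TOKENS_PER_PARAM`, and `compute_optimal_allocation` are hypothetical.

```python
import math

FLOPS_PER_PARAM_TOKEN = 6.0   # assumption: C ~ 6 * N * D
TOKENS_PER_PARAM = 20.0       # assumption: D ~ 20 * N at the compute optimum

def compute_optimal_allocation(flops_budget):
    """Split a FLOP budget into (parameters, tokens) under the two assumptions above."""
    # Substituting D = 20 * N into C = 6 * N * D gives C = 120 * N**2.
    n_params = math.sqrt(flops_budget / (FLOPS_PER_PARAM_TOKEN * TOKENS_PER_PARAM))
    n_tokens = TOKENS_PER_PARAM * n_params
    return n_params, n_tokens

for budget in (1e21, 1e23, 5.88e23):
    n_params, n_tokens = compute_optimal_allocation(budget)
    print(f"budget {budget:.2e} FLOPs -> {n_params:.2e} params, {n_tokens:.2e} tokens")
```

Under these assumptions a budget of 5.88e23 FLOPs maps to roughly 70 billion parameters and 1.4 trillion tokens, and a 100x larger budget raises both parameters and tokens by about 10x. The token count grows as fast as the parameter count, which is why data supply becomes the constraint at large budgets.
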
## References
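- Kaplan, J., et al. (2020). "Scaling Laws for Neural Language Models." arXiv:2001.08361.
- Hoffmann, J., et al. (2022). "Training Compute-Optimal Large Language Models." arXiv:2203.15556.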
## Related
- [[Large Language Models (LLMs)]]
- [[Deep Learning]]