How to Allocate Your Tokens? Scaling Laws with Training Steps and Batch Size

arXiv:2607.01487v1. Abstract: We propose a three-term scaling law that accounts for model size and training data, explicitly splitting the latter into training steps and batch size. Fitting the law on a large set of training runs, we find that it correctly recovers the scaling of the optimal batch size. Moreover, because it makes use of training runs with suboptimal batch size, the law can be robustly fit with a significantly smaller am...
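
The split described in the abstract rests on a simple identity: if batch size is counted in tokens, the number of training tokens equals training steps times batch size. The sketch below only illustrates that bookkeeping for a fixed token budget. It assumes batch size is measured in tokens, and the function name `allocations` and all numbers are hypothetical, not taken from the paper. The functional form of the three-term law is not given in the abstract, so it is not reproduced here.

```python
def allocations(token_budget: int, batch_sizes: list[int]) -> list[tuple[int, int]]:
    """Return (batch_size, steps) pairs for a fixed token budget.

    Assumption: batch size is measured in tokens, so
    tokens_seen = steps * batch_size. Steps are rounded down,
    so a pair may use slightly fewer tokens than the budget.
    """
    if token_budget <= 0:
        raise ValueError("token_budget must be positive")
    pairs = []
    for batch_size in batch_sizes:
        if batch_size <= 0:
            raise ValueError("batch sizes must be positive")
        steps = token_budget // batch_size  # whole optimizer steps that fit
        pairs.append((batch_size, steps))
    return pairs


if __name__ == "__main__":
    # Hypothetical budget and candidate batch sizes, for illustration only.
    for batch_size, steps in allocations(1_000_000, [1_000, 2_000, 4_000]):
        print(f"batch_size={batch_size} steps={steps} tokens={batch_size * steps}")
```

Each pair spends the same budget in a different way: doubling the batch size halves the number of steps. A law that treats steps and batch size as separate terms can tell such allocations apart, whereas a law in total tokens alone cannot.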

arXiv cs.LG · Fabian Schaipp · January 3, 2026
