Parameter Golf: What Really Works?

arXiv:2607.01517v1 Announce Type: new Abstract: How far can a language model improve under a strict artifact budget? Parameter Golf posed this question as an open community challenge in which participants trained the best language model, with the complete artifact (training code + compressed weights) required to fit within 16 MB and be trained in under ten minutes on 8xH100 SXM GPUs. Quality was measured in bits-per-byte (BPB), the average number of bits required to encode each byte of unseen te...

arXiv cs.CL ·Prashanna Mani Paudel, Shivanand Venkanna Sheshappanavar ·
compartilhar: