Mitigating LLM-based p-Hacking by Preregistering for the Next LLM

arXiv:2606.27687v1 Announce Type: new Abstract: Large language models (LLMs) are increasingly used to generate, classify, and annotate data whose outputs feed downstream hypothesis tests. However, LLM-based research is easy to p-hack: a researcher can tune the prompts, decoding parameters, or output format until a desired result is reached. We propose a protocol to mitigate p-hacking in LLM-based research: preregistering the experiment and eligible models, and then running it on the first eligib...

arXiv cs.CL ·Maria Thomas, Kristina Gligoric, Nihar B. Shah ·
compartilhar: