Triadic Werewolf: A Jester Role for Multi-Hop Theory of Mind in LLMs

arXiv:2606.27909v1. Abstract: Theory-of-mind evaluations of large language models typically use dyadic social-deduction games, where every observable cue points to a single hidden side, so a model with strong language priors can score well without ever simulating opponents' incentives. We extend the Werewolf game with a Jester, a third faction whose utility on peer suspicion is inverted because it wins by being voted out, so optimal play requires reasoning across three opposing...
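The Jester's inverted incentive can be illustrated with a small sketch. Only the rule that the Jester wins by being voted out comes from the abstract; the names `Role`, `suspicion_payoff`, and `winner_after_vote`, the +1/-1 payoff values, and the village and werewolf win conditions (conventional Werewolf rules) are assumptions made for illustration and may differ from the paper's setup.

```python
from enum import Enum
from typing import Dict, Optional


class Role(Enum):
    VILLAGER = "villager"
    WEREWOLF = "werewolf"
    JESTER = "jester"


def suspicion_payoff(role: Role, voted_out: bool) -> int:
    """Illustrative payoff from the day vote alone (+1 good, -1 bad).

    Assumption: villagers and werewolves both want to survive the vote,
    while the Jester wants to be eliminated by it.
    """
    if role is Role.JESTER:
        return 1 if voted_out else -1
    return -1 if voted_out else 1


def winner_after_vote(alive: Dict[str, Role], voted_out: str) -> Optional[Role]:
    """Return the winning faction after a day vote, or None if play continues.

    Assumed rules: the Jester wins immediately if voted out; the village wins
    once no werewolves remain; the werewolves win once they are at least as
    numerous as all other remaining players.
    """
    if alive[voted_out] is Role.JESTER:
        return Role.JESTER
    remaining = [r for name, r in alive.items() if name != voted_out]
    wolves = sum(1 for r in remaining if r is Role.WEREWOLF)
    others = len(remaining) - wolves
    if wolves == 0:
        return Role.VILLAGER
    if wolves >= others:
        return Role.WEREWOLF
    return None


players = {
    "a": Role.WEREWOLF,
    "b": Role.VILLAGER,
    "c": Role.VILLAGER,
    "d": Role.JESTER,
}
print(winner_after_vote(players, "d"))
print(suspicion_payoff(Role.JESTER, voted_out=True))
```

Under these assumed rules, the same observable cue (a player attracting suspicion) is bad news for a villager or werewolf but good news for the Jester, so a voter has to consider which of three incentives explains the behaviour before casting a vote.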

arXiv cs.CL · Avni Mittal