Learning When to Listen: Gated Affect Fusion for Human Motion Prediction
arXiv:2607.00296v1
Abstract: Human motion forecasting in unconstrained real-world videos remains challenging due to the ambiguity of future behaviors and the presence of noisy multimodal observations. While facial affect potentially provides complementary behavioral cues, its practical utility and mechanistic boundaries within motion forecasting frameworks remain poorly understood. In this work, we present a systematic study investigating the utility and temporal limitations o...
arXiv cs.CV
Jingni Huang