
Learning When to Listen: Gated Affect Fusion for Human Motion Prediction

arXiv:2607.00296v1. Abstract: Human motion forecasting in unconstrained real-world videos remains challenging due to the ambiguity of future behaviors and the presence of noisy multimodal observations. While facial affect potentially provides complementary behavioral cues, its practical utility and mechanistic boundaries within motion forecasting frameworks remain poorly understood. In this work, we present a systematic study investigating the utility and temporal limitations o...

arXiv cs.CV · Jingni Huang · January 2, 2026


