
The Moving Eye: Enhancing VLA Spatial Generalization via Hybrid Dynamic Data Collection

arXiv:2607.02322v1. Abstract: Vision-Language-Action (VLA) models have shown remarkable promise in generalized robotic manipulation. However, their spatial generalization remains fragile. We argue that simply increasing the number of viewpoints is insufficient. Models often fall into the trap of Shortcut Learning, latching onto spurious correlations (e.g., fixed relative poses between objects or between the camera and robot base) rather than learning true spatial relationships....

arXiv cs.RO · Jincheng Tang, Yilong Zhu, Zhengyuan Xie, Jiang-Jiang Liu, Jiaxing Zhang · January 3, 2026


