EmbodimentSemantic: A Spatial Scene-Graph Dataset and Benchmark for Vision-Language Models on Embodied Manipulation Trajectories

arXiv:2607.00020v1. Abstract: Spatial grounding remains a key limitation of vision-language-action (VLA) systems for robotic manipulation. While current models can recognize objects and follow language instructions, they often lack an explicit representation of how objects are arranged in space, including support, containment, ordering, occlusion, and depth-sensitive relations. We introduce EmbodimentSemantic, a spatial scene-graph dataset and benchmark for evaluating relationa...

arXiv cs.RO · Hassan Jaber, Refinath S N, Luca Cagliero, Christopher E. Mower, Haitham Bou-Ammar · January 2, 2026

Read also

A single example is enough: the arithmetic trick that re-teaches a robot

The Google Health API Got a CLI: ghealth is an Open-Source Tool for Your Fitbit Air Data

Optimal any-angle path planning in static and dynamic environments

Stop Pretending Social Robots Are Inevitable