LocalNav: Distilling Frontier VLMs and Embodied RL for On-Device Object Goal Navigation

arXiv:2606.27871v1 · Announce Type: new

Abstract: Vision Language Models (VLMs) have emerged in robotics as a powerful tool for perceiving the environment with language context, serving as a catalyst for open-vocabulary tasks like ObjectNav. Yet their computational footprint typically confines them to cloud execution, which prevents low-latency inference through local deployment on resource-constrained robots. To address this challenge, we present a distillation strategy that transfers c...

arXiv cs.RO · Nicolas Baumann, Liam Boyle, Pu Deng, Edoardo Ghignone, Boyang Sun, Marc Pollefeys, Luca Benini, Michele Magno