Toward Low-Latency Vision-Language Models with Doubly-Correct Predictions in Egocentric Visual Understanding

arXiv:2606.25160v1 (announce type: new)

Abstract: The rapid rise of Vision-Language Models (VLMs) in egocentric visual understanding has made low-latency inference in human-robot collaborative (HRC) tasks increasingly critical. Weight pruning techniques developed for VLMs to shrink model size and computation can be readily applied to satisfy the efficiency demands of on-board processing and real-time interactive robotics. Moreover, safe human-robot interaction demands pruning strategies that prese...

arXiv cs.RO · Qitong Wang, Fan Du, Pranav Maneriker, Jihui Jin, Christopher Rasmussen