
Drop-Then-Recovery: How Redundant Are Vision-Language-Action Models?

arXiv:2606.27755v1. Abstract: Vision-Language-Action (VLA) models enable instruction-driven robotic manipulation, but they inherit oversized language backbones from pretrained VLMs whose capacity far exceeds what is needed for short robotic instructions. This raises a basic question: how much of a VLA model is actually necessary for closed-loop control? In this work, we study architectural redundancy in VLA models by using transformer block removal as a controlled intervention....
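
To make "transformer block removal" concrete, here is a toy Python sketch of the general idea. It is not the authors' code: the blocks are simple residual functions standing in for real transformer layers, and `make_residual_block`, `drop_blocks`, and `forward` are hypothetical names chosen for illustration. The point it shows is that, because each block adds to its input rather than replacing it, a block can be removed from the stack without changing the shape of what flows through the remaining blocks.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Block = Callable[[Vector], Vector]


def make_residual_block(scale: float) -> Block:
    """Toy stand-in for a transformer block: output = x + f(x), with f(x) = scale * x."""
    def block(x: Vector) -> Vector:
        return [xi + scale * xi for xi in x]
    return block


def drop_blocks(blocks: Sequence[Block], drop: Sequence[int]) -> List[Block]:
    """Return a new stack with the blocks at the given indices removed."""
    dropped = set(drop)
    return [b for i, b in enumerate(blocks) if i not in dropped]


def forward(blocks: Sequence[Block], x: Vector) -> Vector:
    """Apply the blocks in order to the input vector."""
    for block in blocks:
        x = block(x)
    return x


# Assumed toy setup: a 4-block stack, with the two middle blocks removed.
full_stack = [make_residual_block(0.5) for _ in range(4)]
pruned_stack = drop_blocks(full_stack, drop=[1, 2])

x = [1.0, 2.0]
full_out = forward(full_stack, x)
pruned_out = forward(pruned_stack, x)

print(len(full_stack), len(pruned_stack))
print(full_out, pruned_out)
```

In a real model the interesting question is how far the pruned output drifts from the full one on the task of interest. This sketch only shows that the pruned stack still runs end to end.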

arXiv cs.RO · Guoheng Sun, Kaixi Feng, Shwai He, Xiaochuan Gong, Yexiao He, Ziyao Wang, Zheyu Shen, Wanghao Ye, Ramana Rao Kompella, Gaowen Liu, Ang Li · January 29, 2026

