One Forward Beats Two: InnerZoom for Accurate and Efficient GUI Grounding
Hugging Face · Daily Papers
This paper is featured in Hugging Face's Daily Papers selection, curated by the AI research community.
Authors: Chen Liu, Ling Chen, Hanzhang Zhou, Liangyu Chen, Chenglin Cai, Xin Yu
- 3 community upvotes
- Topics: MLLM-based GUI grounding, autoregressive coordinate generation, spatial precision, decoder layers, cross-layer evidence bridging, ZoomIn-style methods
Abstract
Original abstract, extracted from the paper:
InnerZoom addresses GUI grounding challenges by preserving target-region awareness across decoder layers through a single-forward pass that bridges cross-layer evidence, achieving state-of-the-art performance with reduced computational cost.