
Seeing Is Not Sharing: Some Vision-Language Models Overestimate Common Ground in Asymmetric Dialogue

Vision-language models struggle to distinguish between shared and interpreted visual information in dialogue, relying on static map cues rather than dynamic grounding processes.

Hugging Face · Daily Papers · Nan Li, Albert Gatt · January 30, 2026 · ▲ 4 upvotes

This paper is featured in Hugging Face's Daily Papers selection, curated by the AI research community.

Authors: Nan Li, Albert Gatt, Massimo Poesio

4 community upvotes
Topics: vision-language models, dialogue context, map-information access, interpretation-matching task, reference expressions, grounding

Summary

Original summary, extracted from the paper:

Vision-language models struggle to distinguish between shared and interpreted visual information in dialogue, relying on static map cues rather than dynamic grounding processes.

Where to read

View on Hugging Face
