Seeing Is Not Sharing: Some Vision-Language Models Overestimate Common Ground in Asymmetric Dialogue

Hugging Face · Daily Papers · Nan Li, Albert Gatt · ▲ 4 upvotes

This paper is featured in Hugging Face's daily paper selection, curated by the AI research community.

Authors: Nan Li, Albert Gatt, Massimo Poesio

  • 4 community upvotes
  • Topics: vision-language models, dialogue context, map-information access, interpretation-matching task, reference expressions, grounding

Abstract

Original text, extracted from the paper:

Vision-language models struggle to distinguish between shared and interpreted visual information in dialogue, relying on static map cues rather than dynamic grounding processes.
