Perceive-to-Reason: Decoupling Perception and Reasoning for Fine-Grained Visual Reasoning

Hugging Face · Daily Papers · Hongxing Li, Xiufeng Huang · ▲ 10 upvotes

This paper is featured in Hugging Face's Daily Papers selection, curated by the AI research community.

Authors: Hongxing Li, Xiufeng Huang, Dingming Li, Wenjing Jiang, Zixuan Wang, Haolei Xu

  • 10 community upvotes
  • Topics: vision-language models, fine-grained visual reasoning, Perceiver, Reasoner, Perception-Reasoning Alternating GRPO, reinforcement learning

Abstract

Original abstract (in English), extracted from the paper:

A unified framework named Perceive-to-Reason (P2R) is introduced that separates visual perception from reasoning in vision-language models through a two-stage process, improving fine-grained visual reasoning performance on high-resolution images.
