PerceptionDLM: Parallel Region Perception with Multimodal Diffusion Language Models

Hugging Face · Daily Papers · Yueyi Sun, Yuhao Wang · ▲ 50 upvotes

This paper is featured in Hugging Face's Daily Papers selection, curated by the AI research community.

Authors: Yueyi Sun, Yuhao Wang, Jason Li, Ye Tian, Tao Zhang, Jacky Mai

  • 50 community upvotes
  • Topics: multimodal large language models, diffusion language models, parallel decoding, structured attention masking, region captioning, visual perception

Abstract

Original abstract, extracted from the paper:

PerceptionDLM enables efficient parallel region perception in multimodal diffusion language models through structured attention masking and efficient prompting, achieving faster inference without sacrificing caption quality.
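
The abstract does not spell out the mask layout, so the following is only an illustrative sketch of what a structured attention mask for parallel region captioning could look like, not the authors' implementation. It assumes a sequence made of shared context tokens (image and prompt) followed by one caption segment per region, where each segment may attend to the shared context and to itself but not to the other regions. The function name `build_region_mask` and its parameters are hypothetical.

```python
import numpy as np


def build_region_mask(num_shared: int, region_lengths: list[int]) -> np.ndarray:
    """Return a boolean mask where mask[i, j] is True if token i may attend to token j.

    Assumed layout (hypothetical, not taken from the paper):
    [shared context tokens][region 0 tokens][region 1 tokens]...
    """
    total = num_shared + sum(region_lengths)
    mask = np.zeros((total, total), dtype=bool)

    # Shared context tokens attend to each other in both directions.
    mask[:num_shared, :num_shared] = True

    start = num_shared
    for length in region_lengths:
        end = start + length
        # Each region segment sees the shared context ...
        mask[start:end, :num_shared] = True
        # ... and itself (bidirectional, as in masked-diffusion decoding),
        # but no other region segment.
        mask[start:end, start:end] = True
        start = end

    return mask


# Example: 4 shared tokens, two regions with 3 and 2 caption tokens.
mask = build_region_mask(num_shared=4, region_lengths=[3, 2])
```

Under these assumptions, the region segments are independent of one another given the shared context, which is what would allow several region captions to be decoded in a single forward pass rather than one region at a time.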
