ProMSA: Progressive Multimodal Search Agents for Knowledge-Based Visual Question Answering
Hugging Face · Daily Papers
This paper is featured in Hugging Face's Daily Papers selection, which is curated by the AI research community.
Authors: ZhengXian Wu, Hangrui Xu, Kai Shi, Zhuohong Chen, Yunyao Yu, Chuanrui Zhang
- 7 community upvotes
- Topics: Knowledge-based Visual Question Answering, multimodal search agent, retrieve-then-generate pipeline, tool-call budgets, deduplication, rejection-sampling SFT
Summary
Original summary (in English), extracted from the paper:
A progressive multimodal search agent for knowledge-based visual question answering that adaptively selects search strategies and optimizes through sequence-level reinforcement learning.