VisChronos: Revolutionizing Image Captioning Through Real-Life Events

arXiv:2606.24058v1. Abstract: This paper aims to bridge the semantic gap between visual content and natural language understanding by leveraging historical events in the real world as a source of knowledge for caption generation. We propose VisChronos, a novel framework that utilizes large language models and dense captioning models to identify and describe real-life events from a single input image. Our framework can automatically generate detailed and context-aware event desc...

arXiv cs.CV · Phuc-Tan Nguyen, Hieu Nguyen, Minh-Triet Tran, Trung-Nghia Le