Video2Code: Generating Interactive Webpages from UI Videos via Action-Aware Revisit

arXiv:2606.20711v1 (Announce Type: new)

Abstract: UI videos provide a natural input for generating interactive webpages, as they capture both webpage appearance and action-triggered state transitions. However, directly applying video-capable vision-language models to this task remains insufficient. Existing models typically rely on sparse sampling or compressed temporal representations, which may miss short action boundaries and break the state-action-state transitions needed to implement webpage ...
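The sampling concern raised in the abstract can be illustrated with a small sketch. This is not the paper's method; it only shows, under assumed numbers, how a sparse uniform frame schedule can skip a short action-triggered transition entirely. The function names, the 30-second duration, the 8-frame budget, and the 12.1 to 12.4 second transition window are all hypothetical.

```python
def uniform_sample_times(duration_s, num_frames):
    """Return timestamps at the midpoints of num_frames equal segments."""
    step = duration_s / num_frames
    return [step * (i + 0.5) for i in range(num_frames)]


def frames_inside(sample_times, start_s, end_s):
    """Return the sampled timestamps that fall inside [start_s, end_s]."""
    return [t for t in sample_times if start_s <= t <= end_s]


# Hypothetical recording: 30 s long, with a click-triggered state
# transition lasting from 12.1 s to 12.4 s (assumed values).
transition = (12.1, 12.4)

# Sparse schedule: 8 frames over the whole video (assumed budget).
sparse = uniform_sample_times(30.0, 8)
# Dense schedule: 300 frames, i.e. one every 0.1 s (assumed budget).
dense = uniform_sample_times(30.0, 300)

sparse_hits = frames_inside(sparse, *transition)
dense_hits = frames_inside(dense, *transition)

print("sparse frames inside transition:", sparse_hits)
print("dense frames inside transition:", len(dense_hits))
```

With these assumed numbers the sparse schedule places no frame inside the transition window, so a model that sees only those frames gets the before and after states without the action that connects them, while the dense schedule does land frames inside the window.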

arXiv cs.CV · Mingde Xu, Zhen Yang, Yan Wang, Yu Wang, Xijun Liu, Zijun Dou, Wenyi Hong, Xiaotao Gu, Bin Xu, Jie Tang