Video2Code: Generating Interactive Webpages from UI Videos via Action-Aware Revisit
arXiv:2606.20711v1 Announce Type: new Abstract: UI videos provide a natural input for generating interactive webpages, as they capture both webpage appearance and action-triggered state transitions. However, directly applying video-capable vision-language models to this task remains insufficient. Existing models typically rely on sparse sampling or compressed temporal representations, which may miss short action boundaries and break the state-action-state transitions needed to implement webpage ...
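The sampling concern raised in the abstract can be illustrated with a small sketch. This is not the paper's method. It is a toy model, assuming a hypothetical 30-second recording, made-up action intervals, and uniform midpoint sampling, that shows how a sparse frame budget can skip short actions entirely while a denser one captures them.

```python
def uniform_sample_times(duration_s: float, num_frames: int) -> list[float]:
    """Timestamps taken at the midpoints of num_frames equal-length segments."""
    step = duration_s / num_frames
    return [(i + 0.5) * step for i in range(num_frames)]


def missed_actions(sample_times, actions):
    """Return the actions whose [start, end] interval contains no sampled frame."""
    return [
        (name, start, end)
        for name, start, end in actions
        if not any(start <= t <= end for t in sample_times)
    ]


# Hypothetical action log (name, start_s, end_s); values are illustrative only.
actions = [
    ("click_menu", 4.2, 4.5),
    ("hover_card", 12.0, 13.5),
    ("submit_form", 21.1, 21.3),
]

sparse = uniform_sample_times(30.0, 8)    # 8 frames for the whole video
dense = uniform_sample_times(30.0, 300)   # roughly 10 frames per second

sparse_missed = [name for name, _, _ in missed_actions(sparse, actions)]
dense_missed = [name for name, _, _ in missed_actions(dense, actions)]

print(sparse_missed)  # ['click_menu', 'submit_form']
print(dense_missed)   # []
```

In this toy setting, the two sub-second actions fall between sparse samples, so a model seeing only those frames would observe the before and after states without the transition that links them. Dense sampling recovers them, but at a much higher frame cost.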