AVTok: 1D Unified Tokenization for Holistic Audio-Video Generation
arXiv:2606.30811v1
Abstract: Audio-video generation has recently gained unprecedented research attention, aiming to synthesize high-quality sounding video content with fine-grained synchronization and semantic alignment between the auditory and visual components. The preceding methods predominantly adopt a dual-branch design with separate tokenization and generation modules per modality, neglecting the representation gap while necessitating intensive computational resources fo...
arXiv cs.CV
Kien T. Pham, I Chieh Chen, Qifeng Chen, Long Chen