UnityShots: Memory-Driven Multi-Shot Audio-Video Generation with Boundary-Aware Gating
Hugging Face · Daily Papers
This paper is featured in Hugging Face's Daily Papers selection, curated by the AI research community.
Authors: Jiehui Huang, Yuechen Zhang, Bin Xia, Jiahao Wang, Xu He, Zhenchao Tang
- 14 community upvotes
- Topics: multi-shot audio-video generation, LTX-2.3, long-term memory, short-term memory, boundary-conditioned gate, visual cut probability
Abstract
Original abstract, extracted from the paper:
UnityShots is a memory-driven audio-video generation system that maintains consistent subject appearance and audio across video cuts using fixed-size long-term and short-term memory slots with boundary-conditioned gates and discrete cut-type priors.
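The abstract does not spell out how the memory slots and gates are computed, so the following is only a toy illustration of the general idea, not the authors' method. It assumes a fixed-size short-term buffer, a single long-term slot updated by an exponential moving average, and a sigmoid gate driven by a visual cut probability that shifts the read toward long-term memory at a shot boundary. All names and parameters (`boundary_gate`, `ToyMemory`, `sharpness`, `threshold`, `long_decay`) are hypothetical, and discrete cut-type priors are not modeled.

```python
import math
from collections import deque


def boundary_gate(cut_prob, sharpness=10.0, threshold=0.5):
    """Map a cut probability in [0, 1] to a gate in (0, 1).

    Assumption: a plain sigmoid; the paper's actual gate is not
    described in the abstract.
    """
    return 1.0 / (1.0 + math.exp(-sharpness * (cut_prob - threshold)))


class ToyMemory:
    """Hypothetical fixed-size memory: short-term buffer plus one long-term slot."""

    def __init__(self, dim, short_slots=4, long_decay=0.9):
        # deque(maxlen=...) discards the oldest entry, so the size stays fixed.
        self.short = deque(maxlen=short_slots)
        self.long = [0.0] * dim
        self.long_decay = long_decay

    def write(self, feature):
        """Store a feature in short-term memory and fold it into the long-term slot."""
        self.short.append(list(feature))
        d = self.long_decay
        self.long = [d * l + (1.0 - d) * f for l, f in zip(self.long, feature)]

    def read(self, cut_prob):
        """Blend long-term and short-term memory according to the boundary gate."""
        g = boundary_gate(cut_prob)
        n = len(self.short)
        if n:
            short_mean = [sum(col) / n for col in zip(*self.short)]
        else:
            short_mean = list(self.long)
        # Near a cut (g close to 1) the read leans on long-term memory;
        # within a shot (g close to 0) it leans on recent short-term context.
        return [g * l + (1.0 - g) * s for l, s in zip(self.long, short_mean)]


mem = ToyMemory(dim=2, short_slots=2)
for feat in ([1.0, 0.0], [0.0, 1.0], [0.0, 1.0]):
    mem.write(feat)

within_shot = mem.read(cut_prob=0.0)
at_cut = mem.read(cut_prob=1.0)
```

In this sketch the short-term buffer has already forgotten the first feature, while the long-term slot still carries a trace of it, so the read at a cut recovers more of the earlier content than the within-shot read does.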