How Well Can Your Video Model Remember? Measuring Memory-Budget Trade-offs in Long Video Understanding
arXiv:2606.20726v1 [new]. Abstract: We introduce a compact empirical model that quantifies how answer accuracy in long video understanding degrades as a function of the frame budget B and the temporal distance D, that is, how well a model recalls content from D seconds in the past when it sees only a fraction B of the total frames. Long-form models operate under strict frame budgets, yet no prior framework predicts how accuracy degrades as B shrinks and events recede further into the past. We fit a weighted least-squares model on...
arXiv cs.CV · Yixian Tian
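The abstract says the accuracy model is fit by weighted least squares, but the excerpt cuts off before giving the functional form or the weights. The sketch below is therefore only an illustration of a WLS fit over (B, D) observations, not the paper's model. It assumes a hypothetical log-linear form, accuracy ≈ β0 + β1·log(B) + β2·log(1 + D), and hypothetical per-observation weights (for example, the number of questions behind each accuracy estimate). The names `design_row`, `fit_wls`, and `predict` are illustrative. The fit solves the normal equations (XᵀWX)β = XᵀWy in pure Python.

```python
import math
from typing import List, Sequence, Tuple


def design_row(budget: float, distance: float) -> List[float]:
    """HYPOTHETICAL features: intercept, log frame-budget fraction, log(1 + distance in s)."""
    return [1.0, math.log(budget), math.log1p(distance)]


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a small dense system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix so each row operation updates a and b together.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("normal equations are singular")
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate entries below the pivot.
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / m[r][r]
    return x


def fit_wls(points: Sequence[Tuple[float, float, float, float]]) -> List[float]:
    """Minimize sum_i w_i * (acc_i - x_i @ beta)^2 over beta.

    Each point is (budget B, distance D, accuracy, weight w); the weights are an
    assumption here. Builds and solves (X^T W X) beta = X^T W y.
    """
    k = 3
    xtwx = [[0.0] * k for _ in range(k)]
    xtwy = [0.0] * k
    for budget, distance, acc, w in points:
        x = design_row(budget, distance)
        for i in range(k):
            xtwy[i] += w * x[i] * acc
            for j in range(k):
                xtwx[i][j] += w * x[i] * x[j]
    return solve_linear(xtwx, xtwy)


def predict(beta: Sequence[float], budget: float, distance: float) -> float:
    """Predicted accuracy under the hypothetical log-linear form."""
    return sum(b * x for b, x in zip(beta, design_row(budget, distance)))


# Usage on SYNTHETIC, noise-free points generated from made-up coefficients
# (not results from the paper): the fit should recover those coefficients.
true_beta = [0.9, 0.05, -0.04]
grid = [(b, d) for b in (0.05, 0.1, 0.25, 0.5) for d in (0.0, 30.0, 120.0, 600.0)]
synthetic = [(b, d, predict(true_beta, b, d), 1.0 + i % 3) for i, (b, d) in enumerate(grid)]
beta = fit_wls(synthetic)
print([round(v, 6) for v in beta])
```

Weighting lets cells backed by more questions pull harder on the fit; in this form a zero-weight observation has no influence at all. A real reproduction would need the paper's actual functional form and weighting scheme.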