Segmenting, Fast and Slow: Real-Time Open-Vocabulary Video Instance Segmentation with Dual-Path Processing

arXiv:2607.00124v1

Abstract: Object-centric models inspired by DETR have become the dominant paradigm for open-vocabulary video instance segmentation (OV-VIS). While recent efforts have reduced the computational cost of pixel decoding, textual modality fusion, and object decoding to make these architectures more suitable for mobile devices, real-time on-device inference at high frame rates remains an open challenge. In this paper, we introduce SegFS, a dual-stream fast-slow fr...

arXiv cs.CV · Luca Barsellotti, Martin Sundermeyer, Mattia Segu, Nikita Araslanov, Muhammad Ferjad Naeem, Marcella Cornia, Yongqin Xian, Maxim Berman · January 2, 2026
