
Does Your ViT Still Need U-Net for Segmentation?

arXiv:2607.00223v1 · Announce Type: new

Abstract: Medical image segmentation is dominated by U-Net-style encoder-decoder architectures. Vision Transformers (ViTs) overcome the limited receptive field of convolutional networks through self-attention, enabling modeling of long-range dependencies. Early ViT-based segmentation methods typically retained U-Net-style decoders because pretrained ViT representations were insufficient to support accurate dense prediction. Recent advances in large-scale pre...
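The abstract asks whether pretrained ViT features are now strong enough for dense prediction without a U-Net-style decoder. As a minimal sketch of what "no U-Net decoder" can look like, the code below uses a generic linear head on ViT patch tokens, in the style of linear-probe segmentation heads. This is an assumption for illustration only, not the paper's method (its abstract is truncated here). The function name `linear_seg_head`, the random tokens and weights, the 4×4 patch grid, 8-dimensional embeddings, 3 classes, and 16-pixel patches are all hypothetical.

```python
import numpy as np


def linear_seg_head(tokens, grid_hw, weight, bias, patch_size):
    """Hypothetical decoder-free head: classify each ViT patch token, then upsample.

    tokens:     (h*w, D) patch embeddings from a ViT encoder (class token removed)
    grid_hw:    (h, w) size of the patch grid
    weight:     (D, C) linear classifier weights (random here, learned in practice)
    bias:       (C,) classifier bias
    patch_size: side length in pixels of each square patch
    """
    h, w = grid_hw
    n, _ = tokens.shape
    if n != h * w:
        raise ValueError("token count does not match the patch grid")
    # A 1x1 convolution over the token grid is just a per-token matmul.
    logits = tokens @ weight + bias        # (h*w, C)
    logits = logits.reshape(h, w, -1)      # (h, w, C) spatial layout
    # Nearest-neighbour upsampling back to pixel resolution (no skip connections).
    logits = logits.repeat(patch_size, axis=0).repeat(patch_size, axis=1)
    return logits.argmax(axis=-1)          # (h*patch_size, w*patch_size) label map


# Usage with made-up sizes: a 64x64 image split into 16x16 patches.
rng = np.random.default_rng(0)
h, w, d, c, p = 4, 4, 8, 3, 16
tokens = rng.standard_normal((h * w, d))   # stand-in for encoder output
W = rng.standard_normal((d, c))
b = np.zeros(c)
mask = linear_seg_head(tokens, (h, w), W, b, p)
print(mask.shape)  # (64, 64)
```

Practical heads of this kind often use bilinear rather than nearest-neighbour upsampling; the sketch keeps nearest-neighbour for simplicity. The design point is that every patch token is classified independently from the encoder's features, with no multi-scale decoder or skip connections, so segmentation quality rests entirely on the pretrained representation.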

arXiv cs.CV · Xin Li, Wenhui Zhu, Xuanzhao Dong, Xiwen Chen, Yanxi Chen, Yujian Xiong, Hao Wang, Oana M. Dumitrascu, Yalin Wang · 2 January 2026



Read also

Stop Pretending Social Robots Are Inevitable

Segmenting, Fast and Slow: Real-Time Open-Vocabulary Video Instance Segmentation with Dual-Path Processing

Decompose, Compare, and Decide: Multimodal LLMs are Implicit Few-Shot Learners

Joint Medical Image Enhancement and Segmentation with Diffusion-based Symbiotic Information Interaction