Dense Structural Priors for Sparse Functional Landmark Localization in Surgical Videos

arXiv:2606.31007v1. Abstract: Vision foundation models such as SAM 3 can provide transferable object-level structure across diverse surgical video conditions, but segmentation outputs do not explicitly encode the action-conditioned semantics that define functional surgical landmarks. Estimating instrument extent and geometry differs from localizing the tip or anchor relevant to clipping, grasping, or dissecting. We investigate vision foundation model-enabled sparse action-aware...

arXiv cs.CV · Chenyan Jing, Hao Ding, Lalithkumar Seenivasan, Jacob M. Delgado López, Mathias Unberath