Dense Structural Priors for Sparse Functional Landmark Localization in Surgical Videos
arXiv:2606.31007v1
Abstract: Vision foundation models such as SAM 3 can provide transferable object-level structure across diverse surgical video conditions, but segmentation outputs do not explicitly encode the action-conditioned semantics that define functional surgical landmarks. Estimating instrument extent and geometry differs from localizing the tip or anchor relevant to clipping, grasping, or dissecting. We investigate vision foundation model-enabled sparse action-aware...
arXiv cs.CV
Chenyan Jing, Hao Ding, Lalithkumar Seenivasan, Jacob M. Delgado López, Mathias Unberath