Computer Vision
VTOS: Learning to Orchestrate Vision Tools by Co-Searching Solutions and Observers
arXiv:2606.20728v1. Abstract: Vision foundation tools such as open-vocabulary detectors, segmentation models, and post-processing operators are powerful building blocks for computer vision, but their effectiveness depends heavily on how they are orchestrated: which tools are used, in what order, with what parameters, and under what visual conditions. Existing visual-programming agents typically generate a fixed solution pipeline, making them brittle under dense objects, occlusi...
23.06.2026