Decompose, Compare, and Decide: Multimodal LLMs are Implicit Few-Shot Learners

arXiv:2607.00125v1. Abstract: Multimodal Large Language Models (MLLMs) have demonstrated remarkable abilities when analyzing images, yet translating these capabilities to few-shot image classification remains challenging. To bridge this gap, we present DeCoDe, a simple yet effective technique that enables off-the-shelf MLLMs to act as strong few-shot classifiers without any additional training. Our approach builds on the idea of few-shot classification as a set of pairwise imag...

arXiv cs.CV · Yunhan Wang, Eshika Khandelwal, Edson Araujo, Walid Bousselham, Nina Shvetsova, Hilde Kuehne · January 2, 2026


