
The Professor: Multi-Teacher Unsupervised Prompt Distillation for Vision-Language Models

arXiv:2606.23897v1. Abstract: Prompt distillation compresses large vision-language models (VLMs) such as CLIP into lightweight student models by matching teacher predictions on unlabeled domain images. PromptKD (CVPR 2024) established this paradigm with a single PromptSRC-finetuned ViT-L/14 teacher and a ViT-B/16 student. We propose TheProfessor, a multi-teacher extension that distills from a fixed two-teacher ensemble: a domain-finetuned PromptSRC ViT-L/14 teacher and a zero-s...
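To make the idea of matching an ensemble of teachers on unlabeled images concrete, here is a minimal sketch in plain Python. The abstract is truncated before it says how the two teachers are combined, so everything specific here is an assumption for illustration: the teachers' softmax distributions are averaged with equal weights, and the student is scored with a KL divergence against that average. The function names (`softmax`, `ensemble_targets`, `distillation_loss`) are hypothetical and do not come from the paper.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of class logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_targets(teacher_logits, weights=None, temperature=1.0):
    """Combine teachers into one soft target.

    ASSUMPTION: a weighted average of teacher probabilities (equal weights
    by default). The source does not state the actual combination rule.
    """
    count = len(teacher_logits)
    if weights is None:
        weights = [1.0 / count] * count
    probs = [softmax(logits, temperature) for logits in teacher_logits]
    num_classes = len(probs[0])
    return [
        sum(w * p[c] for w, p in zip(weights, probs))
        for c in range(num_classes)
    ]


def distillation_loss(student_logits, teacher_logits, weights=None, temperature=1.0):
    """KL(ensemble target || student) for one unlabeled image.

    ASSUMPTION: KL divergence is used as the matching objective.
    """
    target = ensemble_targets(teacher_logits, weights, temperature)
    student = softmax(student_logits, temperature)
    eps = 1e-12  # guards against log(0)
    return sum(
        t * (math.log(t + eps) - math.log(s + eps))
        for t, s in zip(target, student)
    )


# Illustrative logits for one image over three classes (made-up numbers).
finetuned_teacher = [2.0, 0.5, -1.0]
zero_shot_teacher = [1.5, 1.0, -0.5]
student = [0.2, 0.1, 0.0]

loss = distillation_loss(student, [finetuned_teacher, zero_shot_teacher])
```

No labels appear anywhere in the loss: the only supervision is the teachers' predictions, which is what makes the setup unsupervised. In a real training loop the same quantity would be computed on batches with an autodiff framework and minimized with respect to the student's learnable prompts.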

arXiv cs.CV · Ahmad Algadhi, Ahmed Alzuhair, Omar Alkhulaif, Muzammil Behzad · January 24, 2026


