Large Language Model Teaches Visual Students: Cross-Modality Transfer of Fine-Grained Conceptual Knowledge
arXiv:2606.27527v1 Announce Type: new Abstract: Large Language Models (LLMs) possess broad conceptual knowledge acquired through large-scale text pretraining, yet their potential to supervise models in other modalities remains underexplored. In this work, we propose LaViD (Language-to-Visual Knowledge Distillation), a simple and effective framework for transferring high-level semantic knowledge from a language-only teacher to a vision-only student model. Instead of relying on paired multimodal da...
arXiv cs.CV
Thomas Shih-Chao Liang, Zhuoran Yu, Yong Jae Lee