EVLA: An Electro-Aware Multimodal Assistant for Physically-Grounded Driving Reasoning and Control

arXiv:2606.28938v1 · Announce Type: new

Abstract: Modern vision-language models (VLMs) for driving assistants typically treat vehicle dynamics as a black box, so their decisions lack awareness of the vehicle's real-time electro-mechanical state. To bridge this gap, we introduce the Electro-Visual-Language Assistant (EVLA), a novel framework that combines multimodal scene understanding with real-time perception of the electrified powertrain's state (e.g., motor torque, battery state of charge (SOC)). Our a...

arXiv cs.CL · Yuxin Liu, Zihan Chen, Haoyu Wang, Mingxuan Zhang, Ruijie Lin, Siyuan Zhao · January 30, 2026

See also

LocateAnything-3B: NVIDIA teaches a model to point at things in an image

InternScience/Agents-A1

NIVA: A Multimodal Foundation Model for Actionable Earth System Intelligence

Can AI Draw Science? A Benchmark for Evaluating Scientific Figure Generation by Text-to-Image and Multimodal Models