
Robusto-2: Benchmarking Humans & VLMs for Autonomous Driving in Lima & New York City


Hugging Face · Daily Papers · Adrian Cespedes, Marcelo Chincha · January 18, 2026 · ▲ 1 upvote

This paper is featured in Hugging Face's Daily Papers selection, curated by the AI research community.

Authors: Adrian Cespedes, Marcelo Chincha, Dunant Cusipuma, Victor Flores-Benites, David Ortega, Arturo Deza

1 community upvote
Topics: Visual Question Answering, VLMs, out-of-distribution, multi-modal systems, Self-Driving Cars, dashcam footage

Summary

Original summary, extracted from the paper:

Research examines how self-driving car systems and humans perform on visual question answering tasks across different geographic locations, revealing that both human and AI responses diverge based on question types but show similar performance regardless of location.

