REALM: A Unified Red-Teaming Benchmark for Physical-World VLMs

arXiv:2606.23892v1. Abstract: Vision-language models (VLMs) are increasingly used as perception-reasoning backbones for embodied intelligence in safety-critical physical systems, where perception or reasoning errors can lead to unsafe decisions or actions. Although many red-teaming methods have been developed to probe VLM vulnerabilities, their evaluation remains fragmented across datasets, metrics, and threat models, making direct comparison difficult and obscuring whether obs...

arXiv cs.CV · Yifei Zhao, Qian Lou, Mengxin Zheng · January 24, 2026

