REALM: A Unified Red-Teaming Benchmark for Physical-World VLMs
arXiv:2606.23892v1
Abstract: Vision-language models (VLMs) are increasingly used as perception-reasoning backbones for embodied intelligence in safety-critical physical systems, where perception or reasoning errors can lead to unsafe decisions or actions. Although many red-teaming methods have been developed to probe VLM vulnerabilities, their evaluation remains fragmented across datasets, metrics, and threat models, making direct comparison difficult and obscuring whether obs...
arXiv cs.CV
Yifei Zhao, Qian Lou, Mengxin Zheng