// AI radar

LLMs & Text

Papers, models, and datasets trending on Hugging Face, plus the official blog, with editorial commentary in Portuguese.

Blog LLMs & Text

Jury Duty: Calibration and Orientation Failures in MLLM-as-a-Judge Under Cultural Ambiguity

arXiv:2606.20676v1 Announce Type: new Abstract: MLLM-as-a-Judge is conventionally validated by agreement with human annotations, but this metric is undefined when the human pool is culturally heterogeneous. We introduce VOIR DIRE, a multimodal benchmark of 626 culturally paired image–prompt artifacts spanning U.S. and mainland Chinese contexts across food, fashion, and architecture, with annotator pools that are within-pool reliable (a = 0.86/0.74) but cross-pool divergent on evaluation (Q1 r =...

23.06.2026

Constituency Optimisation Through Hamiltonian Representation Of Mandates (COTHROM): Algorithmic Redistricting of Irish Election Boundaries

arXiv:2606.20637v1 Announce Type: new Abstract: Electoral redistricting in Ireland's Proportional Representation Single Transferable Vote (PR-STV) system faces the challenge of selecting an optimally representative set of electoral boundaries from an enormous set of possible configurations, and where "representative" is a delicate balance of constitutional objectives that are often in tension with one another. We present the first computational framework for Irish electoral redistricting that ...

23.06.2026

Is Our Benchmark Enough? An Analysis of Continual Learning for MLLMs

arXiv:2606.20961v1 Announce Type: new Abstract: Continual adaptation is essential for multimodal large language models (MLLMs) deployed across evolving domains, but the state-of-the-art MR-LoRA method relies heavily on the assumption that an MLLM-based router is necessary to process complex multimodal inputs. This paper revisits this claim on the MLLM-CL benchmark and argues for two claims. First, routing does not require an MLLM: a simple training-free, replay-free prototypical routing met...

23.06.2026

SciLens: Multi-modal Scientific Claim Verification with Agentic Entailment and Grounding

arXiv:2606.20873v1 Announce Type: new Abstract: Scientific discovery increasingly relies on automated systems that generate hypotheses, inspect multimodal evidence, and validate claims at scale. Yet scientific claim verification is not well served by asking a vision-language model for a direct binary judgment: claims often combine numerical results, comparisons, scope qualifiers, and explanatory context, while evidence is encoded in tables and figures with distinct grounding structures. We prese...

23.06.2026

A Validation-Gated Mechanistic Account of Suicidality Detection in LLMs

arXiv:2606.21078v1 Announce Type: new Abstract: Large language models are increasingly proposed for mental-health applications such as detecting suicidal content, raising the question of what they rely on. We study this mechanistically and use it to ask a narrower question: how to make a causal claim about a model's internal features more trustworthy. Our validation-gated framework, with suicidality detection as a case study, interprets a behavior only after the model is shown to perform it: a c...

23.06.2026

A Projection-Based Surrogate Gradient Interpretation for Neural Codec Wrappers

arXiv:2606.20671v1 Announce Type: new Abstract: Neural wrappers are learned pre- and postprocessing networks designed to enhance the performance of conventional video codecs. Although these approaches can significantly improve compression efficiency, training them remains challenging due to the non-differentiability of video codecs, which arises from the multiple discrete decisions involved in the encoding process. Surrogate gradients have recently emerged as an effective solution for enabling en...

23.06.2026

An LLM-Explainable DRL Framework for Passenger-Directed Autonomous Driving

arXiv:2606.20640v1 Announce Type: new Abstract: Autonomous vehicles offer the potential for safer and more efficient mobility, yet public trust remains limited due to the lack of transparency in their decision-making. This work addresses this issue by combining deep reinforcement learning (DRL) for adaptive driving control with large language model (LLM)-based explainability modules designed to communicate agent behavior to passengers. DRL agents were trained in simulation using a Dueling Double...

23.06.2026

Less is More: Lightweight Prompt Compression for Question Answering Applications on Edge Devices

arXiv:2606.20571v1 Announce Type: new Abstract: In agent-driven question answering (QA) applications, retrieval-augmented generation (RAG) is commonly introduced to enhance the response accuracy of large language models (LLMs) by providing additional context. Due to the inherent noise in retrieval results and the coarse granularity of document-level retrieval, the retrieved context often contains substantial redundant information. In this setting, the agent prompt, consisting of the user query a...

23.06.2026

Quality and Agreement in Multilabel Emotion Annotation: A Case Study and Evaluation Framework

arXiv:2606.21069v1 Announce Type: new Abstract: Emotion annotation is inherently subjective, yet most NLP pipelines still assume "gold" labels, typically produced by majority voting, and treat annotator variation as noise. In this paper, we present a multilabel emotion annotation case study and use it to examine how annotator behavior and aggregation choices affect both agreement estimates and downstream emotion classifiers. Rather than collapsing disagreement into a single label, we represent t...

23.06.2026

MIRAGE: Stealthy Visual Prompt Injection for Vulnerability Detection in Web Agents

arXiv:2606.20717v1 Announce Type: new Abstract: Multimodal Large Language Model (MLLM)-based web agents provide practical, high-precision solutions for visual browser automation; however, they inherently expand the attack surface, introducing novel vision-based vulnerabilities. Existing adversarial evaluations targeting these agents frequently rely on permissive threat models and visually conspicuous artifacts. In this paper, we investigate a constrained vulnerability detection setting: a truste...

23.06.2026

Physics-Guided Fully Convolutional Spatiotemporal Learning Toward Digital-Twin-Enabled Microstructure Evolution Prediction

arXiv:2606.20983v1 Announce Type: new Abstract: Understanding and predicting microstructure evolution is central to materials design, yet purely data-driven spatiotemporal learning models often suffer from limited physical consistency and degraded long-term prediction accuracy. In this work, we introduce a physics-guided fully convolutional spatiotemporal learning framework for microstructure evolution prediction. Unlike prior self-supervised approaches, the proposed method explicitly incorporat...

23.06.2026
490 items on the radar