AI Radar: News, Models, and Papers

MIRAGE: Stealthy Visual Prompt Injection for Vulnerability Detection in Web Agents

arXiv:2606.20717v1. Abstract: Multimodal Large Language Model (MLLM)-based web agents provide practical, high-precision solutions for visual browser automation; however, they inherently expand the attack surface, introducing novel vision-based vulnerabilities. Existing adversarial evaluations targeting these agents frequently rely on permissive threat models and visually conspicuous artifacts. In this paper, we investigate a constrained vulnerability detection setting: a truste...

23.06.2026

Blog LLMs & Text

Physics-Guided Fully Convolutional Spatiotemporal Learning Toward Digital-Twin-Enabled Microstructure Evolution Prediction

arXiv:2606.20983v1. Abstract: Understanding and predicting microstructure evolution is central to materials design, yet purely data-driven spatiotemporal learning models often suffer from limited physical consistency and degraded long-term prediction accuracy. In this work, we introduce a physics-guided fully convolutional spatiotemporal learning framework for microstructure evolution prediction. Unlike prior self-supervised approaches, the proposed method explicitly incorporat...

23.06.2026

Blog LLMs & Text

The Metanym Game: A Self-Contained, Self-Consistent LLM Peer-Community Benchmark for Structural Intelligence

arXiv:2606.21008v1. Abstract: The metanym game is a competitive word game for LLMs that measures structural intelligence against established cognitive-science constructs. No content is given in advance; the contestants create all of it -- a new kind of analogy test, analogical production falsifiable sentence by sentence, with no fixed test set to leak into training (contamination-resistant by construction). In the council-of-peers benchmark, the contestants also rate each other...

23.06.2026

Blog LLMs & Text

GRAG: Generic Response-Augmented Generation Framework for Personalized Conversational Systems

arXiv:2606.21097v1. Abstract: Deploying highly capable personalized conversational agents in resource-constrained or privacy-sensitive environments remains a significant challenge. We identify a fundamental bottleneck in the existing approaches: current training paradigms treat personalization and grounding as a single monolithic learning problem. Under these paradigms, language models are forced to simultaneously address what to say (content grounding) and how to say it in a u...

23.06.2026

Blog Robotics & RL

Perturbation-Based Uncertainty for Failure Detection in Vision-Language-Action Models

arXiv:2606.20754v1. Abstract: Vision-Language-Action (VLA) models have shown strong performance in robotic manipulation, but reliable uncertainty quantification remains challenging, particularly under distribution shift. Unlike autoregressive policies, many modern VLA models generate continuous actions through regression or flow-based generation, where explicit predictive probabilities are unavailable. Moreover, existing approaches often rely on stochastic action sampling or su...

23.06.2026

Blog Robotics & RL

Real-World Deployment of Massively Parallel Sampling-Based MPC for Contact-Rich Manipulation

arXiv:2606.20712v1. Abstract: Sampling-based Model Predictive Control (SMPC) is a promising strategy for contact-rich robotic manipulation, combining gradient-free optimization with massively parallel GPU simulation. Yet, most prior work relies on simplified dynamics or remains confined to simulation. We present an MPC framework that leverages JAX for large-scale parallelization and efficient computation, coupled with the high-fidelity MuJoCo MJX simulator, and deploy it on a F...

23.06.2026

Blog LLMs & Text

Short-Term Electricity Demand Forecasting for New England Using a Hybrid Transformer-XGBoost Framework with Weather, Calendar, and COVID-19 Indicators

arXiv:2606.20918v1. Abstract: Accurate short-term electricity demand forecasting is critical for reliable power system operation, energy market planning, and infrastructure optimization. This paper presents a hybrid framework combining a Transformer encoder for temporal feature extraction with gradient-boosted decision trees (XGBoost) for daily electricity demand forecasting across New England. The framework integrates meteorological observations from six cities spanning all si...

23.06.2026

Blog LLMs & Text

Beyond Fixed Budgets: Characterizing the Inelasticity and Limitations of Tree-of-Thought Reasoning Strategies

arXiv:2606.20599v1. Abstract: Tree of Thought (ToT) search has become a promising direction for improving the reasoning capabilities of large language models, but deploying these methods in practice raises a question that has received little systematic attention: how do different search strategies behave under varying compute budgets, model sizes, and problem difficulties? In this work, we evaluate two representative ToT methods: DPTS, a Monte Carlo tree search based approach, ...

23.06.2026

Blog Robotics & RL

R2HandoverSim: A Simulation Framework and Benchmark for Robot-to-Human Object Handovers

arXiv:2606.21011v1. Abstract: We present R2HandoverSim, a simulation benchmark for robot-to-human (R2H) object handovers. Although R2H handover methods have advanced rapidly, the lack of standardized evaluation protocols impedes objective comparison. Our benchmark enables reproducible evaluation by systematically comparing four baselines on their predicted shared grasp poses. We conduct a user study with 30 participants, analyze baseline performance, and show that simulation re...

23.06.2026

Blog LLMs & Text

SkillHarness: Harnessing Safe Skills for Computer-Use Agents

arXiv:2606.20636v1. Abstract: Computer-Use Agents (CUAs) are increasingly deployed in dynamic interactive environments, creating a growing need for continual skill learning during interaction. Recent approaches address this challenge by learning reusable skills from successful trajectories. However, these skill learning methods largely assume static and safe environments, overlooking risks from adversarial interactions (e.g., prompt injections) and environmental dynamics (e.g.,...

23.06.2026

Blog LLMs & Text

A Gated Graph Neural Network Approach to Fast-Convergent Dynamic Average Estimation

arXiv:2606.20955v1. Abstract: Dynamic average estimation is a critical problem in multi-agent systems, enabling agents to collaboratively estimate time-varying signals using only local information exchange. Traditional model-based approaches often face challenges related to convergence speed and sensitivity to network topology changes. This paper introduces a novel learning-based solution leveraging Gated Graph Neural Networks (GGNNs) for fast-convergent dynamic average estimat...

23.06.2026

Blog LLMs & Text

Hypothesis-Disciplined Multi-Agent Automated Formalization of Asymptotic Statistical Theory

arXiv:2606.20642v1. Abstract: Asymptotic statistical theory is a challenging domain for AI-assisted formalization: its central results mix convergence statements, asymptotic expansions, functional analysis, and regularity conditions that have a large gap from existing infrastructure in Lean 4 formalization. To address these challenges, we propose a hypothesis-disciplined Lean 4 formalization pipeline built from multiple agents: a manager that coordinates seven specialist roles ...

23.06.2026
