AI Radar: News, Models and Papers

REKEY: Metadata-Grounded Visual-Key Regeneration for Contamination-Resilient VQA Evaluation

arXiv:2606.20736v1 Announce Type: new Abstract: Static visual question answering (VQA) benchmarks age quickly: Once the items leak into training corpora, scores can reflect memorization rather than genuine visual ability, thus obscuring real progress. Rebuilding high-quality benchmarks such as V*Bench requires substantial human annotation, yet each static release can quickly become another leaked artifact. We propose ReKey, a live benchmark protocol that randomly regenerates the answer-bearing l...

23.06.2026

Blog LLMs & Text

A Quantum-Assisted Agentic Distributed Artificial Intelligence Framework for Deadline-Bounded Orchestration of Hybrid Renewable Microgrids

arXiv:2606.20667v1 Announce Type: new Abstract: The real-time orchestration of microgrids that combine fluctuating renewable sources, dispatchable units, storage and curtailable consumers requires the repeated solution of combinatorial dispatch and coalition formation problems under hard control deadlines. In this paper, a quantum-assisted agentic distributed artificial intelligence (DAI) framework is proposed in which the dispatch problem of each control slot is formulated as a quadratic uncons...

23.06.2026

Blog Robotics & RL

Duet: Dual-Robot Understanding via Efficient Teaching

arXiv:2606.20990v1 Announce Type: new Abstract: Dual-robot collaboration enables tasks that exceed the reach and payload of a single robot, such as collaboratively transporting objects across environments and executing coordinated handovers. Data acquisition is the primary bottleneck for training these systems. To this end, we introduce DUET, a dual-robot learning framework for mobile manipulation. For efficient data collection, we create a unified dual-embodiment synchronized VR-based teleopera...

23.06.2026

Blog LLMs & Text

SPARC: A Multi-Agent System for Electrical Circuit Question Answering

arXiv:2606.20643v1 Announce Type: new Abstract: Electrical circuit diagram QA tasks require complex mathematical reasoning, which remains challenging for multimodal LLMs. We present SPARC, a multi-agent system that answers questions over circuit diagrams by grounding reasoning in executable physics-based simulations. SPARC uses LLM agents to synthesize, execute, and analyze simulation programs, improving accuracy and reliability by design. It achieves 83% accuracy, with up to a 58% absolute impr...

23.06.2026

Blog LLMs & Text

Path-dependent program induction under resource constraints explains human sequence learning

arXiv:2606.20623v1 Announce Type: new Abstract: How do people build abstract, reusable knowledge from sequential experience under bounded cognitive resources? To answer this question, we integrate rate-distortion theory with recent advances in program induction to describe how prior knowledge shapes which future structures are cheap to encode and easy to discover. We formalize this in a hierarchical Adaptor Grammar (HAG) with distinct local (within-task) and global (across-task) libraries, gover...

23.06.2026

Blog Multimodal

An approach with Visual and Tabular Mamba to multimodal medical data using Mixed Fusion

arXiv:2606.20738v1 Announce Type: new Abstract: This article presents a complementary approach for integrating multimodal medical data in cancer classification, based on state space models represented by the Mamba architecture. To this end, a mixed multimodal fusion architecture, called Mixed Fusion, was employed and developed to enhance the interpretability of the decision-making process. The proposed approach explores two variants of Mamba: one dedicated to visual processing, responsible for c...

23.06.2026

Blog Image Generation

JPPD: Joint Prediction-Planning Diffusion with Differentiable Safety Guidance for Dynamic Obstacle Avoidance in Intelligent Transportation Systems

arXiv:2606.20686v1 Announce Type: new Abstract: Shared-space transportation operation requires low-speed autonomous platforms to navigate safely and efficiently among pedestrians, service robots, micromobility users, carts, and other road users. Most existing systems decompose this problem into trajectory prediction followed by motion planning, which creates one-way information flow: predicted participant futures influence the robot plan, but the selected robot plan cannot influence the predicte...

23.06.2026

Blog LLMs & Text

Agent Behavior Mining: Generative AI Agent Governance in Business Processes

arXiv:2606.20669v1 Announce Type: new Abstract: As organizations increasingly deploy generative AI agents to automate business processes, they face a governance dilemma: although these agents can increase operational flexibility, their non-deterministic nature challenges the control and standardization that Business Process Management seeks to enforce. This paper addresses this invisible autonomy risk by introducing Agent Behavior Mining, a governance capability that enables the ap...

23.06.2026

Blog Data & Embeddings

Detecting Satellites in Radio-Frequency Data via Semi-Supervised Learning

arXiv:2606.20976v1 Announce Type: new Abstract: Radio-frequency (RF) monitoring is essential for space domain awareness, but it often generates large, variable, and sparsely populated datasets with few labels. These observations can capture satellites, space debris, and the ionospheric background, yet interpreting them typically requires specialized subject-matter expertise. Supervised deep learning methods can perform well on labeled RF data, but they require many annotated examples and may nee...

23.06.2026

Blog LLMs & Text

DEMM-Bench: A Cross-Regime Benchmark for Agent-Runtime Governance-Evidence Sufficiency

arXiv:2606.20634v1 Announce Type: new Abstract: Agent-runtime systems emit traces, ledgers, provenance graphs, policy logs, delegation tokens, cache events, and tool-firewall records, but those containers do not necessarily answer governance questions about a specific decision. DEMM-Bench is a cross-regime benchmark for agent-runtime governance-evidence sufficiency, grounded in the Decision Evidence Maturity Model (DEMM): it measures whether records across eight evidence regimes are sufficient t...

23.06.2026

Blog LLMs & Text

CELEUS: Certifiable and Efficient LLM Evaluation via E-Processes

arXiv:2606.20820v1 Announce Type: new Abstract: Can we trust evaluation scores to capture an LLM's true real-world performance? Certifiable evaluation answers this question by providing guarantees for LLM evaluation. In particular, existing methods sequentially curate evaluation samples and keep updating confidence intervals (CIs) that cover the true performance with high probability (e.g., 95%) until some conditions are satisfied, e.g., the CI width reaches a target precision. However, existing ...

23.06.2026
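The CELEUS abstract describes the baseline it builds on. Score evaluation samples one at a time, keep updating a high-probability confidence interval, and stop once the interval is narrow enough. Below is a minimal Python sketch of that loop under stated assumptions:

- Each item's score lies in [0, 1].
- Coverage is kept valid under data-dependent stopping by a Hoeffding bound with a shrinking per-step error budget alpha / (t(t+1)). This is a standard textbook construction, not the e-process method CELEUS proposes.
- The function names and the `score_next` callback are hypothetical.

```python
import math
import random
from typing import Callable, Tuple


def anytime_radius(t: int, alpha: float) -> float:
    """Hoeffding half-width for the mean of t scores in [0, 1].

    The per-step error budget alpha_t = alpha / (t * (t + 1)) sums to alpha
    over t = 1, 2, ..., so by a union bound every interval in the sequence
    covers the true mean simultaneously with probability >= 1 - alpha.
    """
    alpha_t = alpha / (t * (t + 1))
    return math.sqrt(math.log(2.0 / alpha_t) / (2.0 * t))


def sequential_eval(
    score_next: Callable[[], float],
    target_width: float,
    alpha: float = 0.05,
    max_samples: int = 100_000,
) -> Tuple[float, float, float, int]:
    """Score items one at a time until the interval is at most target_width wide.

    Returns (mean, lower, upper, number_of_items_scored).
    """
    if max_samples < 1:
        raise ValueError("max_samples must be at least 1")
    total, t = 0.0, 0
    mean, lo, hi = 0.0, 0.0, 1.0
    while t < max_samples:
        s = score_next()  # hypothetical: one evaluation item, scored in [0, 1]
        if not 0.0 <= s <= 1.0:
            raise ValueError("scores must lie in [0, 1]")
        total += s
        t += 1
        mean = total / t
        r = anytime_radius(t, alpha)
        # Clipping to [0, 1] only shrinks the interval, so coverage is kept.
        lo, hi = max(0.0, mean - r), min(1.0, mean + r)
        if hi - lo <= target_width:
            break
    return mean, lo, hi, t


if __name__ == "__main__":
    rng = random.Random(0)
    # Simulated model that answers each item correctly with probability 0.7.
    mean, lo, hi, n = sequential_eval(lambda: float(rng.random() < 0.7), 0.1)
    print(f"accuracy ~ {mean:.3f}, 95% interval [{lo:.3f}, {hi:.3f}], {n} items")
```

The shrinking per-step budget is the key design choice. Suppose you recompute an ordinary fixed-sample 95% interval after every item and stop as soon as it looks narrow enough. That procedure does not preserve 95% coverage. The union-bound budget makes early stopping safe, at the cost of wider intervals than a fixed-sample bound would give.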

Blog LLMs & Text

VeriBound: PAC-Bayesian Generalization Bounds for Process Reward Models Trained with Formal Verification Tools

arXiv:2606.20740v1 Announce Type: new Abstract: Process Reward Models (PRMs) provide step-level verification for Large Language Model (LLM) reasoning, yet their training data acquisition remains a bottleneck: human annotation is costly and Monte Carlo roll-out estimates are noisy. A recent approach, FOVER, trains PRMs on step-level error labels automatically annotated by formal verification tools such as Z3 and Isabelle, and empirically observes cross-task generalization from symbolic tasks to d...

23.06.2026
