Computer Vision - Radar de IA

LocateAnything-3B: the NVIDIA model that points its finger

Describe anything in an image in natural language and it draws a box around it: an interface button, an industrial defect, or a pedestrian. It also predicts the coordinates in parallel rather than character by character.

23.06.2026

Blog LLMs & Text

A Gated Graph Neural Network Approach to Fast-Convergent Dynamic Average Estimation

arXiv:2606.20955v1. Abstract: Dynamic average estimation is a critical problem in multi-agent systems, enabling agents to collaboratively estimate time-varying signals using only local information exchange. Traditional model-based approaches often face challenges related to convergence speed and sensitivity to network topology changes. This paper introduces a novel learning-based solution leveraging Gated Graph Neural Networks (GGNNs) for fast-convergent dynamic average estimat...

23.06.2026

Blog Robotics & RL

Evolutionary Discovery of Developmental Reward Schedules in Deep Reinforcement Learning

arXiv:2606.20858v1. Abstract: The temporal structure of reward composition in reinforcement learning (RL) is typically hand-designed and held fixed throughout training, leaving the progression of motivational priorities largely unexplored. In this work, we propose an evolutionary framework for discovering developmental reward schedules, in which three distinct biologically inspired motivational components (agency, novelty, and reactivity) are combined through time-varying w...

23.06.2026

Blog LLMs & Text

PEAR: Permutation-Equivariant Adaptive Routing Multi-Agent Debate

arXiv:2606.20621v1. Abstract: Multi-agent debate improves the reliability of large language models (LLMs) through iterative peer critiques. However, fixed topologies often introduce persistent positional biases, amplify unreliable agents, and cause high sensitivity to role assignments. We introduce Permutation-Equivariant Adaptive Routing Multi-Agent Debate (PEAR), an inference-time protocol that dynamically reconfigures communication roles and sparse topologies across...

23.06.2026

Blog Data & Embeddings

CDER-SME: A Cross-Device Event-RGB Micro-Expression Dataset under Multi-Level Stress Induction

arXiv:2606.20715v1. Abstract: Micro-expression recognition (MER) in realistic scenarios demands high temporal sensitivity and ecological validity, yet existing benchmarks are largely constrained to laboratory-controlled settings and rigid hardware-coupled sensing. We introduce CDER-SME, a cross-device Event-RGB dataset collected under a multi-level stress induction framework (cognitive and social) to elicit spontaneous emotional leakage. To enable reproducible acquisition with ...

23.06.2026

Blog Computer Vision

VTOS: Learning to Orchestrate Vision Tools by Co-Searching Solutions and Observers

arXiv:2606.20728v1. Abstract: Vision foundation tools such as open-vocabulary detectors, segmentation models, and post-processing operators are powerful building blocks for computer vision, but their effectiveness depends heavily on how they are orchestrated: which tools are used, in what order, with what parameters, and under what visual conditions. Existing visual-programming agents typically generate a fixed solution pipeline, making them brittle under dense objects, occlusi...

23.06.2026

Blog Computer Vision

Mirage: a Clean-Label Backdoor against LiDAR 3D Object Detection

arXiv:2606.20752v1. Abstract: Deep neural network-based LiDAR 3D object detection serves as a critical perception component in safety-critical autonomous systems. However, recent studies have revealed its vulnerability to backdoor attacks. Existing attacks typically require white-box access or label modification and focus on geometric attacks such as object disappearance or bounding-box manipulation. In this paper, we present Mirage, a black-box and clean-label backdoor attack ...

23.06.2026

Blog Computer Vision

Membrane-based Acoustic Microrobots

arXiv:2606.21047v1. Abstract: Acoustic microrobots have emerged as a promising frontier for targeted drug delivery and minimally invasive medicine due to their high-power density and biocompatibility. Despite wide-ranging designs, conventional acoustic microrobots mostly rely on air microbubbles trapped within confined microcavities within the robot body, which suffer from limited operational longevity due to rapid gas dissolution and resultant shifts in resonance frequency. In...

23.06.2026

Blog Computer Vision

Amazon is testing Alexa+ in India with Hindi support

Amazon plans to expand its new conversational AI assistant, Alexa+, to India and is inviting users in the country to test a Hindi-language version.

22.06.2026

Blog Computer Vision

PP-OCRv6 on Hugging Face: 50-Language OCR from 1.5M to 34.5M Parameters

22.06.2026

Editorial Computer Vision

PerceptionDLM: diffusion models learn to describe several regions of an image at the same time

Researchers at Peking University combine a diffusion language model with a vision encoder to generate descriptions of multiple image regions in parallel, 3.4 times faster than sequential methods.

22.06.2026

Model Multimodal

baidu/Unlimited-OCR

Vision-language model trending on Hugging Face, with 47 downloads and 133 likes from the community.

22.06.2026 ·↓ 47