AI Radar: News, Models, and Papers

MemoryVAM: Integrating Memory into Video Action Model for Robot Manipulation

arXiv:2606.20679v1. Video-world-model policies learn action-relevant representations by predicting future observations. However, they condition on only a short observation window, which renders long-horizon manipulation non-Markovian when the correct action depends on earlier events that are no longer visible. We present MemoryVAM, an episodic memory mechanism for video-world-model policies. We employ a Recap-Cue (RC) module, in which a Perceiver-based Recap Compresso...

23.06.2026

Blog LLMs & Text

A-Evolve-Training: Autonomous Post-Training of a 30B Model

arXiv:2606.20657v1. Post-training a frontier model is normally weeks of human work: proposing data and recipe changes, launching runs, reading evals, deciding what to keep. We report an autonomous system that runs this loop with no human in the loop, post-training a 30B Nemotron across four rounds over multiple weeks. The autonomously produced model reaches a held-out score of 0.86 against the top human submission's 0.87 on the public NVIDIA Nemotron-Reasoning Challen...

23.06.2026

Blog Multimodal

How Well Can Your Video Model Remember? Measuring Memory-Budget Trade-offs in Long Video Understanding

arXiv:2606.20726v1. We introduce a compact empirical model that quantifies how answer accuracy degrades as a function of frame budget B and temporal distance D in long video understanding, analyzing performance when recalling content from D seconds in the past using a fraction B of total frames. Long-form models operate under strict budgets, yet no prior framework predicts how accuracy degrades as B shrinks and events recede. We fit a weighted least-squares model on...

23.06.2026

Blog LLMs & Text

Shear-Free Viewport Magnification for 360-Degree via Spherical Mobius Boosts

arXiv:2606.20684v1. Viewport-adaptive 360-degree imaging seeks to allocate a fixed sampling budget to the region a viewer is likely to observe. Existing view-biased projections increase viewport resolution through non-conformal warps, which can introduce anisotropic stretching and shear. We formulate spherical Mobius boosts as exact conformal maps for fixed-budget viewport magnification. The continuous spherical warp has quasiconformal dilatation K = 1, reallocating s...

23.06.2026
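
The spherical Mobius boost mentioned in the abstract above can be illustrated with a small sketch. It is not the paper's implementation: it assumes the viewport is centered on the north pole (0, 0, 1), and the function name `mobius_boost` and the magnification parameter `s` are hypothetical. The construction is the standard one: stereographically project the sphere onto the complex plane, scale by `s`, and project back. Each step is conformal, so the composite warp preserves angles.

```python
import math


def mobius_boost(p, s):
    """Warp a unit-sphere point p = (x, y, z) by a Mobius boost of strength s.

    Assumption (not from the paper): the viewport center is the north pole
    (0, 0, 1), and s > 1 spreads the region around it over more of the sphere.
    """
    x, y, z = p
    # The south pole maps to infinity under this projection; it is a fixed
    # point of the boost, so return it unchanged.
    if z <= -1.0 + 1e-12:
        return (0.0, 0.0, -1.0)
    # Stereographic projection from the south pole onto the equatorial plane.
    w = complex(x, y) / (1.0 + z)
    # The boost itself is a plain scaling in the complex plane.
    w *= s
    # Inverse stereographic projection back onto the unit sphere.
    r2 = w.real * w.real + w.imag * w.imag
    return (2.0 * w.real / (1.0 + r2),
            2.0 * w.imag / (1.0 + r2),
            (1.0 - r2) / (1.0 + r2))


if __name__ == "__main__":
    # A point 10 degrees from the viewport center moves outward for s = 2.
    theta = math.radians(10.0)
    q = mobius_boost((math.sin(theta), 0.0, math.cos(theta)), 2.0)
    print(math.degrees(math.acos(q[2])))
```

In terms of the colatitude theta measured from the viewport center, the map satisfies tan(theta'/2) = s * tan(theta/2), which is why angular distances near the center are magnified by roughly s while the antipodal region is compressed.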

Blog Robotics & RL

Evolutionary Discovery of Developmental Reward Schedules in Deep Reinforcement Learning

arXiv:2606.20858v1. The temporal structure of reward composition in reinforcement learning (RL) is typically hand-designed and held fixed throughout training, leaving the progression of motivational priorities largely unexplored. In this work, we propose an evolutionary framework for discovering developmental reward schedules, in which three distinct biologically inspired motivational components (agency, novelty, and reactivity) are combined through time-varying w...

23.06.2026

Blog Robotics & RL

MV-WAM: Manifold-Aware World Action Model with Value Augmentation

arXiv:2606.21088v1. Achieving robust and generalizable manipulation across diverse environments remains a fundamental challenge in embodied robotics. Recent world action models achieve strong in-domain performance, yet their gains do not extend proportionally to out-of-distribution scenarios. We attribute this to a structural mismatch between visual and action modalities, whose intrinsically heterogeneous manifolds cause joint optimization to disproportionately degrad...

23.06.2026

Blog LLMs & Text

MindAlign: Decoding Inner Speech from fMRI Signals via Multimodal Embedding Alignment under Limited Data

arXiv:2606.20696v1. Decoding inner speech from non-invasive brain signals remains a fundamental challenge due to the absence of overt linguistic output, limited training data, and large inter-subject variability. Existing brain-to-text approaches often rely on task-specific decoder fine-tuning, which restricts scalability and complicates adaptation to new participants. We propose MindAlign, a decoupled two-stage brain-to-language framework that enables open-ended text...

23.06.2026

Blog LLMs & Text

ARGUSTRACK: A Multi-View Annotation System for Multi-Object Tracking

arXiv:2606.20687v1. Multi-Camera Multi-Target (MCMT) tracking has emerged as a critical capability for applications ranging from autonomous driving to animal behavior monitoring. While recent advances have yielded sophisticated tracking algorithms, the availability of annotated multi-view data remains a significant bottleneck. Existing annotation tools predominantly support single-camera workflows or rely on LiDAR sensors, making cross-view labeling tedious and imprac...

23.06.2026

Blog LLMs & Text

A Viscosity Semigroup Framework for Stable Image Reconstruction

arXiv:2606.20620v1. Starting from the axiomatic formulation of scale-space theory, we develop a viscosity-solution framework for multiscale image representations arising from degenerate elliptic-parabolic partial differential equations. Rather than introducing a new semigroup theory, we work within the standard viscosity-solution setting, using comparison principles to obtain well-posedness, uniqueness, and contraction in the supremum norm. This perspective is used to...

23.06.2026

Blog LLMs & Text

EmoInstruct-TTS: Dual-Path Instruction-Guided Emotional Speech Synthesis

arXiv:2606.20650v1. Instruction-based controllable speech synthesis enables users to specify emotions through natural language. However, existing approaches often rely on coarse emotion labels and lack explicit modeling of fine-grained intensity. We propose EmoInstruct-TTS, a dual-path instruction-guided framework for emotional speech synthesis. We introduce Emotion2embed, a supervised semantic-acoustic emotion embedding covering 48 emotional states, including fine-gr...

23.06.2026

Blog LLMs & Text

$\Omega$: Operator-based Mixture Ensemble for Generative Assimilation

arXiv:2606.20920v1. Characterizing non-Gaussian posterior distributions in partially observed high-dimensional nonlinear systems remains a fundamental challenge in data assimilation. Ensemble Kalman filters rely on Gaussian approximations that can be inaccurate for strongly non-Gaussian posteriors, whereas particle filters suffer from severe scalability limitations. Recent score-based generative approaches improve posterior characterization but typically require super...

23.06.2026

Blog Robotics & RL

Heterogeneous Policy Networks for Composite Robot Team Communication and Coordination

arXiv:2606.20962v1. High-performing human-human teams learn intelligent and efficient communication and coordination strategies to maximize their joint utility. These teams implicitly understand the different roles of heterogeneous team members and adapt their communication protocols accordingly. Multi-Agent Reinforcement Learning (MARL) has attempted to develop computational methods for synthesizing such joint coordination-communication strategies, but emulating hete...

23.06.2026
