LLMs & Text - Radar de IA

Vesta: A Generalist Embodied Reasoning Model

arXiv:2606.20905v1 Announce Type: new Abstract: Robots operating in open-world environments must seamlessly integrate localization, spatial reasoning, navigation, and long-horizon planning. While specialist models excel at individual tasks, deploying a multi-model stack is computationally expensive and prone to cascading errors. We present Vesta, a unified embodied generalist that consolidates these capabilities into a single foundation model. Our approach combines a diverse and massive curated ...

23.06.2026

Blog LLMs & Text

Video2Code: Generating Interactive Webpages from UI Videos via Action-Aware Revisit

arXiv:2606.20711v1 Announce Type: new Abstract: UI videos provide a natural input for generating interactive webpages, as they capture both webpage appearance and action-triggered state transitions. However, directly applying video-capable vision-language models to this task remains insufficient. Existing models typically rely on sparse sampling or compressed temporal representations, which may miss short action boundaries and break the state-action-state transitions needed to implement webpage ...

23.06.2026

Blog LLMs & Text

On the Identifiability of User Adaptation in Co-Adaptive Neural Interfaces

arXiv:2606.20569v1 Announce Type: new Abstract: We analyze identifiability in co-adaptive human-machine systems. We show that closed-loop encoder estimates do not uniquely identify user adaptation, but instead reflect properties of the joint system. We discuss implications for interpreting behavioral adaptation and propose conditions for identification.

23.06.2026

Blog Robotics & RL

Massive Activations Are Architecturally Robust: A Controlled Scratch/Commitment Residual Stream Test

arXiv:2606.20743v1 Announce Type: new Abstract: Trained transformers reliably develop massive activations, a small number of hidden dimensions whose magnitude is far above the median and which concentrate on the sequence-start token. Whether these outliers are a removable artifact of the residual stream's overloaded read and write role, or instead a functional necessity, is actively debated. We test the artifact hypothesis directly, with an architectural intervention. Our architecture, Ledger Re...
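The definition in this abstract (a few hidden dimensions whose magnitude is far above the median) can be illustrated with a minimal sketch. The function name `find_massive_activations`, the `ratio` threshold of 100, and the toy matrix are all assumptions made for illustration; none of them come from the paper, and this is not the authors' measurement procedure.

```python
from statistics import median


def find_massive_activations(hidden, ratio=100.0):
    """Return (token_index, dim_index, value) for outlier entries.

    `hidden` is a list of rows (one per token), each a list of floats.
    An entry counts as an outlier when its magnitude exceeds `ratio`
    times the median magnitude of the whole matrix. `ratio` is a
    hypothetical threshold chosen for illustration only.
    """
    magnitudes = [abs(v) for row in hidden for v in row]
    med = median(magnitudes)
    if med == 0:
        return []  # degenerate case: no meaningful median to compare against
    threshold = ratio * med
    return [
        (t, d, v)
        for t, row in enumerate(hidden)
        for d, v in enumerate(row)
        if abs(v) > threshold
    ]


# Toy hidden states: 3 tokens x 4 dimensions, with one large value
# placed on the first (sequence-start) token.
toy = [
    [0.5, -0.3, 900.0, 0.2],
    [0.4, 0.1, -0.6, 0.3],
    [-0.2, 0.7, 0.5, -0.1],
]
outliers = find_massive_activations(toy)
```

On the toy matrix the only flagged entry is the large value on token 0, which mirrors the abstract's observation that such outliers concentrate on the sequence-start token.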

23.06.2026

Blog LLMs & Text

MAGNIFIED: RL Fine-tuning of Multimodal Large Language Models for Motion Planning

arXiv:2606.20641v1 Announce Type: new Abstract: Multi-modal Large Language Models (MLLMs) have demonstrated remarkable capabilities in semantic understanding and common sense reasoning, making them promising candidates for solving planning problems in autonomous driving. However, the next-token text prediction objectives traditionally used in pre-training and supervised fine-tuning (SFT) of MLLMs may fall short of fulfilling the planning objectives for autonomous vehicles. The next-token predict...

23.06.2026

Blog LLMs & Text

Democratizing and accelerating AI-driven pathology research through agentic intelligence

arXiv:2606.20677v1 Announce Type: new Abstract: Computational pathology has advanced rapidly with the emergence of foundation models, yet widespread adoption remains limited by substantial technical complexity and programming requirements. Here we present PathLab, an autonomous agentic framework that translates natural-language research objectives into executable and validated computational pathology workflows through the structured composition of domain-specific skills and tools. By organizing ...

23.06.2026

Blog Computer Vision

VTOS: Learning to Orchestrate Vision Tools by Co-Searching Solutions and Observers

arXiv:2606.20728v1 Announce Type: new Abstract: Vision foundation tools such as open-vocabulary detectors, segmentation models, and post-processing operators are powerful building blocks for computer vision, but their effectiveness depends heavily on how they are orchestrated: which tools are used, in what order, with what parameters, and under what visual conditions. Existing visual-programming agents typically generate a fixed solution pipeline, making them brittle under dense objects, occlusi...

23.06.2026

Blog LLMs & Text

Towards CSI-Native Foundation Models: A Channel-Adaptive Roadmap for 6G

arXiv:2606.20670v1 Announce Type: new Abstract: Wireless foundation models offer a path toward reusable channel state information (CSI) intelligence for sixth-generation (6G) systems. However, existing generic-backbone adaptation and CSI pretraining methods often treat CSI as task tensors rather than propagation-conditioned channel responses, thereby failing to capture the intrinsic time-frequency-spatial geometry of wireless environments. This paper presents a channel-adaptive roadmap toward CS...

23.06.2026

Blog LLMs & Text

Beyond ROC-AUC: Operating-Point Performance Reporting for Biometric Verification

arXiv:2606.20680v1 Announce Type: new Abstract: A biometric verifier is often deployed with a strict false match budget, so only a narrow, low false match rate (FMR) slice of the score range is used. A reporting standard for this setting already exists. ISO/IEC 19795-1 asks for error rates at stated operating points, for the detection error tradeoff (DET) curve as the view of the trade-off between FMR and the false non-match rate (FNMR), and for an interval of uncertainty on every value. In prac...
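Reporting at a stated operating point, as described in this abstract, can be sketched as follows: fix a false match budget, find the lowest threshold that respects it, and report the FNMR there. The function name `fnmr_at_fmr`, the convention that a higher score means a better match, and the toy scores are assumptions for illustration. The sketch also omits the interval of uncertainty that the abstract says the standard asks for on every value.

```python
def fnmr_at_fmr(genuine, impostor, target_fmr):
    """Return (threshold, fmr, fnmr) at the lowest threshold meeting the budget.

    Assumes higher scores mean a stronger match, and that a comparison is
    accepted when score >= threshold.
      FMR  = fraction of impostor scores accepted
      FNMR = fraction of genuine scores rejected
    """
    candidates = sorted(set(genuine) | set(impostor))
    # A threshold above every score guarantees FMR = 0 is reachable.
    candidates.append(max(candidates) + 1.0)
    for thr in candidates:
        fmr = sum(s >= thr for s in impostor) / len(impostor)
        if fmr <= target_fmr:
            # FMR never increases as thr grows, so the first hit is the
            # lowest admissible threshold and gives the smallest FNMR.
            fnmr = sum(s < thr for s in genuine) / len(genuine)
            return thr, fmr, fnmr


# Toy comparison scores (hypothetical data).
genuine = [0.9, 0.8, 0.75, 0.6, 0.4]
impostor = [0.1, 0.2, 0.3, 0.35, 0.5, 0.15, 0.25, 0.05, 0.45, 0.7]
thr, fmr, fnmr = fnmr_at_fmr(genuine, impostor, target_fmr=0.1)
```

With these toy scores the selected threshold is 0.6, where one of ten impostor scores is accepted and one of five genuine scores is rejected. A single summary number such as ROC-AUC would not expose that trade-off at the budget actually in use.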

23.06.2026

Blog LLMs & Text

Demystifying Numerical Instability in LLM Inference: Achieving Reproducible Inference for Mission-Critical Tasks with HEAL

arXiv:2606.21023v1 Announce Type: new Abstract: As Large Language Models (LLMs) are deployed in mission-critical domains (e.g., finance, medicine, and law), output reproducibility has become a strict system requirement. While practitioners use greedy decoding to eliminate algorithmic stochasticity, empirical deployments with 16-bit precision still exhibit catastrophic output divergence across heterogeneous GPUs. Through SASS-level profiling, we reveal that this inconsistency is fundamentally driven...

23.06.2026

Blog LLMs & Text

TeleStyle V2: Beyond Content-Preserving Style Transfer with Self-Distillation and Distribution-Matching-Distillation

arXiv:2606.20709v1 Announce Type: new Abstract: Given a content reference and a style reference, content-preserving style transfer requires the model to generate stylized outputs with content and style consistency. We introduced TeleStyle V1 to tackle this problem. However, TeleStyle V1 is trained with photorealistic content references and artistic style references, which leaves it unable to cope with artistic content references and realistic style references in most cases. In this paper, we desig...

23.06.2026

Blog LLMs & Text

Beyond Templates: Revisiting Zero-Shot Remote Sensing through Meta-Prompting

arXiv:2606.20702v1 Announce Type: new Abstract: Vision-language models (VLMs) have sparked growing interest in zero-shot Earth Observation (EO) downstream tasks, with further gains enabled by remote-sensing-adapted models. We examine this setting across 17 VLM variants and 12 remote sensing (RS) datasets under Meta-Prompting for Visual Recognition (MPVR), and show that zero-shot performance remains highly sensitive to textual design choices, from the meta-prompts used to guide the LLM in generat...

23.06.2026