AI Radar: News, Models, and Papers

Evaluation of Medical Vision Language Models HuluMed and MedGemma, and general purpose chatbots Gemma 3, ChatGPT Plus, and Claude Pro on real previously unseen wound images

arXiv:2606.20723v1 Announce Type: new Abstract: Chronic wound assessment remains a clinically challenging task that requires accurate interpretation of wound morphology, tissue composition, vascular characteristics, and infection risk. Recent advances in Vision-Language Models (VLMs) have introduced the possibility of automated multimodal wound analysis through image understanding combined with clinical reasoning. This study evaluates the performance of several general-purpose and medically spec...

23.06.2026

Blog LLMs & Text

Topic-to-Timestamp Alignment by Constrained Evidence Selection

arXiv:2606.20890v1 Announce Type: new Abstract: Meeting archives are difficult to search when users remember what was discussed but not when. We study topic-to-timestamp alignment: given a natural-language topic and a timestamped meeting transcript, the goal is to return the time at which the topic is discussed. A standard RAG setup can retrieve relevant transcript excerpts, but still asks the language model to generate a timestamp, which can produce unsupported or invalid timecodes. We therefor...

23.06.2026

Blog LLMs & Text

AlphaMemo: Structured Search-Process Memory for Self-Evolving Alpha Mining Agents

arXiv:2606.20625v1 Announce Type: new Abstract: LLM agents are promising for alpha mining via combining financial priors, symbolic reasoning, executable factor generation, and feedback-driven refinement. Yet, they face a combinatorial search space, noisy non-stationary feedback, redundant discoveries, and overfitting risks from naively reusing past successes. To address these challenges, we propose AlphaMemo, a self-evolving alpha mining agent with Structured Search-Process Memory. Rather than m...

23.06.2026

Blog LLMs & Text

Learning What Not to Forget: Long-Horizon Agent Memory from a Few Kilobytes of Learning

arXiv:2606.20954v1 Announce Type: new Abstract: Long-running language-model systems accumulate interaction history that outgrows the context window, so they must continually evict. When an eviction policy drops a load-bearing detail, for example an access token issued at login or a path the next call needs, the action fails. We present LRE (Learned Relevance Eviction), a few kilobytes, CPU-only, language-model-free scorer that learns which units of history are load-bearing and keeps them by verb...

23.06.2026

Blog Data & Embeddings

FirstPass: Grounding AI Scientific Judgment in Multi-Round Editorial Outcomes

arXiv:2606.20769v1 Announce Type: new Abstract: AI systems for peer review fail on three fronts: they train on Computer Science and Machine Learning venues alone, ignore the iterative dialogue that validates science, and evaluate on stylistic mimicry rather than real editorial judgment. We introduce FirstPass, a dataset and fine-tuned model that addresses all three. Curating 3,668 complete multi-round peer-review dialogues from Nature Communications across five scientific domains (biology, chemi...

23.06.2026

Blog LLMs & Text

Robust Image-Driven Phenotyping of Ovarian Tumor Cells using Optimized Dynamic Features in Hyperbolic Channels

arXiv:2606.20703v1 Announce Type: new Abstract: Label-free, image-based cellular mechanophenotyping in microfluidic devices provides a high-throughput method for single-cell profiling. However, while complex microchannels (e.g., hyperbolic geometries) reveal transient deformation dynamics under continuous extensional stress, the resulting high-dimensional feature spaces are highly susceptible to hydrodynamic artifacts. Flow rate variations often distort discriminative boundaries, linking feature...

23.06.2026

Blog LLMs & Text

From Sentiment to Actionable Insights: A Data-Driven Public Sentiment Analysis of Advanced Air Mobility

arXiv:2606.20751v1 Announce Type: new Abstract: Advanced Air Mobility (AAM) is an emerging low-altitude air transportation system whose successful deployment depends not only on technological advancement but also on public acceptance. This acceptance will drive government support, regulations, noise standards, and willingness to fly, and in turn the overall commercial viability of AAM. Understanding public sentiment toward AAM is therefore essential for identifying its societal barriers and info...

23.06.2026

Blog LLMs & Text

PEAR: Permutation-Equivariant Adaptive Routing Multi-Agent Debate

arXiv:2606.20621v1 Announce Type: new Abstract: Multi-agent debate improves the reliability of large language models (LLMs) through iterative peer critiques. However, fixed topologies often introduce persistent positional biases, amplify unreliable agents, and cause high sensitivity to role assignments. We introduce Permutation-Equivariant Adaptive Routing Multi-Agent Debate (PEAR), an inference-time protocol that dynamically reconfigures communication roles and sparse topologies across...

23.06.2026

Blog LLMs & Text

A Multi-Agent Audit Framework for High-Stakes Reasoning: Evaluation and Interpretability in Clinical Mental Health Screening

arXiv:2606.21123v1 Announce Type: new Abstract: High-stakes reasoning tasks necessitate transparent and verifiable workflows, yet conventional single-model large language models (LLMs) often struggle with hallucination and low interpretability under zero-shot paradigms. To address this general AI challenge, we propose a Multi-Agent Audit Framework that simulates a collaborative, multi-step verification process. We empirically validate this architecture in the sensitive domain of clinical mental ...

23.06.2026

Blog LLMs & Text

Right Knowledge, Wrong Answer: Test-Time Steering for Temporal Fact Conflicts in Open-Weight Language Models

arXiv:2606.20959v1 Announce Type: new Abstract: Large language models can store both outdated facts and newer superseding facts in their parameters, but standard prompting may still elicit the outdated answer. We formalize this problem as Parametric Temporal Conflict (PTC) and introduce Temporal Attractor Steering (TAS), a three-stage test-time intervention that detects likely conflicts, identifies a conflict-critical layer, and steers hidden states toward newer-fact representations without retr...

23.06.2026

Blog Image Generation

MotionPyramid: Hierarchical Motion Representation and Residual Interfaces

arXiv:2606.20705v1 Announce Type: new Abstract: We ask whether the representational hierarchy seen in perception, from local primitives such as edges to higher level structures such as parts and objects, can be established for motion. In humanoid control, low level actions specify immediate motor commands, while meaningful behavior is organized over longer temporal scales, including contacts, gait fragments, balance recovery, reaching, and whole body skills. We introduce MotionPyramid, a hierarc...

23.06.2026

Blog Image Generation

BayesFP: Posterior Estimation for Flow-Based Policies via Feynman-Kac Sampling

arXiv:2606.21014v1 Announce Type: new Abstract: Robots must generate trajectories that remain faithful to learned expert behavior while satisfying safety constraints and task-specific objectives specified only at inference time. We formulate constrained trajectory generation for pretrained diffusion and flow-matching policies as Bayesian posterior sampling, with the learned demonstration distribution as a prior and an inference-time, cost-derived likelihood tilting it toward feasible, optimal tr...

23.06.2026
