LLMs & Texto — Radar de IA

RIZZ: Routing Interactions to Near Zero-Interference Zones for Continual Adaptation of Black-Box Agents

arXiv:2606.20638v1. Abstract: Large language models are increasingly deployed as long-lived agents that must adapt across users, tasks, domains, modalities, and feedback regimes without access to model weights. Existing black-box adaptation methods typically optimize a single prompt, maintain an undifferentiated memory, or rely on repeated rollout-heavy search. However, these designs struggle when streams of input are nonstationary, feedback is sparse, and failures from one tas...

23.06.2026

Blog LLMs & Texto

Phonemes to the Rescue: Multilingual Tokenization Based on International Phonetic Alphabet

arXiv:2606.20993v1. Abstract: Multilingual language models often exhibit performance disparities across languages that can arise as early as the tokenization stage. Widely-used subword tokenization approaches favor high-resource languages, and tokenizer-free methods still yield longer sequences for scripts with a higher bytes-per-character ratio. To address these shortcomings, we propose to use the International Phonetic Alphabet (IPA) as a language-agnostic input representatio...

23.06.2026

Investigating Linguistic Steering: An Analysis of Adjectival Effects Across Large Language Model Architectures

arXiv:2606.20572v1. Abstract: Achieving reliable control of Large Language Models (LLMs) requires a precise, scalable understanding of how they interpret linguistic cues. We introduce a rigorous framework using Shapley values to quantify the steering effect of individual adjectives on model performance, moving beyond anecdotal heuristics to principled attribution. Applying this method to 100 adjectives across a diverse suite of models (including o3, gpt-4o-mini, phi-3, llama-3-...

23.06.2026

DrugBench: Evaluating AI Control Protocols for Medication Harm Mitigation

arXiv:2606.20663v1. Abstract: Large Language Models have the potential to expand and improve the access to clinical information by enabling new ways of interacting with medical knowledge in natural language. However, their deployment in medical question-answering settings is safety-critical, since misaligned outputs can lead to severe patient harm. AI control is an emerging approach that introduces external safeguards to mitigate unsafe behaviours in misaligned systems and has ...

23.06.2026

Demographic Metadata as Construct-Irrelevant Noise in DistilBERT-Based Automated Essay Scoring

arXiv:2606.21066v1. Abstract: Automated Essay Scoring (AES) systems are increasingly used to support teachers in managing grading workloads and to provide a supplementary rater in large-scale assessments. While human grading is frequently influenced by students' demographic characteristics, the efficacy of different strategies for integrating demographic metadata with textual input used to train AES models remains underexplored. This study investigates the impact of a specific ...

23.06.2026

Scaling Diverse Language Generation for 3D Visual Grounding

arXiv:2606.20946v1. Abstract: Developing robust models for 3D visual grounding (3DVG), the localization of entities in a 3D scene described in natural language, is important for enabling agents to correspond spatial language with objects in the physical world. However, the lack of diverse descriptions at scale prevents models from generalizing beyond simple linguistic patterns. Recent such attempts lack diversity in the constraint types and language used to ground objects. Capt...

23.06.2026

Skill Coverage: A Test Adequacy Metric for Agent Skills

arXiv:2606.20659v1. Abstract: Agent skills encode reusable procedural knowledge that guides large language model agents across tasks and execution contexts. Existing evaluations primarily assess skills through task level outcomes, yet task success alone does not reveal which parts of a skill have been exercised or which remain untested. We introduce skill coverage, a test adequacy metric that treats the skill artifact as the object under test. Our approach extracts observable s...

23.06.2026

REKEY: Metadata-Grounded Visual-Key Regeneration for Contamination-Resilient VQA Evaluation

arXiv:2606.20736v1. Abstract: Static visual question answering (VQA) benchmarks age quickly: Once the items leak into training corpora, scores can reflect memorization rather than genuine visual ability, thus obscuring real progress. Rebuilding high-quality benchmarks such as V*Bench requires substantial human annotation, yet each static release can quickly become another leaked artifact. We propose ReKey, a live benchmark protocol that randomly regenerates the answer-bearing l...

23.06.2026

A Quantum-Assisted Agentic Distributed Artificial Intelligence Framework for Deadline-Bounded Orchestration of Hybrid Renewable Microgrids

arXiv:2606.20667v1. Abstract: The real-time orchestration of microgrids that combine fluctuating renewable sources, dispatchable units, storage and curtailable consumers requires the repeated solution of combinatorial dispatch and coalition formation problems under hard control deadlines. In this paper, a quantum-assisted agentic distributed artificial intelligence (DAI) framework is proposed in which the dispatch problem of each control slot is formulated as a quadratic uncons...

23.06.2026

SPARC: A Multi-Agent System for Electrical Circuit Question Answering

arXiv:2606.20643v1. Abstract: Electrical circuit diagram QA tasks require complex mathematical reasoning, which remains challenging for multimodal LLMs. We present SPARC, a multi-agent system that answers questions over circuit diagrams by grounding reasoning in executable physics-based simulations. SPARC uses LLM agents to synthesize, execute, and analyze simulation programs, improving accuracy and reliability by design. It achieves 83% accuracy, with up to a 58% absolute impr...

23.06.2026

Path-dependent program induction under resource constraints explains human sequence learning

arXiv:2606.20623v1. Abstract: How do people build abstract, reusable knowledge from sequential experience under bounded cognitive resources? To answer this question, we integrate rate-distortion theory with recent advances in program induction to describe how prior knowledge shapes which future structures are cheap to encode and easy to discover. We formalize this in a hierarchical Adaptor Grammar (HAG) with distinct local (within-task) and global (across-task) libraries, gover...

23.06.2026

Agent Behavior Mining: Generative AI Agent Governance in Business Processes

arXiv:2606.20669v1. Abstract: As organizations increasingly deploy generative AI agents to automate business processes, they face a governance dilemma: although these agents can increase operational flexibility, their non-deterministic nature challenges the control and standardization that Business Process Management seeks to enforce. This paper addresses this "invisible autonomy risk" by introducing Agent Behavior Mining, a governance capability that enables the ap...

23.06.2026