CAVEWOMAN: How Large Language Models Behave Under Linguistic Input and Output Compression
Two-channel evaluation shows output compression reduces costs while input compression increases costs and degrades accuracy across models and datasets.
Trending papers, models, and datasets on Hugging Face, plus the official blog, with editorial commentary in Portuguese.
Implicit Visual Chain-of-Thought decomposes visual conditioning into structural and semantic cascades for improved structure-aware image generation with sketch supervision.
Language-based world models enable agentic environment simulation across multiple domains and enhance general agent performance through scalable simulation and improved downstream…
NatureBench presents a cross-disciplinary benchmark of 90 scientific tasks derived from Nature publications to assess AI coding agents' ability to achieve discovery rather than jus…
Text-to-image models fail to generate counterfactual scenes because they rely on tightly coupled visual-textual patterns rather than causal reasoning, demonstrating limited underst…
DREAM trains dense retrieval embeddings using autoregressive language model attention mechanisms to supervise document-query similarity without requiring labeled examples.
A novel online data mixing framework called Holistic Data Scheduler uses reinforcement learning with a multi-objective reward function to optimize large language model pre-training…
World Value Model combines world models with value estimation to provide accurate task progression assessment and improve robotic policy learning from mixed-quality data.
Researchers introduce NanoGen, a unified framework for training and evaluating diffusion transformers that demonstrates the need for comprehensive benchmarking beyond ImageNet clas…
US autoworkers union warns of robot automation as dark factory future looms.
The loop takes agentic AI a step further by authorizing a swarm of agents to work continuously in the background.