Counsel: A Meta-Evaluation Dataset for Agentic Tasks
A large-scale dataset of human-metaevaluations of LLM critiques for agentic tasks is introduced to improve the calibration and reliability of automated evaluation methods.
Papers, modelos e datasets em alta no Hugging Face, além do blog oficial — com leitura editorial em português.
A large-scale dataset of human-metaevaluations of LLM critiques for agentic tasks is introduced to improve the calibration and reliability of automated evaluation methods.
Calibrated verifier telemetry enhances LLM agents in knowledge-intensive question answering by providing confidence scores and grounding verification, reducing both over-retrieval…
Training-time data augmentation techniques help mitigate overfitting in autoregressive language model pretraining by delaying performance deterioration and improving final model qu…
Agentic Data Tailoring paradigm uses learnable data processing to structure high-entropy multimodal streams, with DataClaw_0-9B model achieving robust alignment through SFT and GRP…
Large language models demonstrate varying effectiveness in software development tasks, successfully completing localized refactoring but showing limitations in integrating new game…
EvoEmbedding is a dynamic embedding model that generates adaptive representations by maintaining a continuously updated latent memory, enabling improved retrieval performance in lo…
Tiered Language Models (TLMs) provide a framework for releasing large language models with configurable capability levels through secret keys that modify computation graphs while m…
Modelo de modelo em alta no Hugging Face — 24.0 mil downloads e 125 curtidas da comunidade.
Startup Baseten is reportedly close to finalizing a $1.5 billion round at a $13 billion as the “inference gold rush" marches on.
OpenAI introduces new spend controls and usage analytics for ChatGPT Enterprise, helping organizations manage costs and scale AI with confidence.
Modelo de geração de texto em alta no Hugging Face — 41.8 mil downloads e 251 curtidas da comunidade.