OpenBioRQ: Unsolved Biomedical Research Questions for Agents
A new biomedical benchmark evaluates agentic models' ability to verify sources and avoid false citations by testing unsolved research questions with no answer keys, revealing signi…
Trending papers, models, and datasets on Hugging Face, plus the official blog, with editorial commentary in Portuguese.
Autoregressive generation in large language models traditionally uses the final layer for token prediction, but a new decoding strategy dynamically selects more reliable intermedia…
BioMatrix is a novel multimodal foundation model that integrates molecular sequences, structures, and natural language into a unified decoder-only architecture for diverse biologic…
Multi4D addresses the trade-off between motion consistency and visual fidelity in dynamic 3D Gaussian splatting through a multi-level competitive allocation framework that enables…
EBench is a comprehensive simulation benchmark for evaluating generalist mobile manipulation policies across diverse tasks and dimensions, revealing distinct capability profiles an…
Training models on chain-of-thought demonstrations fails for tasks requiring backtracking search because the forward derivation cannot be faithfully imitated, demonstrating a funda…
Text-to-speech model: 0 downloads and 207 likes on Hugging Face.
For the last 30 years, stopping the flow of cybersecurity-related software has proven to be ineffective. It's unclear why it would work now with Anthropic’s cybersecurity model Mythos.
Text-generation model · 3B parameters: 67.8K downloads and 756 likes on Hugging Face.
Featured dataset on Hugging Face, 13.0K downloads: claude-fable-5 Agent Traces. It's worth noting that our team was working with Glint-Research to collect as much fable data as possible.
Just as last week was ending, the US government forced Anthropic to pull its two newest models, Fable 5 and Mythos 5, citing national security concerns after Amazon researchers allegedly found a way to bypass Fable 5’s guardrails. Cybersecurity researchers have since signed an open letter calling the move dangerous, and Anthropic itself noted the same jailbreaks exist in other models. So is […]