WithinUsAI/claude_mythos_distilled_25k
Featured dataset on Hugging Face: 2.6k downloads. Claude Mythos Distilled 25K: A high-quality synthetic supervised fine-tuning (SFT) dataset designed to train and fine-tune any LLM to mirror the capabi…
Trending papers, models, and datasets on Hugging Face, plus the official blog, with editorial commentary in Portuguese.
Our recent paper, “LLMs Corrupt Your Documents When You Delegate”, has generated discussion about the reliability of AI systems in delegated workflows. We appreciate the interest in this work and want to clarify several important points about what the paper does—and does not—claim. The research aims to develop robust evaluation methods for long-horizon delegated and […] The post Further Notes on Our Recent Research on AI Delegation and Long-Horizon Reliability appeared first on Microso...
mimalloc is an open-source, modern, scalable memory allocator that is a drop-in replacement for malloc and free. It is relatively small (~12K lines), with clear internal data structures, and is easy to build and integrate into other projects. It provides bounded worst-case allocation times (up to OS primitives), bounded space overhead, low internal fragmentation, and minimal contention by relying almost exclusively on atomic operations. The post mimalloc: A new, high-performance, scalable memory...
Introducing GridSFM, a small foundation model that can predict AC optimal power flow in milliseconds, boosting efficiency and unlocking cost savings. Learn how GridSFM gives grid operators direct visibility into congestion, stability, and system health. The post GridSFM: A new, small foundation model for the electric grid appeared first on Microsoft Research.
Featured dataset on Hugging Face: 776 downloads.
Featured dataset on Hugging Face: 199.2k downloads. Physical AI Autonomous Vehicles: The PhysicalAI-Autonomous-Vehicles dataset provides one of the largest, geographically diverse collections of multi-se…
Generative AI
Algorithms & Theory
Featured dataset on Hugging Face: 66.8k downloads. Dataset Summary: SWE-Bench Pro is a challenging, enterprise-level dataset for testing agent ability on long-horizon software engineering tasks.
Trending mask generation model on Hugging Face: 1.7M downloads and 2.3k community likes.
In recent years, LLM multi-agent systems have garnered widespread attention for their collaborative approach to solving complex problems. However, these systems commonly fail at a task despite a flurry of activity. The post "Which Agent Causes Task Failures and When? Researchers from PSU and Duke explore automated failure attribution of LLM Multi-Agent Systems" first appeared on Synced.