
An AI model programmed nonstop for 19 days on a single MirrorCode task that cost $2,600 to run

Epoch AI's new MirrorCode benchmark tests whether AI models can recreate complete programs without access to the original code. Claude Opus 4.7 leads with a 56 percent solve rate, rebuilding a 16,000-line toolkit in just 14 hours. But every model tested still fails on the most complex tasks.

The Decoder · Matthias Bastian · January 26, 2026



Read also

DeepSeek Releases DSpark, a Speculative Decoding Framework That Accelerates DeepSeek-V4 Per-User Generation 60–85% Over MTP-1

J.P. Morgan sees a pile of red flags in the AI market

Asian AI startups launch Mythos-like models as Anthropic’s export ban drags on

Building Supervised Fine-Tuning Data from NVIDIA Open-SWE-Traces: Trajectory Parsing, Patch Analysis, Token Budgets, and Tool-Use Metrics