Infatoshi/kernelbench-hard-traces
Featured dataset on Hugging Face, with 378 downloads. KernelBench-Hard agent traces: frontier coding agents writing optimized CUDA/Triton kernels (FP8 GEMM, paged attention, MoE, W4A16, …
Trending papers, models, and datasets on Hugging Face, plus the official blog, with editorial commentary in Portuguese.
A running look — in reverse chronological order — at the bigger tech companies that have announced significant layoffs this year with AI as a stated factor.
OpenAI is using AI to help the open source community better protect itself.
Research reveals that reasoning models' safety outcomes are predictable from early hidden representations, with deliberation appearing but not substantially influencing final respo…
A comprehensive multimodal misinformation detection framework is introduced that handles complex, multilingual content with multiple images and diverse verification approaches, ach…
FLUX3D addresses limitations in image-to-3D Gaussian Splatting generation by improving representation learning and cross-modal alignment through specialized architectures and atten…
FlowR2A addresses the tension in multimodal driving planning by combining dense reward supervision with dynamic proposal generation through a flow-matching decoder that learns rewa…
Jailbreak attacks expose vulnerabilities in aligned large language models, revealing that harmful intent is encoded in structured intermediate uncertainty dynamics rather than outp…
Large language models face challenges in archive-grounded reasoning tasks involving evidence retrieval and synthesis across diverse document collections, with performance varying s…
Long-term memory in LLM agents should be evaluated as an auditable post-interaction artifact by reconstructing structured user state from the agent's memory, as demonstrated by MEM…
EDV is a three-stage framework that uses multiple heterogeneous agents to collaboratively construct reliable experiences for LLM agents, preventing self-confirmatory errors through…