nvidia/Open-SWE-Traces
Dataset em destaque no Hugging Face — 1.1 mil downloads. Open-SWE-Traces: Advancing Distillation for Software Engineering Agents Data Overview Open-SWE-Traces is an agentic instruction tuning dataset designe…
Papers, modelos e datasets em alta no Hugging Face, além do blog oficial — com leitura editorial em português.
Dataset em destaque no Hugging Face — 1.1 mil downloads. Open-SWE-Traces: Advancing Distillation for Software Engineering Agents Data Overview Open-SWE-Traces is an agentic instruction tuning dataset designe…
Multi-LCB addresses the limitation of LiveCodeBench by providing a multi-language benchmark for evaluating LLMs across twelve programming languages while maintaining contamination…
FreeStyle is a scalable dual-reference generation framework that uses community LoRA mining to create large-scale style-content triplets while addressing content leakage through di…
LEDGERAGENT is a method for customer service agents that maintains task states in a separate ledger to improve policy adherence and state management during tool calling.
A large-scale real-world dataset called DF3DV-1K is introduced to address the lack of clean and cluttered image sets for distractor-free radiance field research, containing 1,048 s…
Adaptive Binning introduces a training-adaptive discretization method for self-supervised learning on medical tabular data, improving representation learning through feature-wise r…
Introducing LifeSciBench, an expert-authored, expert-reviewed benchmark for evaluating how AI systems handle real-world life science research tasks and decisions.
PhySciBench benchmark reveals limited performance of current LLM agents in physical science research, leading to development of DelveAgent framework that improves accuracy through…
Current memory agents lack reliable shared institutional deployment due to challenges in balancing utility, access control, and forgetting across multiple principals with diverse a…
FAPO optimizes LLM pipelines by combining prompt editing with structural changes, demonstrating superior performance across multiple benchmarks and security tasks.
ACIE, an agentic RAG system deployed in a clinical setting, demonstrates high accuracy in extracting medical information from complex patient contexts, achieving 96.