// radar de ia

Data & Embeddings

Trending papers, models, and datasets on Hugging Face, plus the official blog, with editorial commentary in Portuguese.


nvidia/Open-SWE-Traces

Featured dataset on Hugging Face, 1.1K downloads. Open-SWE-Traces: Advancing Distillation for Software Engineering Agents. Data Overview: Open-SWE-Traces is an agentic instruction tuning dataset designe…

18.06.2026 ·↓ 1061

Blog LLMs & Text

Is it agentic enough? Benchmarking open models on your own tooling

18.06.2026

Paper LLMs & Text

Multi-LCB: Extending LiveCodeBench to Multiple Programming Languages

Multi-LCB addresses the limitation of LiveCodeBench by providing a multi-language benchmark for evaluating LLMs across twelve programming languages while maintaining contamination…

18.06.2026 ·▲ 56

Paper Data & Embeddings

FreeStyle: Free Control of Style-Content Dual-Reference Generation from Community LoRA Mining

FreeStyle is a scalable dual-reference generation framework that uses community LoRA mining to create large-scale style-content triplets while addressing content leakage through di…

18.06.2026 ·▲ 26

Paper LLMs & Text

LedgerAgent: Structured State for Policy-Adherent Tool-Calling Agents

LEDGERAGENT is a method for customer service agents that maintains task states in a separate ledger to improve policy adherence and state management during tool calling.

18.06.2026 ·▲ 6

Paper Data & Embeddings

DF3DV-1K: A Large-Scale Dataset and Benchmark for Distractor-Free Novel View Synthesis

A large-scale real-world dataset called DF3DV-1K is introduced to address the lack of clean and cluttered image sets for distractor-free radiance field research, containing 1,048 s…

18.06.2026 ·▲ 31

Paper LLMs & Text

When, Where, and How: Adaptive Binning for Tabular Self-Supervised Learning

Adaptive Binning introduces a training-adaptive discretization method for self-supervised learning on medical tabular data, improving representation learning through feature-wise r…

18.06.2026 ·▲ 1

Blog Data & Embeddings

Introducing LifeSciBench

Introducing LifeSciBench, an expert-authored, expert-reviewed benchmark for evaluating how AI systems handle real-world life science research tasks and decisions.

17.06.2026

Paper LLMs & Text

Deep Research in Physical Sciences: A Multi-Agent Framework and Comprehensive Benchmark

The PhySciBench benchmark reveals limited performance of current LLM agents in physical science research, leading to the development of the DelveAgent framework, which improves accuracy through…

17.06.2026 ·▲ 10

Paper LLMs & Text

GateMem: Benchmarking Memory Governance in Multi-Principal Shared-Memory Agents

Current memory agents lack reliable shared institutional deployment due to challenges in balancing utility, access control, and forgetting across multiple principals with diverse a…

17.06.2026 ·▲ 13

Paper LLMs & Text

FAPO: Fully Autonomous Prompt Optimization of Multi-Step LLM Pipelines

FAPO optimizes LLM pipelines by combining prompt editing with structural changes, demonstrating superior performance across multiple benchmarks and security tasks.

17.06.2026 ·▲ 10

Paper Data & Embeddings

Configurable Clinical Information Extraction with Agentic RAG: What Works, What Breaks, and Why

ACIE, an agentic RAG system deployed in a clinical setting, demonstrates high accuracy in extracting medical information from complex patient contexts, achieving 96.

17.06.2026 ·▲ 5


125 items on the radar